Stuff

I picked up an ATI Theater HD 750 USB capture device recently to add to my arsenal. I have far too many capture devices as it is, but about half of my collection is PCI, and now that I'm laptop-based that's rather inconvenient. I have two other USB capture devices, a Plextor and a GameBridge. The problem with the Plextor is that it produces a soft image and outputs only compressed video, so it has significant latency, about 300 ms. It's also a bit big. The GameBridge is much smaller and produces straight UYVY -- perfectly fine for what I do -- but as it turns out, it will not sync to an Atari 800XL, whose video signal is too far out of spec. I was pleased to discover that the ATI device also outputs uncompressed data and does sync to the Atari, although the color decoding is kind of crappy... but that's a story for another time.

Unfortunately, I also discovered to my annoyance that the 750 USB has a problem with VirtualDub: it stutters horribly when "enable audio playback" is set. That option allows you to hear the output of capture devices that aren't sound cards in Windows, but the unofficial story is that it's the "emulate a TV set" option. I always hate debugging these issues because they always turn out to be issues where something isn't getting along with the DirectShow capture graph, which means doing all sorts of arcane incantations to try to fix it. After playing around a bit with Microsoft GraphEdit and with the cap_dshow.cpp module, I figured out that it was a problem with the filter graph clock. The filter graph clock is an entity, either in the filter graph or external, whose job is to supply a uniform timebase for all data that moves through the filter graph. This then allows the renderers that are displaying the video and playing the audio to all work off of the same timestamps and to also schedule their output to the same clock. For instance, video players typically have the DirectSound Renderer providing the clock, with the video renderer syncing to audio playback. This is also the configuration that VirtualDub normally chooses when the audio playback option is on. It turns out that the 750 doesn't like that configuration and has oscillation problems with it. The usual way to attack these problems is with the "disable timestamps for preview" option in Capture > Timing in recent versions of VirtualDub; this normally solves the problem by killing the timestamps and causing everything to play immediately as it arrives. Well, that's even worse, as the ATI capture filter doesn't even want to start in that case.

What does work, though, is using the system clock (CLSID_SystemClock). The bad part is the reason VirtualDub has its current behavior in the first place: I dug up the P4 revision history for that module, and it turns out I wrote it this way because I couldn't get the capture driver for the SAA713x to work with audio playback without using the audio renderer's clock. In particular, the system clock specifically did not work with that card. This means that the only way I can make this work across the board is to make the filter graph clock option three-way so the working clock can be chosen manually. Yuck. I hate doing this, since it's just throwing the problem to the user. Heck, I wrote the program and I didn't know the answer either -- I just tried all of the options in code until I found one that worked. Unfortunately, this seems par for the course with DirectShow, which I've found to be one of the flakiest parts of Windows. I've described my disdain for this API before, and this is further reinforcement. It's bad enough that the API is complex, but too many of the third-party filters have antisocial behavior or hideous bugs, especially capture drivers.
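For those curious about the mechanics: picking the reference clock yourself in DirectShow looks roughly like this. This is a bare sketch with all error handling removed, not VirtualDub's actual code (Windows-only; requires the DirectShow headers):

```cpp
// Force the filter graph to use the system clock instead of letting
// DirectShow pick one (typically the audio renderer's clock).
// pGraph is an already-built IGraphBuilder; error checks omitted.
IMediaFilter *pMediaFilter = NULL;
pGraph->QueryInterface(IID_IMediaFilter, (void **)&pMediaFilter);

IReferenceClock *pClock = NULL;
CoCreateInstance(CLSID_SystemClock, NULL, CLSCTX_INPROC_SERVER,
                 IID_IReferenceClock, (void **)&pClock);

pMediaFilter->SetSyncSource(pClock);    // NULL here means "no clock at all",
                                        // i.e. the "disable timestamps" case

pClock->Release();
pMediaFilter->Release();
```

The three-way option essentially amounts to choosing what gets passed to SetSyncSource(): the audio renderer's clock, a CLSID_SystemClock instance, or NULL.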

Anyway, expect a fix for this issue to be available in a future version of VirtualDub. I'm thinking it'll probably be in the next experimental version, as I'm pretty risk averse at the moment to pushing things into the stable branch, and changes to DirectShow graph building code rank pretty high on my risk chart. If you have a device like this and want to try a hacked up test release, though, let me know -- I could always use some testing.

Version 1.6 of my emulator Altirra is out. This version includes various fixes for hardware emulation accuracy issues, improvements to the built-in placeholder kernel, and enhancements to the debugger and UI.

The "VDXA" in the "Enable 3D video filter acceleration (VDXA)" option in VirtualDub's Preferences dialog is not a typo. It stands for VirtualDub eXternal Acceleration, an API I added in 1.9.3 to allow video filters to use Direct3D 9. For some reason, though, everyone keeps confusing it with DirectX Video Acceleration (DXVA), which is an entirely different API -- one for video decoding acceleration in Windows -- and which VirtualDub doesn't use.

I just got burnt by an old bug -- err, feature -- in the Visual C++ debugger.

In the VC6 days, there was a nasty "feature" where the debugger evaluated unprefixed integers in expressions using the current number base setting. This meant that array[10] in the Watch window showed element 10 if you were in decimal mode and element 16 if you were in hexadecimal mode. You had to use the nonstandard "0n" prefix to force decimal, i.e. array[0n10]. Thankfully, this has been fixed, and in the last few versions unprefixed numbers work as expected.

However, it turns out that this problem still occurs in autoexp.dat visualizers. My deque visualizer looks like this:
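(The original listing hasn't survived here. As a rough reconstruction of the shape of such an autoexp.dat entry -- the member names below are purely illustrative, but the bare "& 31" mask in the expr: field is the part that matters, since in hex display mode it gets read as "& 0x31":)

```
vdfastdeque<*>{
	children(
		#array(
			expr: $c.mpBlocks[($i + $c.mStartIndex) >> 5]->data[($i + $c.mStartIndex) & 31],
			size: $c.mSize
		)
	)
}
```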

vdfastdeque, like most STL deque implementations, stores elements in a sequence of fixed-size blocks with some portions of the first and last blocks possibly unused. Well, it took me a while to figure out that the elements I was seeing were bogus, but only when hexadecimal mode was enabled. The problem is the "& 31" in the array expr: field, which was turning into "& 0x31" in hex mode... argh!!

As expected, changing the visualizer to use 0n31 or 0x1f fixes the problem. I did some experimentation, and this only seems to affect cases where expressions are evaluated for display. In particular, it doesn't affect the size: field of #array, and probably not any conditions or values used by #if or #switch, which is why none of Microsoft's visualizers show this problem. The problem also reproduces in VC2010 Express, so it hasn't been fixed yet. That's not too surprising, since very little has changed in the visualizer engine except a couple of small bug fixes.

Anyway, today's conclusion: make sure you test your visualizers in hex mode, and prefix constants to avoid such problems.

This routine uses MMX intrinsics to compute the variance of a series of samples, stored as unsigned bytes. SSE intrinsics got some attention in the VS2010 compiler, but MMX intrinsics have long been the neglected stepchild and I hadn't heard anything about them. Well, let's look at the disassembly:

After seeing an interesting discussion about image scaling quality in Windows Presentation Foundation (WPF) I decided to take a look at its "High Quality" setting. It's documented in the BitmapScalingMode enum as being a synonym for "Fant" scaling:

Unfortunately, since the algorithm comes from academic literature, trying to find anything direct on Fant's algorithm leads to a maze of links to paywalls and abstracts. After digging around a bit, though, I found enough detailed references and code fragments to figure it out, and also spent some time trying it out in a WPF application. (By the way, the WPF Designer is soooooo slooooooowwwwwww!) Now, WPF essentially supports three bitmap scaling modes: nearest, linear, and Fant. Nearest and linear are, as you'd expect, just point sampling and linear interpolation. Linear (bilinear) mode is restricted to a 2x2 kernel and does not use mipmaps, so it will start aliasing once you scale below 70% or so. The Fant algorithm, however, is described as very high quality. Sounds interesting, right?

Well, not really.

It turns out that Fant's algorithm is... a box filter. More specifically, it is equivalent to linear interpolation for enlargement along an axis and a box filter for decimation. From what I gather, Fant's algorithm was originally interesting because of the way that the box filter was implemented, which was amenable to hardware implementation. By modern standards and implemented in software, though, it's unremarkable. I tried it out with some decimation settings on some test images and particularly a zone plate image, and it showed more aliasing than a conventional triangle filter (precise bilinear in VirtualDub) or bicubic decimation filter -- not too surprising for a box filter.

I don't know WPF very well, but from what I can gather, Fant's algorithm was made available because that's what the Windows Imaging Component (WIC) has available under the hood. Thing is, besides the mediocre quality, a box filter doesn't make sense when you are trying to support accelerated rendering, since it's more expensive to implement than generating mipmaps and using trilinear filtering. Trilinear filtering is implemented directly in the texture sampling hardware; box filtering isn't, which means it has to be done manually, which is slower... assuming that WPF even bothers accelerating it. It's also a somewhat awkward limitation, since .NET has long had decent bilinear and bicubic decimation support available through System.Drawing (GDI+). Therefore, if you need to downscale images in your WPF-based UI, you might consider prescaling them through GDI+ instead to get better quality.
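The prescaling step is a one-liner in GDI+. A minimal Windows-only sketch (assumes GdiplusStartup has already been called; error handling and ownership details omitted):

```cpp
// Downscale src to (dstW, dstH) with GDI+ bicubic filtering before
// handing the result to WPF.
Gdiplus::Bitmap *PrescaleBicubic(Gdiplus::Bitmap *src, int dstW, int dstH) {
    Gdiplus::Bitmap *dst = new Gdiplus::Bitmap(dstW, dstH, PixelFormat32bppARGB);
    Gdiplus::Graphics g(dst);
    g.SetInterpolationMode(Gdiplus::InterpolationModeHighQualityBicubic);
    g.DrawImage(src, Gdiplus::Rect(0, 0, dstW, dstH),
                0, 0, src->GetWidth(), src->GetHeight(), Gdiplus::UnitPixel);
    return dst;
}
```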

It's been so long since I released a new version of VirtualDub that I almost forgot my SourceForge password. Ugh.

Version 1.9.9 is now up, and is a stable release with queued up bug fixes; several of them are in the core filtering engine and a couple are in the AVI and Huffyuv handling code. As usual for a stable release, there are no new features other than a small diagnostic one (log specific errors when plugins fail to load on startup).

I don't normally disclose many specifics about what I have in progress, but since I've been slow lately in working on VirtualDub (mainly due to time and motivational constraints) and since I have published some working test code already, I'll mention some of my long-term plans here. As many of you know, one of the big issues with VirtualDub right now is lack of support for more advanced video formats. VirtualDub's development model depends on an abstraction between specific video formats and the core program for several reasons: legal, maintenance, and sheer complexity. The short answer is that I cannot and will not take a dependency on an external library to handle formats, nor can I implement them directly. Therefore, the current long-term plan is to beef up the input and output plugin support -- which includes adding command-line encoder support and bringing the DirectShow input plugin into the program proper. The command-line encoder support is still rough, but I've been posting test releases in the forum, and it's starting to look good as a way to export to formats such as H.264, as well as any bizarro homegrown format you can encode to with a simple CLI program. The DirectShow input plugin mostly works, although it needs work on the file type mapping side.

Another frequently requested feature is multicore support. There is a strong misconception that not taking advantage of multiple cores is a bug that requires a ten-minute fix, and this notion really needs to be dispelled: writing correctly multithreaded software is hard with standard programming paradigms, and that's a major problem the software industry is currently facing. It's further complicated when dealing with APIs that have thread affinity or are difficult to run concurrently. The problem for VirtualDub is that the video and audio codecs are typically the main bottleneck, and I can't force those to be multithreaded from my end. I do have a couple of plans, though. First, video codecs that produce only key frames can be parallelized; this is not currently implemented even in test releases, but I don't see any blocking issue there. Second, I have more control over video filters, and while I can't run a single video filter on multiple threads, as I've written before it is possible to run different video filters in the same chain concurrently. The test releases in the beta forum can already do this to some extent, and I'm seeing some promising gains on a dual-core machine.
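The filter-chain case is classic pipeline parallelism: each filter stays single-threaded, but different filters run on different threads with frames handed between them through queues, so filter A can start on frame N+1 while filter B is still chewing on frame N. A toy sketch of the idea (not VirtualDub's actual code), with two "filters" connected by a tiny blocking queue:

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Minimal bounded blocking queue used to hand frames between stages.
template <typename T>
class FrameQueue {
public:
    void push(T v) {
        std::unique_lock<std::mutex> lk(mMutex);
        mNotFull.wait(lk, [&] { return mItems.size() < 4; });
        mItems.push_back(std::move(v));
        mNotEmpty.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> lk(mMutex);
        mNotEmpty.wait(lk, [&] { return !mItems.empty(); });
        T v = std::move(mItems.front());
        mItems.pop_front();
        mNotFull.notify_one();
        return v;
    }
private:
    std::mutex mMutex;
    std::condition_variable mNotEmpty, mNotFull;
    std::deque<T> mItems;
};

// Two single-threaded "filters" running concurrently on different frames.
// Frames are stand-in ints; -1 is an end-of-stream marker (toy convention).
std::vector<int> runPipeline(const std::vector<int> &frames) {
    FrameQueue<int> aToB;
    std::vector<int> out;

    std::thread stageA([&] {
        for (int f : frames)
            aToB.push(f + 1);     // "filter A": e.g. a brightness adjust
        aToB.push(-1);            // end of stream
    });
    std::thread stageB([&] {
        for (;;) {
            int f = aToB.pop();
            if (f < 0) break;
            out.push_back(f * 2); // "filter B": e.g. a levels adjust
        }
    });
    stageA.join();
    stageB.join();
    return out;
}
```

The bounded queue is the important design point: it limits how far one stage can run ahead, which caps memory use and keeps the stages overlapped instead of letting one flood the other.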

Now, the bad part: I have no timeline whatsoever for this. Sorry, but that's reality. I have a full-time job, I have other interests, and therefore I only work on VirtualDub occasionally. I get a lot of requests for various features, some of which are well-written, and some of which are frankly abusive, but regardless I can only take on a few of them at a time, and I generally have to avoid chasing the frontier, whether it be in compilers, OS features, algorithms, formats, etc. If you've sent me an email and haven't gotten a response, or are still waiting for a feature to arrive -- I apologize, and please believe me when I say I keep a master list of requests. Thanks for your support and your continued patience, and if you've got some time, feel free to try some of the test releases I put out in the forums.