Stuff

A common belief is that you should always profile your code first to diagnose a performance problem before trying to optimize. This is generally a good idea, although sometimes it's taken to absurd extremes. You don't really need a profiler to see that an O(N^2) algorithm with N=40000 is likely a problem -- all you need is to break into the app with a debugger a few times. That having been said, the quicker your profiler is to fire up, the easier it is to skip the guesswork and get some hard data. There's nothing like a profile showing a particular function at 98% of the CPU to identify a culprit.

Most people I know seem to prefer call graph profiling for the detail of the data produced, but I like sampling profilers myself: they're less intrusive, and the data is more reliable, if less precise. They also often have the nice advantage that you can simply start them at any time and nonintrusively profile the whole system on the spot, without having to launch the application under the profiler or terminate the app when profiling completes. So when the program I was working on was unexpectedly running at one-third of the speed it should have, I just launched AMD CodeAnalyst and fired off a standard 20-second no-launch sampling run (which lives in a profiling project named "whatever").

The result?

Well, the profile showed a bunch of function names that started with @ILT... which, if you're familiar with Visual C++, stands for "incremental link table." Which means the program was running slowly because I was running the unoptimized debug build.

Sheepishly, I stopped the program, changed the configuration from Debug to Release, and solved the performance problem of the day.

Photoshop Elements is one of my favorite programs, for a couple of reasons. One, I like paint programs in general. Two, it doesn't suck. Okay, you can probably point out a million ways that Photoshop could be improved (starting with using the right mouse button for alt-painting), but it's way better than the other programs I've used. Paint.NET is cool, but it's missing a lot of features and it's pretty slow, and Paint Shop Pro has had a broken resampler in every version I've tried. PSE6, on the other hand, is a lot faster and more stable, and aside from some gratuitous skinning, I find the UI hasn't changed much even since Photoshop 2.5.

There's one really irritating problem with it, though.

For some reason, Photoshop Elements 6 has a habit of getting into a state on my machine where it randomly refuses to create or open files. New File? Dialog pops up, click OK, nothing happens. Open a file? Nothing happens. Launch a PSD file? PSE6 opens, blank. The application doesn't hang, it just doesn't do what was asked. This is pretty much the worst failure state imaginable for a program, because not only does it just shrug and not do what you asked, but there isn't even any crash, error, or other feedback to indicate a possible course of corrective action. I've tried all of the usual solutions, including blowing away the preferences file and fiddling with printers in Windows, to no avail. Reinstalling from scratch didn't help, either. Neither did fiddling with scratch drives, monitoring for file I/O errors with FILEMON, or killing and restarting all of the annoying background tasks that PSE6 loads. And weirdly enough, the problem will just go away for a while and come back later. It'd been behaving lately.

Until I tried to scan a document a few minutes ago, at which point it once again decided not to create, open, or scan any images.

In a fit of frustration, I launched WinDbg and attached it to Photoshop -- and lo and behold, it started behaving. No need to do anything else. I detached the debugger, and it kept behaving. I guess the mere threat was enough to put it back in line.

(Needless to say, if anyone knows of an actual solution to this problem that doesn't involve weapons of mass debugging, I'd be grateful. Incidentally, I did also find out that the suggestion to fiddle with printers in Windows isn't as dumb as it sounds, because for some bizarre reason, Photoshop loads the printer driver every time you create or open a document.)

It's been reported to me that an odd thing happens on systems with multiple monitors where VirtualDub will repeatedly lock up the system with display flashes whenever the display panes overlap the second monitor. I saw this in person once, but unfortunately wasn't in a position to debug it, and only recently have been able to recreate it on my debug station. The circumstances where this occurs are:

DirectDraw display mode is enabled (default).

Direct3D/OpenGL display mode is not enabled (default).

Some part of a display pane overlaps a non-primary monitor.

I think another condition is "you have an NVIDIA video card," but a sample size of two with all NVIDIA cards isn't conclusive. Might be XP-only, too.

When this happens, the display will flash and the entire system locks up for about half a second every time VirtualDub tries to repaint the panes. Joy. The workaround is to change the display options in Options > Preferences > Display, and either disable DirectX support (slow) or enable Direct3D mode (preferred).

I traced through this a bit and the entire effect occurs in IDirectDrawSurface3::Blt(). There doesn't appear to be anything special about this call except the destination rectangle. Unfortunately, I managed to reproduce this effect in at least two other DirectDraw-based applications, which means I'm likely in workaround territory rather than fix territory. I also wasn't able to break into the debugger during the call to find out what exactly was locking everything up, since the debugger was also frozen -- that would mean resorting to CDB or a remote kernel-mode debugger. If I can't figure out any other solution, I'll probably just manually clip the destination and source rectangles to the primary monitor. VirtualDub doesn't currently instantiate a DirectDraw context per adapter and so it can't render on the secondary monitor that way anyway, and clip precision isn't an issue because the DirectDraw runtime already clips only to integer precision -- the lack of subpixel precision is why weird artifacts appear when you drag another window on top of a DirectDraw-based application that's doing a stretchblt.

VirtualDub 1.9.6 is now available for download. This is a stable fix-only release addressing several bugs and also some regressions introduced along the 1.9.x code line. The main intent is to try to fix all of the issues that are preventing people from migrating from 1.8.x to 1.9.x.

The current plan is to release at least one more bug-fix-only release (1.9.7), and after that, I have no idea. I have some relatively largish changes pending in another code line, but at this point I don't think I'm going to want to release it as 1.9.8, in case I need to release another fix-only release in the interim... which leads to a problem because I've run out of mid-level version numbers. I'm not going to start a 1.10.x line, because I learned the hard way that everyone reads it as a decimal and thinks .10 < .9. What I should have done was insert a leading zero so that I'd be at 1.09 instead. I could just jump to 2.0.x instead, but that seems kind of lame since I'm unlikely to have the kinds of changes that would warrant a 2.0 release. Then again, the last time I thought I was going to make version 2 a big upgrade, I ended up screwing up the code line so badly I had to pull a Vista and restart from the 1.4 code base, so maybe I should just bump the version number.

It seems that a perpetual affliction of .NET WinForms-based applications is slow and flickery repainting. Part of the problem is .NET's insistence on using GDI+, which is not hardware accelerated to any useful extent. That still doesn't explain why so many controls flicker all of the time, even though they're based on Win32 controls that don't have the same problem. Today I hit this problem yet again in a tool, this time with ListView. It drives me absolutely nuts to see a system with a 3GHz Core 2 and a GeForce 8800 take four seconds to redraw a list view that has three columns and a hundred entries when I drag a column, and even worse, flicker the entire time.

Therefore, I had to sit down tonight and figure out how you could make a standard Win32 ListView update so slowly that a 1541 drive could almost keep up with it.

(Caveat: As usual, I do my primary work in XP. I'm too lazy to reboot into Windows 7 right now.)

The way I ended up debugging this involved parallel C++ and C# apps. Both were fairly vanilla apps made using the built-in app wizards, the C++ one containing a dialog with a list view, and the C# one being the same but with a WinForm. Okay, I'll admit that the C++ one was more annoying to write, because programming a Win32 list view directly is a lot of gruntwork. However, out of the box, the C++ app updated much more smoothly and didn't flicker madly. I'll spare you the debugging details -- which include ILDASM, WinDbg, Spy++, two instances of Visual Studio, and tracepoints in x86 assembly while debugging in mixed mode -- but I managed to figure out what was going on. The WinForms ListView is indeed a Win32 ListView with heavy subclassing, but it turns out the poor performance is caused by two bad design decisions on the part of the WinForms team:

The Win32 list view is always in owner draw mode. Always. Even if you don't have OwnerDraw set in the control. Specifically, the WinForms ListView intercepts WM_NOTIFY + NM_CUSTOMDRAW and handles the item painting itself. In doing so, it ends up creating and destroying a lot of GDI+ contexts, and that kills redraw performance, just like we've seen with DataGridView.

In its OnHandleCreated handler, ListView sets the text background color to transparent (ListView_SetTextBkColor(hwnd, CLR_NONE)). As it turns out, this kills the fast path in the Win32 list view code and switches it from incremental painting in opaque mode to a full erase + redraw over the entire control. You can spot the difference if you set a breakpoint on {,,user32}_NtUserRedrawWindow@16.

Both of these are fixable -- the first problem can be fixed by intercepting NM_CUSTOMDRAW and forcing it to return 0, thus restoring the built-in redraw code, and the second one by sending another LVM_SETTEXTBKCOLOR message to restore an opaque background color. With these two fixes, the C# app runs as smoothly as the C++ app. I don't know why the WinForms team chose such poor defaults.
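As a concrete illustration, here's an untested Win32-level sketch of the two fixes. In an actual WinForms app the equivalent logic would go in a ListView-derived class's WndProc override; the native subclassing of the parent window shown here, and the choice of COLOR_WINDOW as the opaque background color, are my assumptions:

```cpp
#include <windows.h>
#include <commctrl.h>
#pragma comment(lib, "comctl32.lib")

// Fix #1: answer the custom-draw notification with CDRF_DODEFAULT (0)
// so the list view's built-in item painting runs instead of the
// owner-draw path. (This sketch swallows NM_CUSTOMDRAW from any child
// of the subclassed parent; a real fix would check the source window.)
static LRESULT CALLBACK FixListViewProc(HWND hwnd, UINT msg, WPARAM wParam,
                                        LPARAM lParam, UINT_PTR, DWORD_PTR) {
    if (msg == WM_NOTIFY && ((LPNMHDR)lParam)->code == NM_CUSTOMDRAW)
        return CDRF_DODEFAULT;
    return DefSubclassProc(hwnd, msg, wParam, lParam);
}

void FixListView(HWND hwndListView) {
    // Notifications go to the parent, so subclass it (subclass ID 1 is
    // arbitrary).
    SetWindowSubclass(GetParent(hwndListView), FixListViewProc, 1, 0);

    // Fix #2: restore an opaque text background so the list view can
    // use its fast incremental-paint path instead of a full
    // erase + redraw of the whole control.
    SendMessage(hwndListView, LVM_SETTEXTBKCOLOR, 0,
                (LPARAM)GetSysColor(COLOR_WINDOW));
}
```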

I read a suggestion on a blog that Win32 timer queues should be used instead of timeSetEvent(), so I decided to investigate.

First, a review of what timer queues do. A timer queue is used when you have a number of events that need to run at various times: it maintains a sorted list of timers and repeatedly services the timer with the nearest deadline. Not only is this more efficient than having each piece of code maintain its own timing, it's also a powerful technique because it lets you multiplex a limited timing resource. It's especially good for low-priority, long-duration timers like UI timers, where you don't want to spend a lot of system resources and precise timing isn't necessary.

The classic timer queue API in Windows is SetTimer(). This is mainly intended for UI purposes, and as a result it's both cheap and imprecise. If you're trying to do multimedia timing, SetTimer() is not what you want. It's also a bit annoying to use because you need a message loop and there's no way to pass a (void *) cookie to the callback routine (a common infraction which makes C++ programmers see red). The newer timer API, however, is CreateTimerQueue(). This allows you to create your own timer queue without having to tie it to a message loop, and looks like it would be a good replacement for timeSetEvent().
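Here's roughly what using it looks like -- an untested sketch based on the documented API, with error handling omitted, not code from any particular application:

```cpp
#include <windows.h>
#include <cstdio>

// Callback runs on a thread-pool worker thread, with the (void *)
// cookie that SetTimer() never let you have.
static VOID CALLBACK OnTimer(PVOID param, BOOLEAN /*timerOrWaitFired*/) {
    printf("tick: %s\n", (const char *)param);
}

int main() {
    // A private timer queue, no message loop required.
    HANDLE queue = CreateTimerQueue();

    // First fire after 100 ms, then every 250 ms.
    HANDLE timer = NULL;
    CreateTimerQueueTimer(&timer, queue, OnTimer, (PVOID)"hello",
                          100, 250, WT_EXECUTEDEFAULT);

    Sleep(1000);

    // INVALID_HANDLE_VALUE makes the call block until any outstanding
    // callbacks have completed before tearing the queue down.
    DeleteTimerQueueEx(queue, INVALID_HANDLE_VALUE);
    return 0;
}
```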

I've released a new version of Altirra, my Atari 8-bit computer emulator. Version 1.3 contains a large number of compatibility fixes and runs a number of titles that previously blew up under 1.2. I'd like to thank Mclane and breaker for an astounding amount of compatibility testing in a comment thread of epic proportions, which tracked down many, many games and programs that did not run correctly and helped immensely in identifying and ironing out bugs.

One of the weirder aspects of doing work on this project is that it's probably the closest I've come to doing Test Driven Development. TDD is not something I'm generally sold on because I think there are lots of types of modules for which tests are not the appropriate way to ground development. In this case, though, I need to conform to a behavior that is (a) rigidly defined and testable and (b) frequently unknown. What I often end up doing is writing a unit test to determine the behavior on a real 800XL, and then tweaking the emulation code until the test passes. I have to do things this way because the emulation code is now operating at a level where it's very easy to break some programs while trying to fix others since some programs are sensitive to single-cycle timing deviations.

Another thing that's becoming abundantly clear is how difficult it would have been for Atari to create a more powerful machine with a high level of backwards compatibility. I've been finding an increasing number of programs that happen to work but have extremely tight timing margins or simply have outright bugs. I've seen programs that:

Used uninitialized memory.

Blew up if VCOUNT incremented one cycle off from when it was supposed to.

Crashed with disk acceleration enabled because they didn't wait for a new display list to take effect and died when ANTIC started firing random interrupts.

Enabled interrupts and switched memory banks in the wrong order and relied on a two-cycle interrupt delay in the hardware.

Used a portion of kernel ROM as an encryption key.

So, if you're wondering why your powerful next-gen console can't reliably run your last-gen games, now you have some idea why. When you have stable and consistent hardware, it's easier for people to unintentionally write code that relies on unspecified behavior to an extreme level.

Oh, and I'll say it again: cassette tape is vile. I hated it growing up and I hate it even more now that I'm trying to emulate it. You should offer thanks every night that you no longer have to wait minutes for a program to load off of tape.