This routine uses each index to look up a premultiplied kernel and adds that kernel into a short output window of 8 samples. The output stream runs at 4x the rate of the input stream. A real routine would typically use somewhat longer kernels, but one place you might use something like this is to simultaneously upsample a row of pixels or a block of audio and convert it through a non-linear curve.
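For readers without the original listing at hand, here is a minimal sketch of the kind of loop described above. It is not the original routine: the 8-bit index type, the 256-entry table layout, and the names `KernelTable` and `upsampleAccumulate` are all assumptions made for illustration. Only the 8-sample window and the 4x output rate come from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kKernelLen = 8;  // output window length (from the text)
constexpr std::size_t kUpsample  = 4;  // output samples per input sample (from the text)

// Assumed layout: 256 premultiplied kernels of kKernelLen floats each,
// stored back to back, one kernel per possible 8-bit index value.
using KernelTable = std::vector<float>;

// Hypothetical helper: each input index selects a kernel, which is added
// into an 8-sample window that advances by 4 output samples per input.
std::vector<float> upsampleAccumulate(const std::vector<std::uint8_t>& src,
                                      const KernelTable& kernels) {
    assert(kernels.size() >= 256 * kKernelLen);
    if (src.empty())
        return {};

    // The last window starts at (n-1)*4 and covers 8 samples.
    std::vector<float> dst((src.size() - 1) * kUpsample + kKernelLen, 0.0f);

    for (std::size_t i = 0; i < src.size(); ++i) {
        const float* k = &kernels[src[i] * kKernelLen];
        float* out = &dst[i * kUpsample];

        // Adjacent windows overlap by 4 samples, so this must accumulate.
        for (std::size_t j = 0; j < kKernelLen; ++j)
            out[j] += k[j];
    }
    return dst;
}
```

Because the non-linear curve and the filter taps are folded into the table ahead of time, the inner loop is nothing but loads and adds, which is what makes the generated code worth inspecting.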

If we look at the output of the VC++ x86 compiler, the result is decent:

I recently went through converting the Direct3D 9-based portions of VirtualDub to optionally use Direct3D 9Ex (formerly 9.L, and sometimes called Direct3D9Ex depending on the documentation writer's mood). This is an extended version of the Direct3D 9 API that enables additional functionality made possible by the DirectX Graphics Infrastructure (DXGI) in Vista/7. Here's what I can say about it:

Lost devices are largely gone, and locked workstations and remote desktop sessions no longer nuke 3D acceleration. As a long-time DirectX programmer who's dealt with this nasty situation since DirectDraw and who's tried using D3D9 for off-screen rendering, I can't adequately express my joy at this.

...BUT:

There is a little land mine stuck within the docs for D3DPOOL that needs to be promoted to the main Direct3D 9Ex page with huge flashing red lights, which is that managed pool (D3DPOOL_MANAGED) resources are not supported. I ended up having to go through all of my 3D code and rewrite every place that was doing direct lock (map) uploads to go through a SYSTEMMEM staging resource instead. Not fun. I can't imagine how nasty this would be for a full game engine (assuming anyone would bother). The note in the docs is misleading, too: it says that IDirect3DDevice9Ex doesn't support managed pool, but it actually means all methods on a 9Ex device -- including all methods inherited from IDirect3DDevice9.

FLIPEX is a new windowed present mode available in Windows 7, which optimizes image transfer between the application and the DWM and also offers additional timing capabilities. Unfortunately, information on its limitations and requirements is scattered all over the SDK:

You need to associate the target window with a FLIPEX swap chain or device at the time of creation, instead of providing an HWND override to Present(). Easy for me, since I already extracted out a dedicated child window to work around Vista wonkiness.

You can't use a source or destination subrect when presenting with FLIPEX. I had to bypass my code that was rounding up swap chain sizes to try to reduce reallocations.
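To make the sizing consequence concrete, here is a rough sketch of the decision involved. It is not VirtualDub's actual code: `BackBufferSize`, `chooseBackBufferSize`, and the 64-pixel granularity are hypothetical names and values picked for illustration. The idea is only that a rounded-up back buffer relies on presenting a subrect, so with FLIPEX the buffer has to match the window exactly.

```cpp
#include <cstdint>

struct BackBufferSize {
    std::uint32_t width;
    std::uint32_t height;
};

// Assumed granularity for the round-up; the real value is not given in the text.
constexpr std::uint32_t kSizeGranularity = 64;

constexpr std::uint32_t roundUp(std::uint32_t v, std::uint32_t g) {
    return (v + g - 1) / g * g;
}

// Hypothetical helper: pick swap chain dimensions for a given client area.
BackBufferSize chooseBackBufferSize(std::uint32_t clientW,
                                    std::uint32_t clientH,
                                    bool flipEx) {
    if (flipEx) {
        // No source/destination subrect is allowed, so the whole back
        // buffer is presented and it must match the client area exactly.
        return {clientW, clientH};
    }

    // With a COPY-style present, an oversized buffer is fine because only
    // the client-sized subrect is presented; rounding up means small
    // window resizes don't force a reallocation.
    return {roundUp(clientW, kSizeGranularity),
            roundUp(clientH, kSizeGranularity)};
}
```

The cost of the FLIPEX branch is that every change in window size now means recreating the swap chain, where the rounded-up path could often reuse the existing one.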

Really annoying: there is no analogous PresentEx() method in IDirect3DSwapChain9Ex. This means that the DONOTFLIP and FORCE_IMMEDIATE flags cannot be used and makes FLIPEX largely useless for additional swap chains. I tested this on the MS sample app, and the extra flags aren't allowed by IDirect3DSwapChain9::Present(). I tried to find a way to backtrack to the DXGI swap chain as that might have been a viable workaround, but unfortunately I couldn't find one. The only way to really use FLIPEX appears to be to use the implicit swap chain, which sucks since the only way you can resize that is ResetEx() and you only get one swap chain.

When the DWM is disabled, FLIPEX still has the same suboptimal behavior as COPY where a vsynced present (INTERVAL_ONE) will block and poll for up to a full frame time.

Once I got it working, I wasn't able to see a CPU performance advantage over COPY at 60 fps. There may be an advantage on the GPU side but I'm not set up to measure that system-wide.

Long story short, if you've got a small amount of code to do GPU accelerated rendering, modifying it to use 9Ex to avoid the mode switch limitations isn't too bad, but I wouldn't convert a larger engine just for that. FLIPEX is more dubious and limited in advantage; I can see where it would help with a video player, but with a more complex display or rendering setup involved it's more of a pain to use. For the work involved and with VirtualDub's display code I'd be tempted to bypass 9Ex/FLIPEX and just try going to Direct3D 10.1 in 10level9 mode and DXGI, which has a more flexible swap chain interface.