We ran the DXVA Checker benchmark on all the cards, but our graphing engine allows us to present only four series in each graph. This meant that we had to choose between the GDDR5-based AMD 6450 and the DDR3-based MSI 6450. Keeping in mind our focus on passively cooled GPUs, we went with the latter. The 'No VPP (Video Post Processing)' frame rates were similar for both candidates. However, as post-processing algorithms were enabled, the MSI 6450 began to perform a bit worse than the AMD 6450. We will analyze the probable cause later. We were able to get DXVA2 acceleration with the EVR renderer for all codecs except the MPEG-4 variants.

First, we look at a 1080p H.264 clip. The MPC Video Decoder v1.5.2.3134 was able to play back the clip without issues on all the GPUs. There were a couple of surprises in store when the DXVA Checker benchmark (as described in the previous section) was run.

While the GT 430 was unable to reach the magical 60 fps benchmark figure (I expect any GPU worth its salt to be able to decode 1080p60 H.264 clips), the GT 520 sprang a surprise with some insane decoding speeds. Even considering the fact that the GT 520 took shortcuts by skimping on the post processing, it comfortably beats every other GPU in the race. The 430's benchmark result was even more puzzling, considering that all the 1080p60 AVCHD and re-encoded broadcast clips that we threw at it played back flawlessly. We talked to NVIDIA about this, and it looks like the culprit in this case was the bitrate. Our sample was a 40 Mbps clip at 1080p30. To decode it at 60 fps, the VPU engine would have had to process the stream at 80 Mbps, and apparently, the VP4 engine in the GT 430 is simply not capable of that. We are willing to cut NVIDIA some slack here, because I have personally not seen any real 1080p60 content at 80 Mbps. We will cover both of the above aspects in detail in the next section.
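The bitrate argument can be checked with a quick back-of-the-envelope calculation. The sketch below assumes that the benchmark pushes the same bitstream through the decoder faster than real time, so the bits consumed per second scale linearly with the decode frame rate. The function name is ours and purely illustrative.

```python
def effective_bitrate_mbps(clip_bitrate_mbps: float,
                           native_fps: float,
                           decode_fps: float) -> float:
    """Bitrate the decoder must sustain when a clip is decoded at decode_fps.

    Assumption: the bits per frame stay constant, so the load scales
    linearly with the frame rate.
    """
    return clip_bitrate_mbps * decode_fps / native_fps


# The 40 Mbps 1080p30 sample from the text, decoded at 60 fps.
print(effective_bitrate_mbps(40, 30, 60))  # 80.0
```

By the same reasoning, a 1080p60 clip encoded at 40 Mbps would place the same 40 Mbps load on the decoder in real-time playback as our 1080p30 sample does.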

With the exception of the 6450, we find that enabling various post-processing options doesn't bring down the decode frame rate. This shows that the latency of the post processing steps is completely hidden by the time spent in the UVD / VPU engines to obtain the decoded frame. For the 6450, we infer that the lower core clock for the stream processors slows down the post processing steps a bit too much.
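One way to picture the 'hidden latency' argument is as a two-stage pipeline in which decode and post processing overlap, so the slower stage sets the overall frame rate. The sketch below is a simplified model under that assumption, not a description of how any of the drivers actually schedule work. The stage rates in the example are hypothetical and are not measurements from our benchmark.

```python
def pipeline_fps(decode_fps: float, post_process_fps: float) -> float:
    """Steady-state throughput of two overlapped stages.

    Assumption: the stages run concurrently on different frames, so the
    slower stage is the bottleneck.
    """
    return min(decode_fps, post_process_fps)


def post_processing_is_hidden(decode_fps: float, post_process_fps: float) -> bool:
    """True when post processing keeps up with the decoder."""
    return post_process_fps >= decode_fps


# Hypothetical stage rates, chosen only to illustrate the two cases.
print(pipeline_fps(90.0, 150.0))  # 90.0: decode-bound, post processing is hidden
print(pipeline_fps(90.0, 70.0))   # 70.0: post processing is now the bottleneck
```

In this model, enabling more post-processing options lowers only the second rate, which is why the frame rate stays flat until post processing becomes slower than the decoder.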

For the 1080p VC-1 clip, we again use MPC Video Decoder v1.5.2.3134 for flawless playback.

We find that the NVIDIA GPUs hide their post processing latency in the time taken by the VPU engine. However, the 6570 shows a gradual decline in throughput as various options are enabled. The decline is not as alarming as the 6450's, and the 6570 manages to stay comfortably above 60 fps.

VLD acceleration for MPEG-2 was only recently introduced in the UVD 3 engine by AMD. The Microsoft DTV-DVD Video Decoder is able to provide DXVA2 acceleration for MPEG-2 clips.

It is not clear why turning on deinterlacing / cadence detection should affect the throughput of the decode of the progressive clip, but that is what we observe for all the candidates except the 6570. Compared to VC-1 and H.264 decoding which decided the throughput of the video pipeline, MPEG-2 is much easier on the UVD / VPU engine. This is reflected in the fact that the video post processing brings down the throughput quite a bit on all the GPUs.

As expected, deinterlacing definitely kicks in to lessen the throughput of the frames. Unlike the 1080p H.264 decode performance, we find that all the GPUs are now limited by how fast the post-processing can be done. This makes sense, since each field of interlaced content carries only half the lines of a full frame, so the UVD/VPU engine has less to decode per picture. Note that the 'frames per second' figure presented for the interlaced streams is actually 'fields per second' (a 1080i clip showing 29.97 fps with MediaInfo actually has 59.94 fields per second).
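The frames-versus-fields relationship is simple arithmetic: every interlaced frame is made up of two fields. The sketch below assumes that the 29.97 fps reported by MediaInfo is the rounded form of the NTSC rate 30000/1001. The function name is illustrative only.

```python
from fractions import Fraction


def fields_per_second(frame_rate: Fraction) -> Fraction:
    """Each interlaced frame consists of two fields."""
    return 2 * frame_rate


# Assumption: 29.97 fps is the rounded form of 30000/1001.
ntsc_frame_rate = Fraction(30000, 1001)
print(round(float(fields_per_second(ntsc_frame_rate)), 2))  # 59.94
```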

The interlaced MPEG-2 performance is as below:

Results are very similar to what we got for the interlaced H.264 clip. One can conclude that interlaced clips spend more time getting post-processed compared to the progressive clips, but that is hardly surprising.

We had noted earlier that DXVA2 / EVR wasn't enabled for interlaced VC-1 streams on any of the GPUs. However, with the checkactivate.dll hack (described in the LAV Splitter section), we were able to make Arcsoft Video Decoder appear in the list of codecs when the 'Check DirectShow / MediaFoundation Decoders' option was used for interlaced VC-1 clips. Though it wasn't explicitly indicated that the support was DXVA2 using EVR, we did find that playing back the stream using EVR consumed almost no CPU resources and kept the GPU / VPU engine quite busy. Presented below is the interlaced VC-1 performance using the Arcsoft Video Decoder in Total Media Theater v5.0.187.

The takeaway from this section is that cards which run too close to the 60 fps limit with all post processing steps enabled should be avoided, unless there are convincing reasons to choose them. The results also need to be taken in conjunction with the day-to-day usage experience. As mentioned before, the 6450 fails on both counts. The GT 520 fails the day-to-day usage test (deinterlacing performance). The GT 430 gets a recommendation despite weighing in at less than 60 fps for the 1080p H.264 stress stream. The 6570 is the hands-down winner in this section. It is able to carry out all the post processing steps even when it is forced to process very stressful video streams.

70 Comments

A great review. Provides all the answers one could wish for and even gives some further hints. I sure hope you have something like this lined up for Llano.

If I may suggest a couple or three things: Perhaps you should also mention reclock - it will solve most 23.976 and similar problems... It's not like many will detect that the video is running 1/24000th faster. Plus it's insanely easy to use. I understand you couldn't just post full blown images for space problems, but those thumbnails require too much work too. Is it possible to display a popup of sorts when one mouse-overs those thumbnails? Also a vertical line showing 60FPS in those DXVA tests would be great :)

What might have been a nice option is to see what sound levels the cards produced, even if it was only for the GT430 and the HD6570. I know that the decibels can differ between manufacturers but it would have been nice! For the rest, a very nice detailed review of HTPC cards. I was deciding which card to buy so this helped a great deal! I was only looking between the HD6450 and the HD6570 but the GT430 is a better option than the HD6450.

I think I did not use the right word, as I meant the decibel levels the fans of the cards produce and not the audio sent to and through the speakers. All reviewed cards have a fan on them and since most HTPC setups are in the living room it would have been nice to know which of the cards are the most silent.

Though we considered cards with fans in this review, we made it a point to note that the same configuration (GPU model + DRAM bus width + operating frequencies) can be obtained with passive cooling from other vendors.

For example, the 6570 has a passively cooled model from HIS with the same config and Zotac has a passively cooled 430 too. Other vendors have also demonstrated passively cooled models at Computex.

The fact that none of AMD, Intel and Nvidia can lock onto the correct frame rates is unforgivable. It is not as though these frame rates have changed over the last 6 months. It should not be necessary to be an advanced HTPC user and delve into custom creation of frame rates.

I really hope that the representatives of AMD, Intel and NVidia are hanging their heads in shame at such basic errors - sadly I doubt they care.

The mistake is rather with Microsoft. Video playback speed should be adapted to the refresh rate of the graphics card. There is a software called Reclock doing that. Then, for example, 23.976 Hz content can be run with a monitor refresh rate of n times 24 Hz. (The same with audio, because bit-perfect transmission only works with synchronization.) In the end and for most sources, the RAMDAC would need only (multiples of) 24, 25 and 30 Hz. In any system, one of its parts should be the clock master, while the other parts serve.