The high coding efficiency of the H.264/AVC standard comes at the cost of a computationally demanding decoding
process, which has limited the availability of cost-effective, high-performance solutions. Modern computers are typically equipped with
powerful yet cost-effective Graphics Processing Units (GPUs) to accelerate graphics operations. These GPUs can be
addressed by means of a 3-D graphics API such as Microsoft Direct3D or OpenGL, using programmable shaders as
generic processing units for vector data. NVIDIA's new CUDA (Compute Unified Device Architecture) platform
provides a straightforward way to address the GPU directly, without an intermediate 3-D graphics API. In
CUDA, a compiler generates executable code from C code annotated with specific modifiers that determine the execution model.
This paper first presents an H.264/AVC renderer, developed in-house, that executes motion compensation
(MC), reconstruction, and color space conversion (CSC) entirely on the GPU. To steer the GPU, Direct3D combined
with programmable pixel and vertex shaders is used. Next, we present a GPU-enabled decoder utilizing the new
CUDA architecture from NVIDIA. This decoder performs MC, reconstruction, and CSC on the GPU as well. We
compare both GPU-enabled decoders with a CPU-only decoder in terms of speed, complexity, and CPU
requirements. Our measurements show that a significant speedup is possible relative to a CPU-only solution. As an
example, real-time playback of high-definition video (1080p) was achieved with our Direct3D and CUDA-based
H.264/AVC renderers.
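
To make the per-pixel work of the reconstruction and CSC stages concrete, the following is a minimal CPU-side sketch in Python, not the shader or CUDA code of the renderers described above. It assumes 8-bit samples and, for CSC, video-range BT.709 coefficients, a common choice for high-definition content that is not stated in the text. The function names (clip8, reconstruct, ycbcr_to_rgb) are hypothetical. On a GPU, each pixel would typically be handled by its own shader invocation or CUDA thread, since no pixel depends on its neighbours in these two stages.

```python
# Illustrative sketch only: per-pixel reconstruction and colour space
# conversion, written for the CPU. All names here are hypothetical.

KR = 0.2126  # BT.709 luma weight for red (assumed, not from the text)
KB = 0.0722  # BT.709 luma weight for blue (assumed, not from the text)
KG = 1.0 - KR - KB


def clip8(value):
    """Round to the nearest integer and clamp to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(value))))


def reconstruct(prediction, residual):
    """Add the decoded residual to the motion-compensated prediction."""
    return clip8(prediction + residual)


def ycbcr_to_rgb(y, cb, cr):
    """Convert one video-range YCbCr sample (Y in 16..235) to 8-bit RGB."""
    luma = (y - 16) * 255.0 / 219.0   # expand luma to 0..255
    pb = (cb - 128) / 224.0           # normalise chroma to -0.5..0.5
    pr = (cr - 128) / 224.0
    r = luma + 255.0 * 2.0 * (1.0 - KR) * pr
    b = luma + 255.0 * 2.0 * (1.0 - KB) * pb
    # Green follows from the luma definition Y = KR*R + KG*G + KB*B.
    g = (luma - KR * r - KB * b) / KG
    return clip8(r), clip8(g), clip8(b)


def reconstruct_row(predictions, residuals):
    """Apply reconstruction independently to every pixel in a row."""
    return [reconstruct(p, d) for p, d in zip(predictions, residuals)]
```

Because every output pixel is computed from its own inputs only, both stages map naturally onto data-parallel hardware. Motion compensation additionally needs reference-frame fetches and sub-pixel interpolation, which this sketch leaves out.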