Fast JPEG codec for NVIDIA GPUs

We have created a fast JPEG codec based on NVIDIA CUDA technology. The CUDA JPEG codec developed by Fastvideo combines strict standards compliance with encoding and decoding speeds well above those of the fastest existing commercial solutions. It is a full, performance-oriented implementation of Baseline JPEG. Compression and decompression are very fast on the GPU because every stage of the Baseline JPEG algorithm is implemented in parallel. Our CUDA JPEG codec is faster than the best commercial multithreaded JPEG codecs for multicore CPUs.

Why can JPEG on CUDA be so fast?

We have succeeded in parallelizing all stages of the JPEG algorithm, including entropy encoding and decoding. It was widely believed that the RLE and Huffman algorithms could only run serially. In our solution, RLE and Huffman coding are fully parallel and are no longer bottlenecks. We do not off-load anything from the GPU to the CPU to make the JPEG codec faster: the CUDA JPEG codec is extremely fast and runs entirely on the GPU.
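To illustrate why entropy coding does not have to be serial, here is a minimal Python sketch of one commonly described approach: encode each block independently, then use an exclusive prefix sum over the code lengths to find where each block's bits belong in the output stream. This is an assumption for illustration only and not necessarily how the Fastvideo codec works. The function names and the bit-string representation are hypothetical, and the sketch ignores JPEG details such as DC differential coding and byte stuffing.

```python
from itertools import accumulate


def block_bit_offsets(bit_lengths):
    """Exclusive prefix sum: the starting bit offset of each block's code."""
    # accumulate(..., initial=0) yields 0, l0, l0+l1, ...; drop the final total.
    return list(accumulate(bit_lengths, initial=0))[:-1]


def pack_blocks(codes):
    """Pack per-block bit strings (e.g. '1011') into one output bit string.

    Hypothetical helper: each entry of `codes` is assumed to have been
    produced independently for one 8x8 block.
    """
    lengths = [len(code) for code in codes]
    offsets = block_bit_offsets(lengths)
    out = ["0"] * sum(lengths)
    # Every write targets a disjoint range, so the iterations do not depend
    # on each other; on a GPU each one could be handled by its own thread.
    for code, offset in zip(codes, offsets):
        out[offset:offset + len(code)] = code
    return "".join(out), offsets


stream, offsets = pack_blocks(["101", "0", "1111"])
print(offsets)  # [0, 3, 4]
print(stream)   # 10101111
```

The prefix sum is itself a well-known parallel primitive, which is what makes this layout step suitable for a GPU.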

There are many scientific papers about JPEG compression on CUDA in which the authors try to speed up the baseline DCT module. The idea of parallel computation on CUDA leads to that task immediately, but it is only a small part of a complete solution for CUDA acceleration of the JPEG algorithm. Parallel computing can be applied to all stages of both the JPEG encoder and the JPEG decoder. Partitioning the image into a large number of 8×8 blocks is the key to speeding up a JPEG codec on an NVIDIA GPU. The most difficult part of the JPEG algorithm is the entropy codec, and we have accomplished that task as well. Our fast JPEG solution runs on the GPU, and we have accelerated every constituent part of JPEG. This is the main idea behind speeding up image processing on CUDA: we have to create a CUDA-based version of each algorithm in the pipeline. All our software was implemented according to that approach.
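The role of 8×8 partitioning can be sketched in a few lines of Python. The example below splits a single-channel image into 8×8 blocks and applies the standard 2D DCT-II to each one. No block depends on any other, which is what makes the stage easy to spread across GPU threads. This is a plain reference sketch, not the Fastvideo implementation: the function names are hypothetical, and the image dimensions are assumed to be multiples of 8.

```python
import math

N = 8  # JPEG block size


def split_into_blocks(image):
    """Map (block_row, block_col) -> 8x8 block. Assumes dimensions divisible by 8."""
    height, width = len(image), len(image[0])
    blocks = {}
    for by in range(0, height, N):
        for bx in range(0, width, N):
            blocks[(by // N, bx // N)] = [row[bx:bx + N] for row in image[by:by + N]]
    return blocks


def dct_8x8(block):
    """Direct (unoptimized) 2D DCT-II of one block after the JPEG level shift by 128."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += ((block[x][y] - 128)
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


# A flat 16x16 image with value 136 gives four identical blocks.
image = [[136] * 16 for _ in range(16)]
blocks = split_into_blocks(image)
# Each block is transformed independently of all the others.
coeffs = {pos: dct_8x8(block) for pos, block in blocks.items()}
print(len(coeffs))                    # 4
print(round(coeffs[(0, 0)][0][0], 6)) # 64.0 (DC term; AC terms are ~0)
```

On a GPU, the dictionary comprehension would correspond to launching one group of threads per block rather than looping over blocks.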

We now need just 0.51 ms for Baseline JPEG encoding of a 24-bit color image at 4K resolution (3840 × 2160) with JPEG quality 90% and 4:2:0 subsampling, which corresponds to a compression ratio of about 10:1. We chose these encoding parameters because they correspond to so-called "visually lossless" compression.
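To put the 0.51 ms figure in perspective, the short Python calculation below derives the raw image size, the approximate compressed size and the implied encoder throughput from the numbers quoted above. The only inputs are the resolution, bit depth, encoding time and ~10:1 ratio from the text. The variable names are illustrative, and the derived rates assume the time covers the GPU encoding step only, without host-device transfers.

```python
width, height = 3840, 2160
bytes_per_pixel = 3          # 24-bit color
encode_time_s = 0.51e-3      # 0.51 ms, as quoted in the text
compression_ratio = 10       # approximate ratio quoted in the text

raw_bytes = width * height * bytes_per_pixel
compressed_bytes = raw_bytes / compression_ratio
throughput_gb_s = raw_bytes / encode_time_s / 1e9
frames_per_s = 1 / encode_time_s

print(raw_bytes)                        # 24883200 bytes of uncompressed input
print(round(compressed_bytes / 1e6, 2)) # about 2.49 MB after ~10:1 compression
print(round(throughput_gb_s, 1))        # about 48.8 GB/s of uncompressed input
print(round(frames_per_s))              # about 1961 frames per second
```

These derived values are only as precise as the rounded inputs, so they should be read as order-of-magnitude estimates.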

These are the latest performance benchmarks for encoding 24-bit 2K and 4K images (JPEG compression on the GPU, without device I/O latency, single-image mode, no batching, no streaming) on NVIDIA GeForce GTX 1080 Ti and Quadro P6000:

The above results are much faster than benchmarks of libjpeg-turbo and turbojpeg on the CPU. Even when host-to-device and device-to-host transfers are taken into account, the CUDA JPEG codec still performs much better than libjpeg-turbo. More CUDA performance measurements are available for download here.

Licensing for Fast JPEG Codec

We license Fast JPEG Codec and other components of the Fastvideo Image & Video Processing SDK to software developers, camera manufacturers and resellers, internet providers, system integrators, etc. Our SDK is used in a wide range of imaging applications. A demo SDK, documentation, licensing information and quotations are available upon request. We also offer custom software design to an agreed specification. If you need a significant GPU speedup for your image processing application, don't hesitate to contact us.