Building a JPEG Compression Engine

This is the first in a new series of DIY posts, each of which will go into the workings of a specific application, library or algorithm and develop something interesting with it. Let’s get started with OpenCV and a JPEG compression engine. Get yourself a cup of coffee and some biscuits; this is going to be a long post!

My intent in building a JPEG encoder/decoder is not so much to understand the framing format as to optimize the signal-processing and computationally intensive functions involved in the compression itself. For this reason, I use OpenCV functions to read and write images, which abstracts the details of the byte stream away from the programmer.

The final goal of this project is to demonstrate a significant speedup of the algorithm when implemented on a sufficiently parallelized hardware architecture, such as a Field Programmable Gate Array (FPGA) or a Graphics Processing Unit (GPU). This code will be ported to C, then implemented in Vivado_HLS (a high level synthesis tool for generating RTL code from high level algorithmic code) and finally demonstrated on an FPGA board.

Installing OpenCV

OpenCV is a computer vision library with bindings for C, C++ and Python. It is extremely powerful and easy to use. However, it cannot be used on memory- or processor-constrained systems such as low-power embedded devices, since it requires a Linux-based OS or Windows to run. On such systems, individual algorithms from the OpenCV library can be ported as application requirements dictate.

For installing OpenCV on a Linux-based operating system, like Ubuntu or Debian, I prefer to follow the instructions provided here. Additional packages and utilities that are extremely useful are included in the Scientific Python (SciPy) stack, which can be obtained from their official website.

JPEG Format

The JPEG format is a lossy method of compressing digital images, wherein the degree of compression can be adjusted, allowing a customizable tradeoff between storage size and image quality. Any image can be represented in terms of 3 components: the red, green and blue channels. Each pixel in the image uses these three 8-bit values to represent its colour information. Consider a full-HD image (1920×1080). It corresponds to 1920 × 1080 pixels × 24 bits/pixel = 49,766,400 bits ≈ 6.22 MB. Reducing this size to below a megabyte requires intelligent compression techniques.
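The arithmetic above can be checked in a couple of lines of Python:

```python
# Size of an uncompressed full-HD RGB image
width, height, bits_per_pixel = 1920, 1080, 24
total_bits = width * height * bits_per_pixel
total_megabytes = total_bits / 8 / 1e6  # 8 bits per byte, 10^6 bytes per MB
print(total_bits, round(total_megabytes, 2))  # 49766400 bits, 6.22 MB
```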

First of all, the three RGB channels are perceived more or less equally by our eyes, so none of them can be compressed more aggressively than the others. This suggests transforming the image into a different colour space representation that allows for greater compression. For this purpose, we transform the RGB image into a YCrCb image, which comprises a luminance component (Y) and two chrominance components (Cr and Cb). Our eyes are highly sensitive to luminance, but not very sensitive to high-frequency variations in chrominance. This allows JPEG to support something called chroma sub-sampling, which essentially smooths or averages over the chrominance components, thus reducing the number of values required to represent them.

To further attenuate the high-frequency components in the image, we resort to quantization. Quantization reduces the precision used to represent certain components of the image, without causing a significant (to human eyes) loss in quality. But how do we identify these components? The answer is: by using a special two-dimensional image transformation called the Discrete Cosine Transform (DCT).

Discussion of the Discrete Cosine Transform (DCT)

The DCT was invented by N. Ahmed, T. Natarajan, and K. R. Rao in 1974. Since then, it has been adopted in several video/audio/image coding standards, including JPEG, MPEG and the H.26x series. A 2D DCT transforms an image into a linear combination of a set of image basis functions. For an 8×8 image, the DCT provides the coefficients for the basis functions (images) shown below.
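For reference, the forward transform used here is the two-dimensional type-II DCT. For an 8×8 block of (level-shifted) samples g(x, y), the coefficient G(u, v) is

```latex
G(u,v) = \frac{1}{4}\,\alpha(u)\,\alpha(v)
         \sum_{x=0}^{7}\sum_{y=0}^{7} g(x,y)\,
         \cos\!\left[\frac{(2x+1)\,u\,\pi}{16}\right]
         \cos\!\left[\frac{(2y+1)\,v\,\pi}{16}\right],
\qquad
\alpha(k) = \begin{cases} 1/\sqrt{2} & k = 0 \\ 1 & \text{otherwise} \end{cases}
```

This is the same formula that the Python code later in this post implements, with the α terms computed by the al helper function.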

[Src: Wikipedia] The DCT transforms an 8×8 block of input values into a linear combination of these 64 patterns. The patterns are the 2D DCT basis functions, and the output values are the transform coefficients. Since the higher-order variations in this transformed domain (the lower-right patterns) are less discernible to our eyes, they can be discarded to save storage space. This is done using a special quantization matrix, which attenuates the higher-frequency terms more than the lower-frequency ones. As a result, most of the lower-right entries go to zero after quantization, and the significant values become concentrated toward the upper left of the matrix.

Let’s write some code!

OK. Enough theory. If anything new comes up, I’ll explain it as I go along. Let’s dive into the Python code! Don’t worry if you are new to computer vision using the OpenCV library, it is easy to pick up and very useful in a wide variety of applications.

The code snippet above imports the OpenCV bindings for Python (cv2) and the PyLab module, which includes various tools necessary for scientific computations (these are installed with the SciPy stack mentioned above). The show function takes an input image, shows it onscreen until any key is pressed, and then closes all windows. It is useful for displaying the state of the image at any point during processing.
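The padding snippet that the next paragraph describes might be sketched as follows (imgPad is the name used later in the post; the repeat-last-row/column strategy is taken from the description):

```python
import numpy as np

def imgPad(img):
    # Pad rows and columns by repeating the last row/column until both
    # dimensions are multiples of 8 (the DCT operates on 8x8 blocks)
    rows, cols = img.shape[:2]
    pad_rows = (8 - rows % 8) % 8
    pad_cols = (8 - cols % 8) % 8
    if pad_rows:
        img = np.concatenate([img, np.repeat(img[-1:, ...], pad_rows, axis=0)], axis=0)
    if pad_cols:
        img = np.concatenate([img, np.repeat(img[:, -1:, ...], pad_cols, axis=1)], axis=1)
    return img
```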

The code snippet above takes an input image and pads it appropriately: since the DCT operates only on 8×8 blocks, the 2-D image needs to be made 8M × 8N, where M and N are positive integers. This means that both the rows and columns need to be padded out to the nearest multiple of 8. This is done quite naively, by repeating the last row/column until the condition is satisfied.

import numpy as np
from math import cos, pi, sqrt

#Function to downsample the chrominance channel
def downsample(C):
    #average each 2x2 block of the 16x16 chrominance input into a single
    #value, producing an 8x8 output
    C = C.astype('uint16')
    Cout = np.zeros((8, 8), dtype='int16')
    for i in range(8):
        for j in range(8):
            Cout[i][j] = (C[2*i][2*j] + C[2*i+1][2*j]
                          + C[2*i][2*j+1] + C[2*i+1][2*j+1]) // 4
    return Cout

#Normalisation factor for the DCT basis functions
def al(x):
    if x == 0:
        return 1.0 / sqrt(2.0)
    else:
        return 1.0

#Look-up table for the cosine terms used by the DCT
cosMat = np.zeros((8, 8))

def evalcos():
    for i in range(8):
        for j in range(8):
            cosMat[i][j] = cos(i * pi * (2*j + 1) / 16.0)

#Compute a single coefficient of the forward (inv=0) or inverse (inv=1) DCT
def singleGUV(g, u, v, inv=0):
    G = 0
    for x in range(8):
        for y in range(8):
            if inv == 0:
                G += 0.25 * al(u) * al(v) * g[x][y] * cosMat[u][x] * cosMat[v][y]
            else:
                G += 0.25 * al(x) * al(y) * g[x][y] * cosMat[x][u] * cosMat[y][v]
    return G

def shift_128(subimg):
    #the input array is a numpy array of type "uint8"; it must be converted
    #to int16, or else the shift to centre the values around 0 will fail
    subimg = subimg.astype('int16')
    subimg = subimg - 128
    return subimg

#Function to perform a 2D DCT on the 8x8 images
def DCT_8x8_2D(subimg, shift=1, inv=0):
    if shift == 1:
        subimg = shift_128(subimg)
    #the 2D DCT is obtained using the formula from en.wikipedia.org/wiki/JPEG
    G = np.zeros((8, 8), dtype='float')
    for u in range(8):
        for v in range(8):
            G[u][v] = singleGUV(subimg, u, v, inv)
    return G

Phew! That seems like a lot of code! But wait: it is broken up into simple, logical, easy-to-understand chunks. The downsample function performs the averaging operation on the chrominance channel, as described earlier. The next five functions are used to perform the DCT and the inverse DCT (IDCT). The evalcos function is run only once; it evaluates the necessary cosine terms and stores them in a look-up table, allowing quicker access by subsequent operations. The singleGUV function calculates the forward/inverse DCT for a single point of the 8×8 matrix, and the DCT_8x8_2D function runs it over all the points in the matrix. In the case of the forward DCT, a shifting operation is first performed to reduce the dynamic range requirements of the DCT. The formulae for the forward and inverse DCTs can be found on Wikipedia, which is an excellent resource for information about JPEG encoding and decoding.
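The quantization snippet that the next paragraph describes is not reproduced in this post. A sketch consistent with that description, using the standard JPEG luminance table and the common quality-factor scaling rule (the names quant_8x8 and Q50 are my own), might look like this:

```python
import numpy as np

# Standard JPEG luminance quantization table (defined at quality 50)
Q50 = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99]])

def quant_8x8(G, Q=10):
    # Scale the base table according to the quality factor Q (1..100)
    if Q < 50:
        S = 5000.0 / Q
    else:
        S = 200.0 - 2 * Q
    Qmat = np.floor((S * Q50 + 50) / 100)
    Qmat[Qmat < 1] = 1
    # Quantize (round each coefficient to the nearest table multiple), then
    # immediately dequantize: the last encoder stage combined with the first
    # decoder stage
    return np.round(G / Qmat) * Qmat
```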

The above code performs the quantization of the DCT matrix produced by the function DCT_8x8_2D. The quality factor, a number between 1 and 100 (1 giving extremely poor quality and the smallest file size, 100 giving very high quality and the largest file size), can be adjusted to obtain different quantization matrices. The function returns the element-wise product of the quantized coefficients with the quantization values, i.e. the coefficients after quantization and immediate dequantization. This is the last stage of the encoding procedure combined with the first stage of the decoding procedure.

im = cv2.imread("D:/jpg.jpg")
im = cv2.resize(im, (0, 0), fx=0.5, fy=0.5)
rows, cols, ch = im.shape
#Convert the colour space from BGR (OpenCV's default channel order) to YCrCb
YCrCb = cv2.cvtColor(im, cv2.COLOR_BGR2YCR_CB)
Y = YCrCb[..., 0]; Cr = YCrCb[..., 1]; Cb = YCrCb[..., 2];
#Pad the image so that it consists of whole 8x8 blocks
YCrCb = imgPad(YCrCb)
evalcos() #evaluate the cosine matrix used for the DCT operation

The code above simply opens an image, resizes it by scaling both dimensions by 0.5 (so as to speed up computation), converts the image from OpenCV's BGR channel ordering to the YCrCb colour space (using a built-in OpenCV function) and pads the image using the imgPad function detailed above. Finally, the cosine matrix required for the DCT operation is evaluated once, for future use.

The above code seems extremely verbose, but all it does is run over every 8×8 block in the YCrCb image, perform the DCT and quantization steps (with a quality factor of 10), then the inverse DCT, and store the results back into the YCrCb matrix. I have left out the downsampling stage, but it shouldn’t make a significant difference to the image observed at the end. The results are shown below.

Original image [Source: Wikipedia]

Output image [Q = 10]

It can be seen that there are noticeable artifacts in the output image, and that the bottom and right edges are slightly “stretched”. The stretching is due to the image padding performed earlier, and there may be better ways of handling it. Artifacts are to be expected, but some of them could be removed by performing the steps in a slightly different manner (perhaps by including chrominance sub-sampling) or by low-pass filtering (averaging) the output image.

This post serves to illustrate the use of OpenCV for converting an image into a matrix representation that can be manipulated in Python, to allow us to apply the JPEG compression algorithms to reduce the image file size. Although I have not emphasized the details of the encoding techniques here, an extremely detailed walk-through can be found here. The JPEGsnoop tool by the same author is quite useful for extracting details about the JPEG framing for any .jpg image file.

Just to end on an interesting note: the bulk of this whole program can be compressed into a single line! The code below is equivalent to the whole program we wrote!