JPEG Compression – Process and Implementation

Motivation

The human visual system has several limitations – and one that JPEG compression takes advantage of is that the brain–eye system cannot resolve extremely fine detail. This limitation is even more pronounced for color than for grey-scale. Therefore, color information in JPEG is decimated (partially dropped or averaged), and hence color images can typically be compressed to a greater degree than grey-scale images.

JPEG uses the Y’CbCr color space, where Y’ is the luma component (the brightness) and Cb and Cr are the chroma components (the blue-difference and red-difference signals). Next comes chroma subsampling. The scheme “4:4:4” means that no subsampling is applied: every pixel’s Y, Cb, and Cr values are kept. JPEG typically uses the scheme “4:2:0”, which subsamples the chroma planes by a factor of two in both the horizontal and vertical dimensions and significantly reduces the file size/data rate.
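To make the color step concrete, here is a minimal sketch (in Python with NumPy – purely for illustration, not taken from any particular codec) of the RGB → Y’CbCr conversion using the common BT.601/JFIF coefficients, followed by 4:2:0 subsampling implemented as averaging each 2 x 2 block of a chroma plane:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an H x W x 3 uint8 RGB image to Y', Cb, Cr planes (BT.601/JFIF, full range)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =         0.299    * r + 0.587    * g + 0.114    * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5      * b
    cr = 128.0 + 0.5      * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def subsample_420(chroma):
    """4:2:0 chroma subsampling: average every 2 x 2 block of a chroma plane."""
    h, w = chroma.shape
    chroma = chroma[:h - h % 2, :w - w % 2]          # crop to even dimensions
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```

The luma plane keeps its full resolution, while each subsampled chroma plane ends up with a quarter of the original number of samples – which is where much of the extra size reduction for color images comes from.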

Let’s take a look at how JPEG compression actually works.

The 5 steps of JPEG compression

  1. Transform the image from the RGB color space to the YCbCr color space and subsample color using the 4:2:0 chroma subsampling scheme.
  2. Divide each image plane into 8 x 8 blocks and apply the 2D Discrete Cosine Transform (DCT) to each block (a sketch of this step follows the list). For a more detailed overview of the DCT, please refer to here. Basically, the DCT turns the spatial domain of an image into its frequency domain: the spatial domain refers to the intensity of every channel at each pixel, while the frequency domain describes how quickly that intensity changes from one pixel to the next.
  3. The DCT coefficients are divided element-wise by an 8 x 8 quantization matrix and rounded. The quantization matrix values were chosen to maximize the compression ratio while minimizing perceptual loss in JPEG images. Users have the option to change the quality factor, which scales the values of the quantization matrix.
  4. Quantization typically produces many zeros among the coefficients. Run-length coding (RLC) is therefore useful for turning these values into (# zeros-to-skip, next nonzero value) pairs. RLC is even more effective when we use an addressing scheme like the zigzag scan, which turns the 8 x 8 matrix into a 64-vector and makes long runs of zeros likely, since the bottom-right part of the matrix is typically filled with zeros: most image blocks have small high-spatial-frequency components, which are zeroed out by quantization (see the second sketch after this list). Quantization reduces the file size by a significant margin; the next and final step is Huffman coding, which reduces the file size even more.
  5. The DC (constant component) and AC (alternating component) coefficients finally undergo an entropy coding step. Huffman coding is generally used in JPEG as the entropy coding method.
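To make steps 2 and 3 concrete, here is a minimal sketch of the 8 x 8 DCT and quantization of a single block, using SciPy’s DCT routine and the example luminance quantization table from Annex K of the JPEG standard. The quality-factor scaling shown follows the widely used IJG/libjpeg convention; other encoders may scale the table differently.

```python
import numpy as np
from scipy.fftpack import dct

# Example luminance quantization table from Annex K of the JPEG standard.
Q_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
], dtype=np.float64)

def scale_quant_table(base, quality):
    """Scale a base table for a quality factor in 1..100 (IJG/libjpeg convention)."""
    quality = max(1, min(100, quality))
    scale = 5000 / quality if quality < 50 else 200 - 2 * quality
    return np.clip(np.floor((base * scale + 50) / 100), 1, 255)

def dct2(block):
    """2D type-II DCT with orthonormal scaling, applied along rows and columns."""
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def quantize_block(block, qtable):
    """Level-shift an 8 x 8 pixel block, take its DCT, and quantize the coefficients."""
    coeffs = dct2(block.astype(np.float64) - 128.0)   # JPEG level shift
    return np.round(coeffs / qtable).astype(np.int32)

# Example: a smooth gradient block quantizes to mostly zeros.
block = np.tile(np.arange(100, 108), (8, 1))
print(quantize_block(block, scale_quant_table(Q_LUMA, quality=50)))
```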
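And a sketch of step 4 – the zigzag scan followed by a simple run-length coding of the AC coefficients. This is a simplified version of what JPEG actually does (the real format packs run lengths together with coefficient size categories before Huffman coding), but it shows the idea:

```python
def zigzag_order(n=8):
    """Return the zigzag traversal order of an n x n block as (row, col) pairs."""
    order = []
    for s in range(2 * n - 1):                       # walk the anti-diagonals
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diag.reverse()                           # alternate direction on each diagonal
        order.extend(diag)
    return order

def run_length_encode(block):
    """Zigzag-scan a quantized 8 x 8 block and emit (zeros-to-skip, value) pairs for the AC terms."""
    zz = [block[i][j] for i, j in zigzag_order(len(block))]
    pairs, run = [], 0
    for value in zz[1:]:                             # zz[0] is the DC coefficient, coded separately
        if value == 0:
            run += 1
        else:
            pairs.append((run, int(value)))
            run = 0
    pairs.append((0, 0))                             # end-of-block marker
    return zz[0], pairs
```

Feeding the quantized block from the previous sketch into run_length_encode produces just a handful of pairs followed by the end-of-block marker – that shrinkage is exactly what makes the entropy coding step so effective.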

Huffman Coding

Entropy coding identifies frequently occurring symbols in the data stream as good candidates for short code words in the compressed bit stream. One such coding method is Huffman coding – a variable-length coding scheme.

Let’s take a quick look at how Huffman coding works for text:

The string we want to encode is “to be or not to be”. It contains 18 characters, and at 8 bits per character the uncompressed string takes up 144 bits.

  1. Initialization: put all symbols in a list sorted according to their frequency counts
  2. Repeat steps 3 and 4 until the list has only one symbol left
  3. From the list, pick the two symbols with the lowest frequency counts. Form a subtree that has these two symbols as child nodes and create a parent node for them
  4. Assign the sum of the children’s frequency counts to the parent and insert the parent back into the list
  5. Finally, assign a code word to each leaf based on the path from the root

So let’s do this for the string “to be or not to be”:

Steps 1 and 2:

space = 5, o = 4, t = 3, b = 2, e = 2, r = 1, n = 1

Steps 3, 4, and 5:

This will result in the following tree:

[Figure: Huffman tree for “to be or not to be”]

Now let us write the encoded string:

to be or not to be = 101 01 11 001 100 11 01 0001 11 0000 01 101 11 101 01 11 001 100

The Huffman-encoded bit stream contains 47 bits, whereas the original uncompressed bit stream contains 144 bits. That is only 47 / 144 ≈ 32.6% of the original size – a space saving of just over 67%.
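A minimal Huffman coder in Python (using the standard library’s heapq – again just an illustration, not the project’s code) reproduces this result. The exact code words depend on how ties are broken while building the tree, so they may differ from the tree above, but the total length always comes out to 47 bits:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code for `text` and return a {symbol: bit string} table."""
    freq = Counter(text)
    # Heap entries: (frequency, tie-breaker, node); a node is either a symbol or a (left, right) pair.
    heap = [(count, i, symbol) for i, (symbol, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:                              # merge the two lowest-frequency nodes
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1
    codes = {}
    def assign(node, prefix):                         # walk the tree: left = 0, right = 1
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"               # single-symbol edge case
    assign(heap[0][2], "")
    return codes

text = "to be or not to be"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)
print(f"{len(encoded)} bits vs {8 * len(text)} bits uncompressed")   # 47 vs 144
```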

The same Huffman coding process applies to the quantized coefficients of the 8×8 blocks of the image being compressed. To be able to decompress the image, the decoder needs the code table (containing each symbol – a letter or a coefficient value – together with its bit code), so both the code table and the Huffman-encoded bit stream are transmitted.

Implementation

If you’re interested, you can clone a project I did last year from GitHub to see how some parts of JPEG compression work behind the scenes. The main goal of this project was to play around with the quantization matrix and see the effect it has on image compression rate/quality, with the ability to modify individual values of the matrix.

GitHub repo:

https://github.com/JasonTTT/JPEGCompression.git

[Screenshot: bitmaptojpeg – JPEG compression quantization UI]

How it works:

  • Load a bitmap image from the file menu (one of the samples included in the repo) and it will appear in the Original Image column.
  • The quantization matrix is populated with default JPEG values. You can modify these as needed (don’t forget to click Save Values if you do). To compress, click the Compress button between the two image columns.
  • Zoom in/out to see the differences between the original and compressed images. Notice how the compression ratio is higher for color images than for grey-scale ones, as mentioned previously.
  • Click the Constant – 1 button; this populates both the luminance and chrominance matrices with the value 1, which means the quantization step won’t have any effect. Now click Constant – 255; this populates the matrices with the value 255. This gives the greatest compression, but you’ll see the quality take a serious hit.

Feel free to experiment with the values – different values in the matrices will have different effects.

Disclaimer: the application is not perfect – it was a weekend experiment I worked on, so it may be buggy.
