DCT-based Video Encoding Process

https://stackoverflow.com/questions/14398229

16-01-2022

Question

I am having some issues that I am hoping you will be able to clarify. I have taught myself a video encoding process similar to MPEG-2. The process is as follows:

1. Split an RGBA image into 4 separate channel memory blocks, i.e. an array of all R values, a separate array of G values, etc.
2. Take an array and grab an 8x8 block of pixel data to transform with the Discrete Cosine Transform (DCT).
3. Quantize this 8x8 block using a pre-calculated quantization matrix.
4. Zigzag encode the output of the quantization step, so I should get trails of consecutive repeated numbers.
5. Run-length encode (RLE) the output of the zigzag algorithm.
6. Huffman code the data after the RLE stage, substituting values from a pre-computed Huffman table.
7. Go back to step 2 and repeat until all of the channel's data has been encoded.
8. Go back to step 2 and repeat for each channel.
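To make the DCT, quantization and zigzag steps concrete for a single 8x8 block, here is a minimal Python sketch (standard library only, written for clarity rather than speed). The function names are hypothetical, the quantization matrix is the example luminance table from the JPEG standard rather than anything the question specifies, and the subtract-128 level shift is a JPEG convention assumed here for 8-bit samples.

```python
import math

N = 8  # block size

# Example luminance quantization table from the JPEG standard (Annex K);
# any "pre-calculated quantization matrix" would slot in here instead.
JPEG_LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def dct_2d(block):
    """Direct (slow, O(N^4)) orthonormal 2-D DCT-II of an NxN block."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize(coeffs, q):
    """Divide each coefficient by its step size and round to the nearest integer."""
    return [[round(coeffs[u][v] / q[u][v]) for v in range(N)] for u in range(N)]


def zigzag_order(n=N):
    """(row, col) pairs in zigzag scan order, one anti-diagonal at a time."""
    order = []
    for s in range(2 * n - 1):  # s = row + col
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # even diagonals run bottom-left to top-right, odd ones the reverse
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def encode_block(block, q=JPEG_LUMA_Q):
    """DCT, quantize and zigzag-scan one 8x8 block of 8-bit samples."""
    shifted = [[p - 128 for p in row] for row in block]  # JPEG-style level shift
    quant = quantize(dct_2d(shifted), q)
    return [quant[r][c] for r, c in zigzag_order()]
```

On a flat block (every sample 200) the only non-zero output is the DC coefficient, 36; the other 63 values quantize to zero, which is exactly the long trail of zeros the RLE stage is meant to exploit. A real encoder would use a fast separable DCT instead of this quadruple loop.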

First question: do I need to convert the RGBA values to YUV+A (YCbCr+A) values for the process to work, or can it continue using RGBA? I ask because the RGBA->YUVA conversion is a heavy workload that I would like to avoid if possible.

Next question: should the RLE store runs of just 0s, or can it be extended to all the values in the array? See the examples below:

 440000000111 == [2,4][7,0][3,1]   // RLE for all values
 or
 440000000111 == 44[7,0]111        // RLE for 0's only
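Both variants from the examples can be sketched in a few lines of Python with itertools.groupby (the function names are hypothetical). Note that in a real bitstream the decoder also needs some way to tell a literal value from a run in the zeros-only form; this sketch simply uses Python lists for runs and plain ints for literals.

```python
from itertools import groupby


def rle_all(values):
    """Run-length encode every value as [count, value] pairs."""
    return [[len(list(g)), v] for v, g in groupby(values)]


def rle_zeros_only(values):
    """Copy non-zero values through; replace each run of zeros with [count, 0]."""
    out = []
    for v, g in groupby(values):
        run = len(list(g))
        if v == 0:
            out.append([run, 0])
        else:
            out.extend([v] * run)
    return out


digits = [int(ch) for ch in "440000000111"]
print(rle_all(digits))         # [[2, 4], [7, 0], [3, 1]]
print(rle_zeros_only(digits))  # [4, 4, [7, 0], 1, 1, 1]
```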

Final question: what would a single symbol be at the Huffman stage? Would a symbol to be replaced be a value like 2 or 4, or would it be the run-level pair [2,4], for example?

Thanks for taking the time to read this and help me out. I have read many papers and watched many YouTube videos, which have aided my understanding of the individual algorithms, but not of how they all link together to form the encoding process in code.

Solution

(This seems more like JPEG than MPEG-2: video formats are more about compressing the differences between frames than about compressing individual images.)

If you work in RGB rather than YUV, you're probably not going to get the same compression ratio and/or quality, but you can do that if you want. Colour-space conversion is hardly a heavy workload compared to the rest of the algorithm.
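For a sense of what the colour-space conversion costs, here is a per-pixel sketch using the full-range BT.601 coefficients that JFIF (JPEG) uses; other standards use slightly different coefficients and value ranges, so treat the constants as one common choice. It is a fixed 3x3 multiply plus offsets per pixel, and alpha just passes through. The function names are hypothetical.

```python
def _clamp_u8(v):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, round(v)))


def rgba_to_ycbcra(r, g, b, a):
    """Convert one 8-bit RGBA pixel to YCbCr+A (full-range BT.601, as in JFIF).

    Alpha is not a colour channel, so it is passed through unchanged.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return _clamp_u8(y), _clamp_u8(cb), _clamp_u8(cr), a


print(rgba_to_ycbcra(255, 255, 255, 255))  # (255, 128, 128, 255)
```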

Typically in this sort of application you run-length encode only the zeros, because zero is the value that repeats most often (and, hopefully, a good number of the zeros sit at the end of each block, where they can be replaced with a single end-of-block marker). Other coefficients are not so repetitive, but if you expect repetitions of other values, I guess YMMV.
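Here is a sketch of the zeros-only scheme as (zero_run, level) pairs with an end-of-block marker, applied to a zigzag-ordered block. It is an illustration, not any codec's exact format: JPEG, for instance, codes the DC coefficient separately and limits run lengths, and the EOB string here is a hypothetical stand-in for whatever reserved code a real bitstream would use.

```python
EOB = "EOB"  # hypothetical stand-in for a reserved end-of-block code


def run_level_pairs(zz):
    """Encode zigzag-ordered quantized coefficients as (zero_run, level) pairs.

    Each non-zero level is paired with the number of zeros that precede it;
    the trailing zeros after the last non-zero level collapse into one EOB.
    """
    last = max((i for i, v in enumerate(zz) if v != 0), default=-1)
    pairs, run = [], 0
    for v in zz[:last + 1]:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if last < len(zz) - 1:  # there were trailing zeros
        pairs.append(EOB)
    return pairs


zz = [36, -3, 0, 0, 2, 0, 1] + [0] * 57  # a 64-entry zigzag output
print(run_level_pairs(zz))  # [(0, 36), (0, -3), (2, 2), (1, 1), 'EOB']
```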

And yes, you can encode the RLE pairs as single symbols in the Huffman encoding.
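As a sketch of treating each run-level pair (plus the EOB marker) as one Huffman symbol, the following builds a code from symbol frequencies with heapq. It is a textbook Huffman construction under the assumption that you build the table from your own statistics; standard codecs differ in detail (MPEG-2 uses fixed variable-length code tables with an escape code, and JPEG Huffman-codes a run/size byte and appends the level's bits separately). The function name is hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Map each distinct symbol to a prefix-free bit string via Huffman's algorithm."""
    freq = Counter(symbols)
    if len(freq) <= 1:  # zero or one distinct symbol: no tree to build
        return {s: "0" for s in freq}
    tiebreak = count()  # unique ints keep heap tuples comparable on equal weights
    heap = [(w, next(tiebreak), {s: ""}) for s, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # merge the two lightest subtrees, prefixing 0 / 1 to their codes
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


# Run-level pairs plus an end-of-block marker, each treated as ONE symbol.
symbols = [(0, 1), (0, 1), (0, 1), (0, -1), (1, 1), "EOB", "EOB"]
table = huffman_code(symbols)
bits = "".join(table[s] for s in symbols)
print(len(bits))  # 13
```

For these seven symbols the most frequent pair, (0, 1), gets a 1-bit code, EOB gets 2 bits, and the two rare pairs get 3 bits each.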

OTHER TIPS

1) Yes, you'll want to convert to YUV. To achieve higher compression ratios, you need to take advantage of the human eye's ability to "overlook" significant loss in color. Typically, you'll keep the Y plane at full resolution (presumably the A plane as well), but downsample the U and V planes by 2x2. E.g. if you're doing 640x480, the Y plane is 640x480 and the U and V planes are 320x240. You might also choose different quantization for the U/V planes. The cost of this conversion is small compared to the DCT or DFT.
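The 2x2 chroma downsampling mentioned in the tip can be as simple as averaging each 2x2 neighborhood. Averaging is one common choice, assumed here; an encoder could also drop samples or use a better filter. The sketch assumes even plane dimensions and integer samples, and the function name is hypothetical.

```python
def downsample_2x2(plane):
    """Halve a chroma plane in both directions by averaging 2x2 neighborhoods.

    `plane` is a list of rows of integer samples with even width and height.
    """
    h, w = len(plane), len(plane[0])
    return [
        [
            # +2 rounds the integer average to nearest (halves round up)
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


u_plane = [[128] * 640 for _ in range(480)]  # e.g. the U plane of a 640x480 frame
small = downsample_2x2(u_plane)
print(len(small[0]), len(small))  # 320 240
```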

2) You don't have to RLE it; you could just Huffman code it directly.

Licensed under: CC-BY-SA with attribution
