(This looks more like JPEG than MPEG-2: video formats are mostly about compressing the differences between frames, rather than just compressing individual images.)
If you work in RGB rather than YUV, you're probably not going to get the same compression ratio and/or quality, but you can do that if you want. Colour-space conversion is hardly a heavy workload compared to the rest of the algorithm.
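To give a feel for how little work the conversion is, here is a rough sketch of a per-pixel RGB to YCbCr conversion. I'm assuming the full-range coefficients that JPEG/JFIF uses; your format may specify a different matrix or range, and the function name is just made up for illustration.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit YCbCr.

    Assumes the full-range JFIF coefficients; other standards use
    different matrices or a limited (16-235) range.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    # Round to the nearest integer and clamp into the 0..255 range.
    def clamp(v):
        return max(0, min(255, round(v)))

    return clamp(y), clamp(cb), clamp(cr)
```

That's three multiply-adds per channel per pixel, which is small next to the transform and entropy coding stages.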
Typically in this sort of application you RLE the zeros, because that's the value you get a lot of repetitions of (and hopefully also a good number at the end of each block, which can be replaced with a single marker value). Other coefficients are not so repetitive, but if you expect runs of other values, I guess YMMV.
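A minimal sketch of that zero-run scheme, assuming the block has already been quantised and flattened into a list (e.g. in zigzag order). The `EOB` marker and both function names are hypothetical, not taken from any particular standard.

```python
EOB = "EOB"  # hypothetical end-of-block marker


def rle_zeros(coeffs):
    """Encode coefficients as (zero_run, value) pairs plus an optional EOB."""
    # Find where the trailing zeros start; they collapse into one EOB marker.
    end = len(coeffs)
    while end > 0 and coeffs[end - 1] == 0:
        end -= 1

    symbols = []
    run = 0
    for c in coeffs[:end]:
        if c == 0:
            run += 1  # count zeros preceding the next non-zero value
        else:
            symbols.append((run, c))
            run = 0
    if end < len(coeffs):
        symbols.append(EOB)
    return symbols


def unrle_zeros(symbols, block_size):
    """Inverse of rle_zeros for a block of known size."""
    out = []
    for s in symbols:
        if s == EOB:
            break
        run, value = s
        out.extend([0] * run)
        out.append(value)
    # Anything not written explicitly is a trailing zero.
    out.extend([0] * (block_size - len(out)))
    return out
```

For example, `[12, 0, 0, -3, 5, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]` becomes `[(0, 12), (2, -3), (0, 5), (3, 1), "EOB"]`. Real formats usually also cap the run length so the pair fits a fixed number of bits; this sketch doesn't bother.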
And yes, you can encode the RLE pairs as single symbols in the Huffman encoding.
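Something like the following would do it: treat each (run, value) pair, and the end-of-block marker, as one entry in the Huffman alphabet. This is only a sketch of a textbook Huffman table builder; `huffman_code` is a made-up name, and real codecs normally use predefined or length-limited tables rather than building one per image like this.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a {symbol: bitstring} table; symbols can be any hashable value."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}

    # The counter breaks frequency ties so the dicts are never compared.
    tiebreak = count()
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Merge the two rarest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + bits for s, bits in c1.items()}
        merged.update({s: "1" + bits for s, bits in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))

    return heap[0][2]


# Each RLE pair is one symbol, exactly like the "EOB" marker.
stream = [(0, 1)] * 5 + [(1, 1)] * 2 + ["EOB"] * 3
table = huffman_code(stream)
bits = "".join(table[s] for s in stream)
```

The common pairs end up with the short codes, which is the whole point of combining the two steps.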