Question

So I'm writing a color detection application using an AR Drone. The drone sends my python/opencv socket server an image from its camera in YUV420p format.

What I do to access the image as an OpenCV IplImage is the following (and yes, this is inefficient, but I don't want to have to write new conversion tools myself):

  1. Save the yuv image to a file (some_image.yuv)
  2. subprocess.call(insert ffmpeg call here)
  3. Read the resultant file (bmp, png, it doesn't matter) back in using cv.LoadImage

My problem right now is a very noticeable color shift. I'm waving a red felt sheet in these pictures. The first one shows a heavy yellow tint. The second isn't as bad, but that result is very rare; most of the time the red sheet comes out heavily tinted.

I'm wondering two things:

  1. Whether there's a better way to do this
  2. Whether the color tinting issue can be resolved

My ffmpeg conversion line looks something like

ffmpeg -s 640x480 -vcodec rawvideo -f rawvideo -pix_fmt yuv420p -i image.yuv -vcodec bmp -f image2 output.bmp

I've also tried:

ffmpeg -f rawvideo -s 640x480 -pix_fmt yuv420p -vf colormatrix=bt709:bt601 -i image.yuv -f image -vcodec png output.png

The color shift is always there, unfortunately!
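For reference, here is a minimal sketch of how the save-and-convert steps could be driven from Python, using a trimmed version of the first command above. The helper names (build_ffmpeg_cmd, convert_frame) and the file names are made up for illustration, and it assumes ffmpeg is on the PATH. The -y flag (overwrite the output file without asking) is an addition so that repeated calls don't stop at a prompt.

```python
import subprocess


def build_ffmpeg_cmd(src, dst, width=640, height=480):
    """Build the ffmpeg argument list for one raw YUV420p frame."""
    return [
        "ffmpeg", "-y",                    # overwrite dst if it already exists
        "-f", "rawvideo",                  # input has no container or header
        "-pix_fmt", "yuv420p",             # planar 4:2:0 input
        "-s", "%dx%d" % (width, height),   # frame size must be given explicitly
        "-i", src,
        dst,                               # output format follows the extension
    ]


def convert_frame(raw, src="some_image.yuv", dst="output.bmp"):
    """Write one raw frame to disk and convert it with ffmpeg."""
    with open(src, "wb") as f:
        f.write(raw)
    # check_call raises CalledProcessError if ffmpeg exits with an error
    subprocess.check_call(build_ffmpeg_cmd(src, dst))
    return dst
```

Passing the arguments as a list avoids shell quoting problems, and check_call makes a failed conversion visible instead of silently leaving a stale output file behind.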

The color shift is my big problem right now as I later convert the image to HSV and use thresholding to choose a color range that works for me.
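To illustrate why the tint matters for that step, here is a small sketch of a hue-based red test using only the standard library's colorsys module. The function is_red and its threshold values are hypothetical, picked purely for illustration, and are not the ones from my application. Note that colorsys reports hue in the range 0 to 1, while OpenCV's 8-bit HSV images use 0 to 179, so the numbers would need rescaling there.

```python
import colorsys


def is_red(r, g, b, hue_tol=0.05, min_sat=0.5, min_val=0.2):
    """Return True if an 8-bit RGB pixel falls inside a simple 'red' HSV range."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Red sits at both ends of the hue circle, so the test has to wrap around
    near_red = h <= hue_tol or h >= 1.0 - hue_tol
    return near_red and s >= min_sat and v >= min_val


print(is_red(200, 30, 30))    # a saturated red
print(is_red(200, 200, 30))   # the same pixel pushed toward yellow
```

A yellow tint moves the hue away from the red end of the circle, so a pixel that should pass the threshold no longer does.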


Solution

This approach seems to work for me:

$ ffmpeg -s 352x288 -i foreman_cif_frame_0.yuv f.png
ffmpeg version N-46810-g7750c48 Copyright (c) 2000-2012 the FFmpeg developers
  built on Apr 21 2013 11:12:24 with gcc 4.6 (Ubuntu/Linaro 4.6.3-1ubuntu5)
  configuration: --enable-gpl --enable-libx264 --enable-libmp3lame
  libavutil      52.  7.100 / 52.  7.100
  libavcodec     54. 71.100 / 54. 71.100
  libavformat    54. 36.100 / 54. 36.100
  libavdevice    54.  3.100 / 54.  3.100
  libavfilter     3. 23.100 /  3. 23.100
  libswscale      2.  1.102 /  2.  1.102
  libswresample   0. 16.100 /  0. 16.100
  libpostproc    52.  1.100 / 52.  1.100
[rawvideo @ 0x18a1320] Estimating duration from bitrate, this may be inaccurate
Input #0, rawvideo, from 'foreman_cif_frame_0.yuv':
  Duration: N/A, start: 0.000000, bitrate: N/A
    Stream #0:0: Video: rawvideo (I420 / 0x30323449), yuv420p, 352x288, 25 tbr, 25 tbn, 25 tbc
Output #0, image2, to 'f.png':
  Metadata:
    encoder         : Lavf54.36.100
    Stream #0:0: Video: png, rgb24, 352x288, q=2-31, 200 kb/s, 90k tbn, 25 tbc
Stream mapping:
  Stream #0:0 -> #0:0 (rawvideo -> png)
Press [q] to stop, [?] for help
frame=    1 fps=0.0 q=0.0 Lsize=       0kB time=00:00:00.04 bitrate=   0.0kbits/s    
video:201kB audio:0kB subtitle:0 global headers:0kB muxing overhead -100.000000%

output:

[image: the famous Foreman test frame as PNG]

Another approach is to use the mighty ImageMagick:

$ convert -size 352x288 -depth 8 foreman_cif_frame_0.yuv f2.png 

Interestingly, ffmpeg and ImageMagick do not return identical results:

$ compare -compose src f.png f2.png diff.png

Result:

[image: diff.png, the output of the compare command]

Update: Too bad. The only reasonable explanation then is that PIL is borked (it has some peculiarities when it comes to YCbCr handling; there are many questions here on SO about that). As you can see from my post, if the input is correct YCbCr, the output is OK!

If I read your question correctly, you already receive the data in planar YUV420p format (the Y plane first, then Cb, then Cr). The input is VGA (640x480), so the following code splits the three planes (Y, Cb, Cr) into their own variables:

# Here I'm assuming you get the data from the drone into parameter raw
# (a bytes-like object holding exactly one frame)
# 1 frame contains 640*480*3/2 = 460800 bytes
import numpy as np

# view raw as a flat numpy array of unsigned bytes
raw = np.frombuffer(raw, dtype=np.uint8)

# calculate where each plane starts and stops
# (integer division, so the offsets are valid slice indices)
wh = 640 * 480
p = (0, wh, wh, wh // 4 * 5, wh // 4 * 5, wh // 2 * 3)

# Now use slicing to extract the different planes
yy = raw[p[0]:p[1]]
cb = raw[p[2]:p[3]]
cr = raw[p[4]:p[5]]

Now you have the data in nice numpy arrays! To convert them into matrices (reshape returns a new array, so keep the result), do:

yy = yy.reshape([480, 640])
cb = cb.reshape([480 // 2, 640 // 2])
cr = cr.reshape([480 // 2, 640 // 2])
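If it helps, the planes can then be combined into RGB without going through ffmpeg or PIL at all. Below is a rough sketch, not a definitive implementation: it assumes full-range BT.601 (JPEG-style) coefficients and Cb-before-Cr plane order, and I don't know which range or matrix the drone actually uses, so treat the constants as an assumption to check. The function name yuv420p_to_rgb is mine, and chroma is upsampled by plain pixel repetition.

```python
import numpy as np


def yuv420p_to_rgb(raw, width, height):
    """Convert one planar YUV 4:2:0 frame (Y, then Cb, then Cr) to an RGB array."""
    frame = np.frombuffer(raw, dtype=np.uint8)
    n = width * height
    if frame.size != n * 3 // 2:
        raise ValueError("expected %d bytes, got %d" % (n * 3 // 2, frame.size))

    # Split the three planes; each chroma plane is a quarter of the luma size
    y = frame[:n].reshape(height, width).astype(np.float32)
    cb = frame[n:n + n // 4].reshape(height // 2, width // 2)
    cr = frame[n + n // 4:n + n // 2].reshape(height // 2, width // 2)

    # Upsample chroma to full resolution by repeating each sample 2x2,
    # then centre it on zero
    cb = cb.repeat(2, axis=0).repeat(2, axis=1).astype(np.float32) - 128.0
    cr = cr.repeat(2, axis=0).repeat(2, axis=1).astype(np.float32) - 128.0

    # ASSUMPTION: full-range BT.601 coefficients
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb

    rgb = np.stack([r, g, b], axis=-1)
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)
```

The result has shape (height, width, 3) in RGB order. OpenCV expects BGR, so reverse the last axis (rgb[:, :, ::-1]) before handing the array over.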

Hope it helps! If not, drop me a comment...

Licensed under: CC-BY-SA with attribution