Question

I intend to encode YUV data to H.264 on the Android platform. I have it all implemented, but I have one small question about the DSI (decoder-specific information) data returned by the dequeueOutputBuffer() call.

Currently, the first call to dequeueOutputBuffer() returns the DSI data. So for the first YUV frame fed to the video encoder, I call dequeueOutputBuffer() twice to get the encoded stream; for the remaining frames, I call dequeueOutputBuffer() only once to get the corresponding encoded data. This approach works fine on devices with an ARM architecture, but on a device with an x86 architecture it hangs in dequeueOutputBuffer() while encoding the first YUV frame.

So, my questions are:

  1. Am I missing something with respect to the encoder configuration?
  2. Is there a way to get back a combined stream of DSI + encoded data with a single call to dequeueOutputBuffer()?

Hope the question is clear.


Solution

The video encoder is going to accept N frames before producing any output. In some cases N will be 1, and you will see an output frame shortly after providing a single input frame. Other codecs will want to gather up a fair bit of video data before starting to produce output. It appears you've managed to resolve your current situation by doubling-up frames and discarding half the output, but you should be aware that different devices and different codecs will behave differently (assuming portability is a concern).
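A more portable pattern than counting dequeueOutputBuffer() calls is to drain the encoder in a loop until it reports that no output is ready. A minimal sketch, assuming an already-configured MediaCodec in synchronous mode (the helper name drainEncoder is my own):

```java
import java.nio.ByteBuffer;
import android.media.MediaCodec;

// Drain all pending output from the encoder. Call this after each
// queueInputBuffer(); it returns as soon as no more output is ready,
// so it never blocks waiting for frames the codec hasn't produced yet.
static void drainEncoder(MediaCodec encoder, MediaCodec.BufferInfo info) {
    final long TIMEOUT_US = 10_000;  // bounded wait; don't block forever
    while (true) {
        int index = encoder.dequeueOutputBuffer(info, TIMEOUT_US);
        if (index == MediaCodec.INFO_TRY_AGAIN_LATER) {
            break;  // no output available yet; go feed more input
        } else if (index == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
            // New output format; for AVC the CSD is also available here
            // via encoder.getOutputFormat() ("csd-0" / "csd-1").
            continue;
        } else if (index >= 0) {
            ByteBuffer encoded = encoder.getOutputBuffer(index);
            // ... write info.size bytes from `encoded` to your stream ...
            encoder.releaseOutputBuffer(index, false);
            if ((info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                break;
            }
        }
    }
}
```

This way the number of output buffers per input frame never needs to be hard-coded, so devices that buffer several frames before producing output behave the same as devices that don't.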

The CSD data is provided in a buffer with the BUFFER_FLAG_CODEC_CONFIG flag set. There is no documented behavior in MediaCodec for if or when such buffers will appear. (In fact, if you're using VP8, it doesn't appear at all.) For AVC, it arrives in the first buffer. If you're not interested in the CSD data, just ignore any packet with that flag set.
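Inside an output-drain loop, ignoring (or capturing) the CSD comes down to one flag check; a sketch, assuming `info` is the MediaCodec.BufferInfo filled in by dequeueOutputBuffer() and `index` is the buffer index it returned:

```java
if ((info.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG) != 0) {
    // CSD buffer (SPS/PPS for AVC). Save its contents if you need them
    // to configure a decoder or write a container header; otherwise
    // just release the buffer and skip it.
    encoder.releaseOutputBuffer(index, false);
    continue;  // this was not an encoded frame
}
```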

Because the buffer info flags apply to the entire buffer of data, the API doesn't provide a way to return a single buffer that has both CSD and encoded-frame data in it.

Note also that the encoder is allowed to reorder output, so you might submit frames 0,1,2 and receive encoded data for 0,2,1. The easiest way to keep track is to supply a presentation time stamp with each frame that uniquely identifies it. Some codecs will use the PTS value to adjust the encoding quality in an attempt to meet the bit rate goal, so you need to use reasonably "real" values, not a trivial integer counter.
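For example, deriving the timestamp from the frame index and the nominal frame rate yields monotonically increasing, "real" microsecond values (the helper name computePtsUs is my own):

```java
public class PtsUtil {
    // Presentation time stamp in microseconds for frame `frameIndex`
    // of a stream running at `frameRate` frames per second.
    public static long computePtsUs(long frameIndex, int frameRate) {
        return frameIndex * 1_000_000L / frameRate;
    }
}
```

The value is then passed as the presentationTimeUs argument when submitting input, e.g. `encoder.queueInputBuffer(inIndex, 0, size, PtsUtil.computePtsUs(i, 30), 0);`, and the same value comes back in BufferInfo.presentationTimeUs on the matching output buffer, letting you pair reordered output with its input frame.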

Licensed under: CC-BY-SA with attribution