Question

I have created an elementary bin stream using the HM-12.0 reference code, so the output is an HEVC-encoded bin stream (say input.bin).

I have a task which involves reading the header of this elementary stream. That is, I need to fetch information such as the stream width, height, etc. from the input.bin file.

After looking at a lot of streams, I can conclude that all these bin streams start with the sequence:

00 00 00 01

So whenever I see this sequence in a bin stream, I can say that the stream has to be decoded by an HEVC decoder.

Further, if I want to fetch the width, height, fps, etc. from input.bin (like the ff_raw_video_read_header function in ffmpeg), what steps need to be performed to fetch this information?

I have gone through the parsing section of the HEVC draft, but it is very complicated for my level in the video domain. Can anyone suggest a simple way to fetch the required information from the encoded bin file?

Any suggestions will be really helpful to me. Thanks in advance.


Solution

I have seen you ask a lot of questions about video lately. When it comes to video, there is no such thing as a "simple way".

Here is the AVC answer: "Fetching the dimensions of a H264Video stream".

First off, the fps is usually not encoded in the ES. That is a container-level thing. There is a place for it in AVC/HEVC (the optional VUI timing information), but you should not count on it being filled in.

You have already found the start code. For the record, a start code is two or more 00 bytes followed by a 01 (it is not always 00 00 00 01; it can be 00 00 01, 00 00 00 00 01, etc.). If this sequence ever shows up naturally inside a NALU, it must be escaped: when encoding, a 00 00 01 within a NALU payload is changed to 00 00 03 01 (the same applies to 00 00 00, 00 00 02 and 00 00 03). I mention this because when you encounter 00 00 03 while parsing, you must drop the emulation prevention byte (03) before interpreting the data. The NALU payload with the emulation prevention bytes removed is called the Raw Byte Sequence Payload (RBSP).
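The start code and escaping rules above can be sketched roughly like this in Python. The function names split_nal_units and remove_emulation_prevention are made up for illustration, and the whole file is assumed to fit in memory; treat it as a sketch rather than a hardened parser.

```python
def split_nal_units(data: bytes) -> list:
    """Split an Annex B byte stream into NAL units, start codes removed."""
    nal_units = []
    start = None  # index of the first byte of the NAL unit being collected
    i = 0
    while i + 3 <= len(data):
        if data[i] == 0 and data[i + 1] == 0 and data[i + 2] == 1:
            if start is not None:
                # Zero bytes right before a start code are padding, not payload.
                nal_units.append(data[start:i].rstrip(b"\x00"))
            start = i + 3
            i += 3
        else:
            i += 1
    if start is not None:
        nal_units.append(data[start:].rstrip(b"\x00"))
    return nal_units


def remove_emulation_prevention(nal: bytes) -> bytes:
    """Drop every 03 byte that directly follows two 00 bytes."""
    out = bytearray()
    zeros = 0  # number of consecutive 00 bytes seen so far
    for b in nal:
        if zeros >= 2 and b == 3:
            zeros = 0  # skip the emulation prevention byte
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)
```

Typical use would be split_nal_units(open("input.bin", "rb").read()), followed by remove_emulation_prevention on each NAL unit you actually want to parse.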

The first two bytes after the start code hold the nal_unit_header. The type is stored in bits 1-6 of the first byte, counting from the least significant bit ( ( NALU[0] >> 1 ) & 0x3F ). The resolution is stored in the SPS, whose NAL unit type (SPS_NUT) is 33. Once you have found the SPS, you can parse it.
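As a small sketch of the header layout, the two bytes can be unpacked as below. The function name parse_nal_header is made up; the field names and widths (1, 6, 6 and 3 bits) follow my reading of the nal_unit_header syntax in the spec, so check them against your copy.

```python
SPS_NUT = 33  # NAL unit type of the sequence parameter set


def parse_nal_header(nal: bytes) -> dict:
    """Unpack the two-byte HEVC NAL unit header (start code already removed)."""
    if len(nal) < 2:
        raise ValueError("a NAL unit needs at least two header bytes")
    header = (nal[0] << 8) | nal[1]
    return {
        "forbidden_zero_bit": header >> 15,
        "nal_unit_type": (header >> 9) & 0x3F,  # same as (nal[0] >> 1) & 0x3F
        "nuh_layer_id": (header >> 3) & 0x3F,
        "nuh_temporal_id_plus1": header & 0x07,
    }
```

For example, a NAL unit starting with the bytes 42 01 has nal_unit_type 33, so it is an SPS.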

This brings us to exponential Golomb encoding. Basically, an exponential Golomb encoded integer (abbreviated ue for unsigned, or se for signed) is a variable-length integer. To decode it, you count the leading zero bits. If there are n of them, the value occupies the next n + 1 bits (starting with the first 1 bit); read those bits as a binary number and subtract 1 to get your integer. The signed variant is decoded the same way and then mapped so that odd values are positive and even values are negative; in effect the last bit acts as the sign bit. For example (bits = ue = se):

1     = 0 =  0
010   = 1 =  1
011   = 2 = -1
00100 = 3 =  2
00101 = 4 = -2
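A minimal bit reader that decodes these values might look like the following. The class name BitReader and its method names are my own choice, picked to mirror the u(n), ue(v) and se(v) descriptors used in the spec tables.

```python
class BitReader:
    """Reads bits MSB-first from RBSP bytes (emulation prevention removed)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def u(self, n: int) -> int:
        """Read n bits as an unsigned integer."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def ue(self) -> int:
        """Unsigned Exp-Golomb: count zeros, then read that many suffix bits."""
        zeros = 0
        while self.u(1) == 0:
            zeros += 1
        # The terminating 1 bit has been consumed; it contributes 2**zeros.
        return (1 << zeros) - 1 + self.u(zeros)

    def se(self) -> int:
        """Signed Exp-Golomb: odd codes are positive, even codes negative."""
        k = self.ue()
        return (k + 1) // 2 if k & 1 else -(k // 2)
```

Reading the five example codes above back to back (1, 010, 011, 00100, 00101) with ue() gives 0, 1, 2, 3, 4, and with se() gives 0, 1, -1, 2, -2.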

Next, parse the SPS using the syntax table in section 7.3.2.2 of Rec. ITU-T H.265 (04/2013) (page 33).

And finally, you can calculate the dimensions from pic_width_in_luma_samples, pic_height_in_luma_samples and the conformance window parameters.
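Putting the pieces together, here is a sketch that reads an SPS only as far as the conformance window. It assumes the input is the SPS payload after the two-byte NAL header, with emulation prevention bytes already removed. The field order and the bit counts skipped in profile_tier_level (96 bits for the general part, 88 and 8 bits per sub-layer) come from my reading of the syntax tables in the 04/2013 edition, so verify them against the spec. The names parse_sps and skip_profile_tier_level are made up.

```python
class BitReader:
    """MSB-first bit reader over RBSP bytes."""

    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def u(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def ue(self) -> int:
        zeros = 0
        while self.u(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.u(zeros)


def skip_profile_tier_level(r: BitReader, max_sub_layers_minus1: int) -> None:
    r.pos += 96  # general profile/tier fields (88 bits) + general_level_idc (8)
    flags = []
    for _ in range(max_sub_layers_minus1):
        profile_present = r.u(1)  # sub_layer_profile_present_flag
        level_present = r.u(1)    # sub_layer_level_present_flag
        flags.append((profile_present, level_present))
    if max_sub_layers_minus1 > 0:
        r.pos += 2 * (8 - max_sub_layers_minus1)  # reserved_zero_2bits
    for profile_present, level_present in flags:
        if profile_present:
            r.pos += 88
        if level_present:
            r.pos += 8


def parse_sps(rbsp: bytes) -> dict:
    """Return the SPS fields needed to compute the picture size."""
    r = BitReader(rbsp)
    r.u(4)  # sps_video_parameter_set_id
    max_sub_layers_minus1 = r.u(3)
    r.u(1)  # sps_temporal_id_nesting_flag
    skip_profile_tier_level(r, max_sub_layers_minus1)
    sps = {}
    sps["sps_seq_parameter_set_id"] = r.ue()
    sps["chroma_format_idc"] = r.ue()
    if sps["chroma_format_idc"] == 3:
        sps["separate_colour_plane_flag"] = r.u(1)
    sps["pic_width_in_luma_samples"] = r.ue()
    sps["pic_height_in_luma_samples"] = r.ue()
    sps["conformance_window_flag"] = r.u(1)
    for name in ("left", "right", "top", "bottom"):
        key = "conf_win_%s_offset" % name
        sps[key] = r.ue() if sps["conformance_window_flag"] else 0
    return sps
```

Everything after the conformance window (bit depths, VUI and so on) is ignored here, which is enough for the width and height.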

OTHER TIPS

szatmary nailed it. I wanted to clarify how to calculate the frame size from the SPS, as it can be confusing (unless you actually spend the time to read the whole document; just do it). This covers at least the width; read the document for the height.

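if (chroma_format_idc == 1 || chroma_format_idc == 2)
    SubWidthC = 2    // 4:2:0 and 4:2:2 halve the chroma width
else
    SubWidthC = 1    // monochrome (0) and 4:4:4 (3)

if (conformance_window_flag)
    width_to_display = pic_width_in_luma_samples - (conf_win_left_offset + conf_win_right_offset) * SubWidthC
else
    width_to_display = pic_width_in_luma_samples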
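For completeness, the same calculation for both dimensions could be sketched in Python as below. The function name display_size is made up, and the SubWidthC/SubHeightC values are what I take from Table 6-1 of the spec (note that 4:2:2 halves the width but not the height), so double-check them there.

```python
# chroma_format_idc -> (SubWidthC, SubHeightC), as I read Table 6-1
SUB_WIDTH_HEIGHT = {
    0: (1, 1),  # monochrome
    1: (2, 2),  # 4:2:0
    2: (2, 1),  # 4:2:2
    3: (1, 1),  # 4:4:4
}


def display_size(chroma_format_idc, pic_width, pic_height,
                 left=0, right=0, top=0, bottom=0):
    """Apply the conformance window; offsets are in chroma sample units."""
    sub_w, sub_h = SUB_WIDTH_HEIGHT[chroma_format_idc]
    width = pic_width - sub_w * (left + right)
    height = pic_height - sub_h * (top + bottom)
    return width, height
```

A typical 1080p stream is coded as 1920x1088 with conf_win_bottom_offset = 4 and chroma_format_idc = 1, so display_size(1, 1920, 1088, bottom=4) gives (1920, 1080). When conformance_window_flag is 0, leave all four offsets at 0.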
Licensed under: CC-BY-SA with attribution