I have seen you ask a lot of questions about video lately. When it comes to video, there is no such thing as a "simple way".
Here is the AVC answer: "Fetching the dimensions of a H264Video stream". The rest of this answer covers HEVC.
First off, the fps is usually not encoded in the elementary stream (ES). That is a container-level thing. There is a place for it in AVC/HEVC (the optional VUI timing information), but you should not count on it being filled in.
You have already found the start code. For the record, a start code is two or more 00 bytes followed by a 01 (it is not always 00 00 00 01; it can be 00 00 01, 00 00 00 00 01, etc.). If this sequence ever shows up naturally inside a NALU, it must be escaped. So, when encoding, a 00 00 01 within a NALU payload is changed to 00 00 03 01. I mention this because when you encounter 00 00 03 while parsing, you must skip the emulation prevention byte (03). The NALU payload with the emulation prevention bytes removed is called the Raw Byte Sequence Payload (RBSP).
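The two steps above (splitting on start codes, then dropping emulation prevention bytes) can be sketched roughly like this. The function names `split_nal_units` and `strip_emulation_prevention` are my own, not from any library, and the sketch assumes the whole stream is already in memory as a `bytes` object.

```python
START_CODE = b"\x00\x00\x01"  # shortest start code; longer ones just add leading 00s


def split_nal_units(stream: bytes) -> list:
    """Split an Annex B byte stream into NAL units, with start codes removed."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # Zeros in front of the next start code belong to that start code
        # (or are trailing padding), so they are not part of this NAL unit.
        units.append(stream[begin:end].rstrip(b"\x00"))
        pos = nxt
    return units


def strip_emulation_prevention(nalu: bytes) -> bytes:
    """Drop every 03 that follows two 00 bytes, turning a NALU into its RBSP."""
    out = bytearray()
    zeros = 0  # number of consecutive 00 bytes seen just before this byte
    for b in nalu:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # skip the emulation prevention byte
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)
```

Run `strip_emulation_prevention` on each NAL unit before you start reading bits from it, otherwise the stray 03 bytes will shift every field that comes after them.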
The first two bytes after the start code hold the nal_unit_header. The type is stored in bits 1-6 of the first byte: ( ( NALU[0] >> 1 ) & 0x3F ). The resolution is stored in the SPS, whose type is SPS_NUT (33). Once you have the SPS, you can parse it.
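As a small sketch of that header check, assuming `nalu` holds one NAL unit with the start code already removed (the helper names are made up for illustration):

```python
HEVC_SPS_NUT = 33  # nal_unit_type value of a sequence parameter set in HEVC


def hevc_nal_unit_type(nalu: bytes) -> int:
    """Return the 6-bit nal_unit_type from the two-byte HEVC NAL unit header."""
    if len(nalu) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    # Bit 7 of the first byte is forbidden_zero_bit, bits 6..1 are the type.
    return (nalu[0] >> 1) & 0x3F


def is_sps(nalu: bytes) -> bool:
    """True if this NAL unit is a sequence parameter set."""
    return hevc_nal_unit_type(nalu) == HEVC_SPS_NUT
```

For example, a NAL unit that begins with the bytes 42 01 has type 0x42 >> 1 = 33, so it is an SPS.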
This brings us to exponential Golomb encoding. Basically, an exponential-Golomb-encoded integer (abbreviated ue for unsigned, or se for signed) is a variable-length integer. To decode, you count the leading zeros. That count plus one is the length of the integer in bits. Next you read that many bits, starting at the first 1, and subtract 1 to get your integer. The signed variant is the same, except that the last bit read is the sign bit (0 is positive, 1 is negative) and the bits before it are the magnitude. For example (bit string = ue value = se value):
1 = 0 = 0
010 = 1 = 1
011 = 2 = -1
00100 = 3 = 2
00101 = 4 = -2
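Here is a minimal sketch of a bit reader that decodes both variants. `BitReader` is a hypothetical helper of my own, not a library class, and it assumes it is given the RBSP (emulation prevention bytes already removed). Reading past the end of the data simply raises `IndexError`.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def read_bit(self) -> int:
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self) -> int:
        """Unsigned Exp-Golomb: count zeros, read that many bits after the 1."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        # The terminating 1 is already consumed; it contributes 2**leading_zeros.
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)

    def read_se(self) -> int:
        """Signed Exp-Golomb: odd code numbers are positive, even are negative."""
        k = self.read_ue()
        magnitude = (k + 1) >> 1
        return magnitude if k & 1 else -magnitude
```

The five bit strings in the table above, packed back to back, are the bytes A6 42 80. Calling `read_ue()` five times on them gives 0, 1, 2, 3, 4, and calling `read_se()` five times gives 0, 1, -1, 2, -2.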
Next, parse the SPS using the table in section 7.3.2.2 of Rec. ITU-T H.265 (04/2013) (page 33).
And finally, you can calculate the dimensions from pic_width_in_luma_samples, pic_height_in_luma_samples and the conformance window parameters.
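A sketch of that last calculation, assuming you have already parsed the fields out of the SPS. The function name is made up; the parameter names follow the syntax element names in the spec. The conformance window offsets are counted in chroma sample units, so they have to be scaled by SubWidthC / SubHeightC, which depend on chroma_format_idc.

```python
# (SubWidthC, SubHeightC) for each chroma_format_idc:
# 0 = monochrome, 1 = 4:2:0, 2 = 4:2:2, 3 = 4:4:4
SUB_WIDTH_HEIGHT_C = {0: (1, 1), 1: (2, 2), 2: (2, 1), 3: (1, 1)}


def hevc_display_size(pic_width_in_luma_samples: int,
                      pic_height_in_luma_samples: int,
                      chroma_format_idc: int = 1,
                      conf_win_left_offset: int = 0,
                      conf_win_right_offset: int = 0,
                      conf_win_top_offset: int = 0,
                      conf_win_bottom_offset: int = 0) -> tuple:
    """Return (width, height) in luma samples after conformance window cropping.

    If conformance_window_flag is 0 in the SPS, leave all offsets at 0.
    """
    sub_width_c, sub_height_c = SUB_WIDTH_HEIGHT_C[chroma_format_idc]
    width = pic_width_in_luma_samples - sub_width_c * (
        conf_win_left_offset + conf_win_right_offset)
    height = pic_height_in_luma_samples - sub_height_c * (
        conf_win_top_offset + conf_win_bottom_offset)
    return width, height
```

A typical 1080p 4:2:0 stream is coded as 1920x1088 with conf_win_bottom_offset = 4, so `hevc_display_size(1920, 1088, conf_win_bottom_offset=4)` gives (1920, 1080).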