Question

I need to detect MPEG4 I-Frame in RTP packet. I know how to remove RTP header and get the MPEG4 frame in it, but I can't figure out how to identify the I-Frame.

Does it have a specific signature/header?

Was it helpful?

Solution

Ok so I figured it out for h264 stream.

How to detect I-Frame:

  • remove RTP header
  • check the value of the first byte in h264 payload
  • if the value is 124 (0x7C) it is an I-Frame

I cant figure it out for the MPEG4-ES stream... any suggestions?

EDIT: H264 IDR

This works for my h264 stream (fmtp:96 packetization-mode=1; profile-level-id=420029;). You just pass byte array that represents the h264 fragment received through RTP. If you want to pass whole RTP, just correct the RTPHeaderBytes value to skip RTP header. I always get the I-Frame, because it is the only frame that can be fragmented, see here. I use this (simplified) piece of code in my server, and it works like a charm!!!! If the I-Frame (IDR) is not fragmented, the fragment_type would be 5, so this code would return true for the fragmented and not fragmented IDRs.

public static bool isH264iFrame(byte[] paket)
    {
        int RTPHeaderBytes = 0;

        int fragment_type = paket[RTPHeaderBytes + 0] & 0x1F;
        int nal_type = paket[RTPHeaderBytes + 1] & 0x1F;
        int start_bit = paket[RTPHeaderBytes + 1] & 0x80;

        if (((fragment_type == 28 || fragment_type == 29) && nal_type == 5 && start_bit == 128) || fragment_type == 5)
        {
            return true;
        }

        return false;
   }

Here's the table of NAL unit types:

 Type Name
    0 [unspecified]
    1 Coded slice
    2 Data Partition A
    3 Data Partition B
    4 Data Partition C
    5 IDR (Instantaneous Decoding Refresh) Picture
    6 SEI (Supplemental Enhancement Information)
    7 SPS (Sequence Parameter Set)
    8 PPS (Picture Parameter Set)
    9 Access Unit Delimiter
   10 EoS (End of Sequence)
   11 EoS (End of Stream)
   12 Filter Data
13-23 [extended]
24-31 [unspecified] 

EDIT 2: MPEG4 I-VOP

I forgot to update this... Thanx to Che and ISO IEC 14496-2 document, I managed to work this out! Che was rite, but not so precise in his answer... so here is how to find I, P and B frames (I-VOP, P-VOP, B-VOP) in short:

  1. VOP (Video Object Plane -- frame) starts with a code 000001B6(hex). It is the same for all MPEG4 frames (I,P,B)
  2. Next follows many more info, that I am not going to describe here (see the IEC doc), but we only (as che said) need the higher 2 bits from the following byte (next two bits after the byte with the value B6). Those 2 bits tell you the VOP_CODING_TYPE, see the table:

    VOP_CODING_TYPE (binary)  Coding method
                          00  intra-coded (I)
                          01  predictive-coded (P)
                          10  bidirectionally-predictive-coded (B)
                          11  sprite (S)
    

So, to find I-Frame find the packet starting with four bytes 000001B6 and having the higher two bits of the next byte 00. This will find I frame in MPEG4 stream with a simple video object type (not sure for advanced simple).

For any other problems, you can check the document provided (ISO IEC 14496-2), there is all you want to know about MPEG4. :)

OTHER TIPS

As far as I know, MPEG4-ES stream fragments in RTP payload usually start with MPEG4 startcode, which can be one of these:

  • 0x000001b0: visual_object_sequence_start_code (probably keyframe)
  • 0x000001b6: vop_start_code (keyframe, if the next two bits are zero)
  • 0x000001b3: group_of_vop_start_code, which contains three bytes and then hopefully a vop_start_code that may or may not belong to a keyframe (see above)
  • 0x00000120: video_object_layer_start_code (probably keyframe)
  • 0x00000100-0x0000011f: video_object_start_code (those look like keyframes as well)
  • something else (probably not a keyframe)

I'm afraid that you'll need to parse the stream to be sure :-/

Actually, you was correct for h264 stream, if the NAL value (first byte) is 0x7C it means that the I-Frame is fragmented. No other frames (P and B) can be fragmented, so if there is packetization-mode=1 in SDP, then it means that the I-Frames are fragmented, and therefore if you read 0x7C as first byte, then it is I-Frame. Read more here: http://www.rfc-editor.org/rfc/rfc3984.txt.

This worked for me:
- Figure out the "payload type", for example: Payload type: DynamicRTP-Type-96 (96)
- Tell wireshark which stream is H264: File->preferences->protocols->H264. Enter 96 as payload type.
- Filter on slice_type:"h264.slice_type eq 7"

0x000001b6: vop_start_code (keyframe, if the next two bits are zero) this is correct way for MPEG-4

For H264:

  1. Remove RTP header.
  2. If chunk NAL type (in first byte) is SPS (7) or PPS (8) mark the frame as IFrame (many cameras not use SPS and PPS (Axis included)).
  3. If chunk NAL type is #28 FU A (fragmentation unit A), check FU Header (next byte) if is NAL type IDR (5) (IDR (Instantaneous Decoding Refresh) Picture) is an IFrame.

Examples:

nal_ref_idc: 3, nal type: 7 (0x07) descripcion: 7 (SPS)<br>
00000000  24 00 00 2B 80 60 22 ED 96 57 3E 68 57 F3 22 B5  $..+.`"í.W>hWó"µ<br>
00000010  67 64 00 1E AD 84 01 0C 20 08 61 00 43 08 02 18  gd..­... .a.C...

00000020  40 10 C2 00 84 2B 50 5A 09 34 DC 04 04 04 08  @.Â..+PZ.4Ü....<br>
nal_ref_idc: 3, nal type: 8 (0x08) descripcion: 8 (PPS)<br>
00000000  24 00 00 10 80 60 22 EE 96 57 3E 68 57 F3 22 B5  $....`"î.W>hWó"µ
00000010  68 EE 3C B0  hî<°

FU_A (fragmentation unit A) 
nal_ref_idc: 3, nal type: 5 (0x05) descripcion: 5 (IDR (Instantaneous Decoding Refresh) Picture)
00000000  24 00 05 96 80 60 22 F1 96 57 3E 68 57 F3 22 B5  $....`"ñ.W>hWó"µ
00000010  7C 05 A0 AA 2F 81 92 AB CA FE 9E 34 D8 06 AD 74  |. ª/..«Êþ.4Ø.­t
...
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top