h264 RTP timestamp

https://stackoverflow.com/questions/2439096

19-09-2019

Question

I am confused about the timestamp of an H.264 RTP packet. I know the RTP clock rate for video is 90 kHz, which I declared in the SIP SDP. The frame rate of my encoder is not exactly 30 FPS; it varies between 15 FPS and 30 FPS on the fly. So I cannot use a fixed timestamp increment.

Could anyone tell me the timestamps of the following encoded packets?
After 0 milliseconds encoded RTP timestamp = 0 (let the starting timestamp be 0)
After 50 milliseconds encoded RTP timestamp = ?
After 40 milliseconds encoded RTP timestamp = ?
After 33 milliseconds encoded RTP timestamp = ?

What is the formula when the encoded frame rate is variable?

Thank you in advance.

No correct solution

OTHER TIPS

It doesn't matter whether your encoder produces video at 10 FPS or 30 FPS: with the RTP timestamp you tell the receiver how long the pause between two frames is, so you determine it on the fly for each frame. That way you can send 10 frames in one second (10 fps) and 30 frames in the next (30 fps). You only need to set the RTP timestamp correctly. And if I understand your question, you are in doubt about how to do this...

Let the starting timestamp be 0. For each frame, take the wall clock time elapsed since the previous frame in milliseconds, multiply it by 90, and add the result to the last RTP timestamp; a 90 kHz clock ticks 90 times per millisecond, and the time scale has to match the clock rate declared in the SDP. To make the decoder play 10 fps video at 30 fps, add 3000 (90000 / 30) to the RTP timestamp for each frame instead... but let's look at your example:

Frame #      RTP Time   Time since previous frame [ms]
[  1]               0    0
[  2]            4500   50
[  3]            8100   40
[  4]           11070   33

So if you set the RTP timestamp like this (previous timestamp + time in ms * 90), the decoder will load and decode Frame 1, then load and decode Frame 2, but it will wait 50 ms (the time difference between Frame 1 and Frame 2) before it draws Frame 2, and so on...
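The arithmetic above can be sketched in a few lines of Python. This is only an illustration under the question's assumptions (90 kHz clock, starting timestamp 0); the function name `next_rtp_timestamp` is made up for this example and is not part of any RTP library.

```python
CLOCK_RATE_HZ = 90_000  # RTP clock rate for video, as declared in the SDP


def next_rtp_timestamp(previous: int, delta_ms: float,
                       clock_rate: int = CLOCK_RATE_HZ) -> int:
    """Advance an RTP timestamp by the capture-time gap since the previous frame."""
    ticks = round(delta_ms * clock_rate / 1000)  # 90 ticks per millisecond at 90 kHz
    return (previous + ticks) & 0xFFFFFFFF       # RTP timestamps are 32 bits and wrap


# The gaps from the question: 50 ms, 40 ms, 33 ms after a first frame at 0.
timestamps = [0]
for delta_ms in (50, 40, 33):
    timestamps.append(next_rtp_timestamp(timestamps[-1], delta_ms))

print(timestamps)  # [0, 4500, 8100, 11070]
```

If the gaps are not whole numbers of ticks, summing rounded deltas can drift over time; deriving each timestamp from the absolute capture time avoids that.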

And as you can see, the decoder uses the RTP timestamps to know when to display each frame, and it doesn't care whether the video was encoded at 30 or 10 fps.
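On the receiving side, the same relationship runs in reverse. The sketch below is a hypothetical helper (`gap_ms` is not a real decoder API) that converts the difference between two RTP timestamps back into milliseconds, assuming a 90 kHz clock and allowing for 32-bit wraparound.

```python
def gap_ms(previous_ts: int, current_ts: int, clock_rate: int = 90_000) -> float:
    """Presentation-time gap between two frames, in milliseconds."""
    delta = (current_ts - previous_ts) & 0xFFFFFFFF  # unsigned 32-bit difference
    if delta >= 0x80000000:
        # Interpret large differences as negative: the frame is earlier in
        # presentation time than the previous one (possible with B-frames).
        delta -= 0x100000000
    return delta * 1000 / clock_rate


print(gap_ms(0, 4500))        # 50.0
print(gap_ms(0xFFFFFFFF, 89)) # 1.0, the gap is still correct across the wrap
```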

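Also, if the video is 30 fps, that doesn't mean there will be 30 RTP packets each second. A large frame is split across several packets that all carry the same timestamp, so sometimes there can be more than 100 packets per second. The timestamp therefore has to come from the capture time of the frame; you cannot derive it from a packet count.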
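To illustrate the point about packets and frames, here is a simplified sketch that splits one encoded frame into several packet descriptions. It assumes a payload limit of 1400 bytes and deliberately omits the real H.264 fragmentation headers (FU-A in RFC 6184); `PacketInfo` and `split_frame` are hypothetical names used only to show which fields change from packet to packet.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class PacketInfo:
    sequence_number: int  # increments with every packet (16 bits, wraps)
    timestamp: int        # identical for all packets of the same frame
    marker: bool          # set on the last packet of the frame
    payload: bytes


def split_frame(frame: bytes, timestamp: int, first_seq: int,
                max_payload: int = 1400) -> Iterator[PacketInfo]:
    """Split one frame into packets that share a single RTP timestamp."""
    chunks = [frame[i:i + max_payload] for i in range(0, len(frame), max_payload)]
    for index, chunk in enumerate(chunks):
        yield PacketInfo(
            sequence_number=(first_seq + index) & 0xFFFF,
            timestamp=timestamp,
            marker=(index == len(chunks) - 1),
            payload=chunk,
        )


packets = list(split_frame(b"\x00" * 3000, timestamp=4500, first_seq=10))
for p in packets:
    print(p.sequence_number, p.timestamp, p.marker, len(p.payload))
```

A 3000-byte frame becomes three packets here: the sequence number changes on each one, while the timestamp stays at 4500.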

I guess that this is what you need... hope I helped, don't -1 me if I didn't... =)

There is no simple formula for this.

The instant at which the frame is sampled, before encoding, is called the PTS (presentation timestamp). It is outside the scope of the encoder, so you must carry it through your data flow from the point where you capture the frames.

From there, you have 2 possibilities:

If the H.264 encoder does not generate B-frames, the RTP timestamp should be the PTS + a random offset (the same offset for the whole streaming session).
If the encoder generates B-frames (or B-slices), the sending order has to follow the decoding order: a B-frame needs a later frame to be decoded first, so that later frame must be sent before it.
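For the first case, the mapping from PTS to RTP timestamp might look like the sketch below. It assumes the PTS is kept in seconds relative to the start of capture and that the clock rate is 90 kHz; the class name `RtpClock` is hypothetical.

```python
import random

CLOCK_RATE_HZ = 90_000  # assumed video clock rate


class RtpClock:
    """Maps capture PTS values to RTP timestamps for one streaming session."""

    def __init__(self, clock_rate: int = CLOCK_RATE_HZ, offset=None):
        self.clock_rate = clock_rate
        # One random 32-bit offset, chosen once and kept for the whole session.
        self.offset = random.getrandbits(32) if offset is None else offset

    def timestamp(self, pts_seconds: float) -> int:
        ticks = round(pts_seconds * self.clock_rate)
        return (self.offset + ticks) & 0xFFFFFFFF  # 32-bit wraparound


# A fixed offset is used here only to make the output reproducible.
clock = RtpClock(offset=1000)
print([clock.timestamp(t) for t in (0.0, 0.050, 0.090, 0.123)])
# [1000, 5500, 9100, 12070]
```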

In the B-frame case, RFC 6184 gives you multiple ways to stream the encoded NAL units.

Most streaming software uses the mode called "non-interleaved", in which you set the RTP timestamp to the PTS + offset but send the NAL units in decoding order, so the timestamps will not increase monotonically. This also means the client has to decode in the order received, and must not reorder the frames into PTS order before decoding.

I'm deliberately not using the term DTS here, because you don't need a decoding timestamp for this to work, only the decoding order.
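A small sketch of the B-frame case, assuming a four-frame group captured 3000 ticks apart (30 fps at 90 kHz). The `Frame` tuple and `send_order` function are hypothetical; the decoding positions are written by hand here, whereas in practice they are simply the order in which the encoder outputs the frames.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    name: str
    pts_ticks: int     # presentation time in 90 kHz ticks
    decode_index: int  # position in the encoder's output (decoding) order


def send_order(frames, offset: int = 0):
    """Return (name, RTP timestamp) pairs in the order they go on the wire."""
    ordered = sorted(frames, key=lambda f: f.decode_index)
    return [(f.name, (f.pts_ticks + offset) & 0xFFFFFFFF) for f in ordered]


# Display order is I0 B1 B2 P3, but P3 must be decoded before the B-frames.
captured = [
    Frame("I0", 0, 0),
    Frame("B1", 3000, 2),
    Frame("B2", 6000, 3),
    Frame("P3", 9000, 1),
]
print(send_order(captured))
# [('I0', 0), ('P3', 9000), ('B1', 3000), ('B2', 6000)]
```

The timestamps jump forward to 9000 and then back to 3000, which is the non-monotonic behaviour described above; no DTS value is needed, only the order.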

The last mode described in RFC 6184 is the so-called interleaved mode, in which you can send the NAL units out of decoding order. In that case, you have to implement some application logic to put the units back in order; refer to RFC 6184 for details.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow