Question

I want to understand how video and audio decoding works, especially the timing synchronization (how to get 30 fps video, how to couple that with audio, etc.). I don't want to know ALL the details, just the essence of it. I want to be able to write a high-level simplification of an actual video/audio decoder.

Could you give me some pointers? Actual C/C++ source code of an MPEG-2 video/audio decoder would be the fastest way to understand those things, I think.


Solution

You can browse the source code of FFmpeg (available through SVN), or its API documentation.

OTHER TIPS

Reading the source code of a codec that works seems the right way to go. I suggest the following:

http://www.mpeg.org/MPEG/video/mssg-free-mpeg-software.html

Given that it's mentioned on the mpeg.org website, I'd say you'll find what you need there.

In the past I've spent some time working on decoding MPEG video (no audio, though), and the principles are quite simple. There are some self-contained pictures (I-frames), some pictures that are described relative to the closest preceding reference picture (P-frames), and the rest are described using the closest reference pictures on both sides (B-frames).

One time slot, one image. But recent codecs are much more complicated, I guess!
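To make that dependency structure concrete, here is a tiny toy program (my own sketch, not taken from any real decoder) showing how a typical group of pictures is reordered between decode order and display order — B-frames come later in the stream than the future reference they are predicted from:

    #include <stdio.h>

    /* Toy illustration, not real decoder code: MPEG-2 stores pictures in
     * decode order, which differs from display order because a B-frame
     * depends on the *future* reference picture it is predicted from.
     * Labels give the display position: I0 B1 B2 P3 B4 B5 P6 */
    int main(void) {
        const char *decode_order[]  = { "I0", "P3", "B1", "B2", "P6", "B4", "B5" };
        const char *display_order[] = { "I0", "B1", "B2", "P3", "B4", "B5", "P6" };
        for (int i = 0; i < 7; i++)
            printf("slot %d: decoded %-2s, displayed %-2s\n",
                   i, decode_order[i], display_order[i]);
        return 0;
    }

This reordering is one reason decoders need explicit timestamps rather than just counting frames as they arrive.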

EDIT : synchronization

I am no expert in synchronizing audio and video, but the issue seems to be dealt with by a sync layer.

For audio/video synchronization, basically, every video and audio frame should be time-stamped. The timestamp is typically known as the PTS (Presentation Time Stamp). Once a video/audio frame is decoded by the decoder, the renderer should schedule the frame to be presented at the right time so that audio and video stay synchronized.
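As a rough illustration of that scheduling, here is a minimal C sketch assuming a fixed 30 fps stream and using wall-clock time since start as the master clock (a real player usually derives the master clock from the audio output position instead):

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    /* Minimal sketch (POSIX), not a real player: delay each frame until
     * the master clock reaches the frame's PTS, then "display" it. */
    static double now_seconds(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void) {
        const double fps = 30.0;
        double start = now_seconds();

        for (int i = 0; i < 10; i++) {
            double pts   = i / fps;                      /* presentation time of frame i */
            double delay = pts - (now_seconds() - start);
            if (delay > 0)
                usleep((useconds_t)(delay * 1e6));       /* wait until it's time */
            /* a late frame (delay < 0) would be shown immediately or dropped */
            printf("display frame %d at t=%.3f (pts=%.3f)\n",
                   i, now_seconds() - start, pts);
        }
        return 0;
    }

Tying the clock to the audio output is the usual choice because the ear notices audio glitches far more than a dropped or repeated video frame.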

I think you can refer to the "Timing Model" chapter of an MPEG-2 tutorial for details.

Depending on how much you already know about the MPEG-2 format, you might want to get a broad overview by reading an article about it first, something like these:

A Beginners Guide for MPEG-2 Standard

MPEG-2 VIDEO COMPRESSION

@Patric and Nils

So you say that there are timestamps, eh... These are for the video part only, I guess. For audio, I suppose there is enough information in the header (like "samples per second"). How often are these timestamps needed? I imagine that the interleaving of audio and video packets ensures that video data is always ahead of audio data, or something?

EDIT: Found what I needed: http://www.dranger.com/ffmpeg/tutorial01.html

Helltone,

Timestamps for audio data are still necessary, because audio and video frames may not be aligned at the same instants. For example:

    V: 1000 1040 1080 1120 ...
    A:  990 1013 1036 (lost) 1082

You may need to compensate for the offset between the first video and audio frames. Besides, if packet loss is possible (during video streaming, for instance), you need the timestamps of both video and audio to maintain accurate synchronization.
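As a sketch of what that compensation might look like, here is a toy C function deciding whether to show, drop, or delay a video frame given the current audio clock. The 50 ms and 10 ms thresholds are illustrative choices, not values from any standard:

    #include <stdio.h>

    /* Toy sync decision, assuming the audio clock is the master:
     * diff > 0 means the video frame is early, diff < 0 means late. */
    typedef enum { SHOW, DROP, WAIT } av_action;

    static av_action sync_decision(double video_pts, double audio_clock) {
        double diff = video_pts - audio_clock;
        if (diff < -0.05) return DROP;   /* more than 50 ms late: drop it  */
        if (diff >  0.01) return WAIT;   /* more than 10 ms early: wait    */
        return SHOW;                     /* close enough: show it now      */
    }

    int main(void) {
        /* Timestamps (in seconds) loosely based on the example above;
         * the audio clock sits at the last sample actually played. */
        double video_pts[] = { 1.000, 1.040, 1.080, 1.120 };
        double audio_clock = 1.036;
        for (int i = 0; i < 4; i++) {
            av_action a = sync_decision(video_pts[i], audio_clock);
            printf("frame pts=%.3f -> %s\n", video_pts[i],
                   a == SHOW ? "show" : a == DROP ? "drop" : "wait");
        }
        return 0;
    }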
