MPEG-4 Part 2 had some awesome face- and body-motion concepts, but they disappeared in MPEG-4 Part 10 (H.264). Why?

StackOverflow https://stackoverflow.com/questions/9800015

  25-05-2021

Question

During the last few weeks, I had the opportunity to read two documents:

  • The MPEG-4 Part 2 specification (ISO/IEC 14496-2), which people just call "mpeg-4"
  • The MPEG-4 Part 10 specification (ISO/IEC 14496-10), which is also called "h.264" or "AVC"

After having read all the cool ideas in "mpeg-4", like identifying facial expressions, the motion of people's limbs, and sprites, I got really excited. The ideas sound very fun, maybe even fantastic, for 1999.

But then I read the "h.264" standard, and none of those ideas were there. There was a lot of discussion on how to encode pixels, but none of the really cool ideas.

What happened? Why were these ideas removed?

This is not a code question, but as a programmer I feel I should try to understand as much as I can of the intent behind a specification. If the code I write adheres to the spirit in which the specification was meant to be used, it's more likely to be positioned to take advantage of the entire specification.


Solution

You seem to be assuming that the MPEG-4 Part 10 specification improves on MPEG-4 Part 2, while the fact is that these two specifications are unrelated, have nothing in common, and were even developed by different people: MPEG developed the Part 2 specification, while Part 10 was developed jointly by ITU-T's VCEG and ISO/IEC's MPEG, working together as the Joint Video Team.

Keep in mind that the ISO/IEC 14496 standard is a collection of specifications that apply to different aspects of audiovisual encoding. The goal of the Part 2 specification is to encode different kinds of visual objects (video, 3D objects, etc.). The goal of Part 10 is to provide very efficient, high-quality video encoding. Other parts of the standard deal with other aspects: for example, the Part 3 specification deals with audio encoding, and Parts 12 and 15 define a container file format that is most typically used to wrap Part 10 video (i.e. H.264) and Part 3 audio (i.e. AAC) into a single file, the so-called .mp4 format.
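To make the container idea concrete, here is a minimal Python sketch that walks the top-level boxes of an ISO base media file (the structure Part 12 defines); the file name sample.mp4 is a hypothetical placeholder. In a typical .mp4 you would see box types like ftyp, moov, and mdat, with the Part 10 video and Part 3 audio streams described in moov and carried in mdat.

    import struct

    def list_top_level_boxes(path):
        """List the top-level boxes of an ISO base media (.mp4) file.

        Each box starts with a 4-byte big-endian size and a 4-byte type.
        """
        boxes = []
        with open(path, "rb") as f:
            while True:
                header = f.read(8)
                if len(header) < 8:
                    break  # end of file
                size, box_type = struct.unpack(">I4s", header)
                header_len = 8
                if size == 1:
                    # A size of 1 means a 64-bit "largesize" follows the type.
                    size = struct.unpack(">Q", f.read(8))[0]
                    header_len = 16
                name = box_type.decode("latin-1")
                if size == 0:
                    # A size of 0 means the box extends to the end of the file.
                    boxes.append((name, "to end of file"))
                    break
                boxes.append((name, size))
                # Skip the payload: a reader that does not understand a box
                # type can simply jump over it, which is what makes the
                # container extensible.
                f.seek(size - header_len, 1)
        return boxes

    # Hypothetical file name, assumed to exist for the example:
    for name, size in list_top_level_boxes("sample.mp4"):
        print(name, size)

Note how a reader skips any box it does not recognize; that "recognize and skip" structure is the same container philosophy the answer below describes.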

I hope this helps!

OTHER TIPS

A little bit of history might help.

MPEG-4 was designed as a carrier/container specification for different types of media-related data communication. To be compliant, a device only had to recognize content it did not support and ignore it.

This was a reaction to the short lifetime of the MPEG-1 specs, which were obsolete before they were formalized.

MPEG-4 can be divided into:

Mechanisms to transport image-generating data

These included the obvious, like

  • compression
  • motion compensation and explicit sprites

and the experimental, such as

  • transporting and reconstructing 3D and 3D+time data from an image stream (video) to provide compression and feature expansion.

Rate adaptation mechanisms

In 1999 there was a huge range of relevant bit rates, from 128 kbit/s dial-up to 1000 Mbit/s LANs/MANs/WANs, and the spec had many special cases and efforts to provide interoperability.

This produced much committee work that became redundant as the range of network performance narrowed to roughly 1 Mbit/s at the low end and 100 Mbit/s at the high end.

Initially, every spec under the sun (and some still only in their creators' minds) was attached to the MPEG-4 framework, except for competing specs such as H.264.

Some of those specs faded out of existence as money dried up in the dot-com collapse, while H.264 and others merged into MPEG-4.

One thing I learned from this was that reading a spec without at least an example implementation, while often interesting, was rarely productive.

I guess "use the source, Luke" could apply

or

"Specs taste bad without source".

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow