Question

I have a series of MP4 files (H.264 video, AAC audio, 16 kHz). I need to merge them programmatically (Objective-C, iOS), but the final file will be too large to hold in memory, so I can't use the AVFramework to do this for me.

I have written code that does the merge and takes care of all of the MP4 atoms (STBL, STSZ, STCO, etc.) by simply concatenating the contents of the respective MDATs. The problem is that while the resulting file plays, the audio gradually drifts out of sync with the video. What seems to be happening is that each file has a disparity between its audio and video lengths, and the error accumulates the more files I concatenate.
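For reference, the STCO bookkeeping described above could look roughly like the following. This is only a sketch, not my actual code: `rebase_chunk_offsets` and its parameters are hypothetical names, and it assumes 32-bit `stco` entries (absolute file offsets) rather than `co64`.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: rebase the stco entries of a file whose mdat payload
 * is being appended to the merged output.
 *   old_payload_start: file offset of the mdat payload in the source file
 *   new_payload_start: file offset where that payload lands in the merged file
 * Returns 0 on success, -1 if an offset is invalid or no longer fits in 32 bits
 * (in which case the table is left untouched and co64 would be needed). */
static int rebase_chunk_offsets(uint32_t *offsets, size_t count,
                                uint64_t old_payload_start,
                                uint64_t new_payload_start)
{
    /* First pass: validate every entry before changing anything. */
    for (size_t i = 0; i < count; i++) {
        if (offsets[i] < old_payload_start)
            return -1;
        uint64_t moved = (uint64_t)offsets[i] - old_payload_start + new_payload_start;
        if (moved > UINT32_MAX)
            return -1;
    }
    /* Second pass: apply the shift. */
    for (size_t i = 0; i < count; i++) {
        offsets[i] = (uint32_t)((uint64_t)offsets[i] - old_payload_start + new_payload_start);
    }
    return 0;
}
```

Sample sizes (STSZ) are unaffected by this step; only the absolute chunk positions move.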

I've used MP4Box to generate a file from the command line, and it is 'similar but different' to my output. A notable difference is that the length of the MDAT has changed and the chunk offsets have also changed (though sample sizes remain consistent).

I've recently read that AAC encoding introduces padding at the beginning and end of a stream, so I wonder whether this is something I need to handle.

Q: Given two MDAT atoms containing H.264-encoded video and AAC audio, is my basic method sound, or do I need to introspect the MDAT data in some way?


Solution

Thanks for the pointer, Niels.

So it seems that the approach is perfectly reasonable. However, each individual MP4 file has a marginal difference between its audio length and its video length, due to differences in the sampling frequency. Each MP4 includes an EDTS.ELST combination that corrects this issue for that file. I was failing to consider the EDTS when I merged the files. Merging the EDTS atoms has fixed the issue.
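To make "merging EDTS" a little more concrete, here is a minimal sketch of one way to combine the edit lists of a single track across files. It is not my exact code. `ElstEntry` and `append_edits` are names made up for illustration, and the sketch assumes every source file uses the same movie and media timescales and that `media_offset` is the total media duration (in media timescale units) of that track in the files already merged. The field meanings follow the ELST layout: `segment_duration` is in movie timescale units, `media_time` is in media timescale units, and a `media_time` of -1 marks an empty edit.

```c
#include <stdint.h>
#include <stddef.h>

/* One elst entry, using the version 1 (64-bit) field widths. */
typedef struct {
    uint64_t segment_duration;    /* movie timescale units */
    int64_t  media_time;          /* media timescale units; -1 = empty edit */
    int16_t  media_rate_integer;
    int16_t  media_rate_fraction;
} ElstEntry;

/* Hypothetical helper: append one source file's edit list for a track to the
 * merged edit list for that track.
 *   media_offset: media duration of this track already present in the merged
 *                 file, i.e. where this file's samples start on the merged
 *                 media timeline (assumption: same media timescale everywhere).
 * Returns the new entry count, or SIZE_MAX if dst has no room left. */
static size_t append_edits(ElstEntry *dst, size_t dst_count, size_t dst_cap,
                           const ElstEntry *src, size_t src_count,
                           int64_t media_offset)
{
    for (size_t i = 0; i < src_count; i++) {
        if (dst_count == dst_cap)
            return SIZE_MAX;
        ElstEntry e = src[i];
        /* Empty edits (-1) carry no media position, so only real edits shift. */
        if (e.media_time >= 0)
            e.media_time += media_offset;
        dst[dst_count++] = e;
    }
    return dst_count;
}
```

The segment durations are copied unchanged, so each file keeps its own audio/video correction and the error no longer accumulates. A source track with no edit list would presumably need an explicit entry covering its whole duration so that later entries line up.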

License: CC-BY-SA with attribution
Not affiliated with StackOverflow