Question

I'm generating a video from a Unity app on iOS. I'm using iVidCap, which uses AVFoundation to do this. That side is all working fine. Essentially the video is rendered by using a texture render target and passing the frames to an Obj-C plugin.

Now I need to add audio to the video. The audio is going to be sound effects that occur at specific times and maybe some background sound. The files being used are actually assets internal to the Unity app. I could potentially write these to phone storage and then generate an AVComposition, but my plan was to avoid this and composite the audio in floating-point buffers (audio obtained from AudioClips comes as float data). I might be doing some on-the-fly audio effects later on.
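
To make the idea concrete, the compositing I have in mind is just additive mixing in float buffers, something like the sketch below (MixEffect is a made-up helper for illustration, not part of iVidCap or my plugin):

#include <math.h>

// Additively mix an effect (interleaved floats) into a master buffer starting
// at a given frame offset, clamping the result to [-1, 1]. Illustrative only.
static void MixEffect(float* master, size_t masterFrames,
                      const float* effect, size_t effectFrames,
                      size_t startFrame, size_t nchans) {
    size_t end = startFrame + effectFrames;
    if (end > masterFrames) end = masterFrames;
    for (size_t frame = startFrame; frame < end; ++frame) {
        for (size_t ch = 0; ch < nchans; ++ch) {
            size_t dst = frame * nchans + ch;
            size_t src = (frame - startFrame) * nchans + ch;
            float mixed = master[dst] + effect[src];
            master[dst] = fmaxf(-1.0f, fminf(1.0f, mixed));
        }
    }
}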

After several hours I managed to get audio to be recorded and play back with the video... but it stutters.

Currently I'm just generating a square wave for the duration of each frame of video and writing it to an AVAssetWriterInput. Later, I'll generate the audio I actually want.
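
Roughly, the per-frame test tone is generated along these lines (a simplified sketch; writeTestToneForFrame is a made-up name, writeAudioBuffer: is the method listed further down, and rateMultiplier is the number of audio samples per video frame):

- (void)writeTestToneForFrame {
    // Sketch only: fill one video frame's worth of interleaved stereo floats
    // with a ~441 Hz square wave and hand it to the audio writer input.
    size_t nchans = 2;
    size_t nsamples = rateMultiplier;   // audio samples per video frame
    float* buf = (float*)malloc(nsamples * nchans * sizeof(float));
    for (size_t i = 0; i < nsamples; ++i) {
        // Flip sign every 50 samples at 44.1 kHz -> ~441 Hz square wave.
        float v = (((sample_position_ + i) / 50) % 2) ? 0.5f : -0.5f;
        buf[i * nchans + 0] = v;   // left
        buf[i * nchans + 1] = v;   // right
    }
    [self writeAudioBuffer:buf sampleCount:nsamples channelCount:nchans];
    // Note: freeing 'buf' immediately would be unsafe, because (as discovered
    // in the solution below) CMBlockBufferCreateWithMemoryBlock references
    // this memory rather than copying it; the sketch leaks it for simplicity.
}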

If I generate one massive sample buffer, I don't get the stuttering. If I write it in blocks (which I'd much prefer over allocating a massive array), then the blocks of audio seem to clip each other:

[Image: Glitch]

I can't seem to figure out why this is. I'm pretty sure I'm getting the timestamps for the audio buffers correct, but maybe I'm doing this whole part incorrectly. Or do I need some flags to get the video to sync to the audio? I can't see that this is the problem, since I can see the problem in a wave editor after extracting the audio data to a WAV.

Relevant code for writing audio:

- (id)init {
    self = [super init];
    
    if (self) {
        // [snip]
        
        rateDenominator = 44100;
        rateMultiplier = rateDenominator / frameRate;
        
        sample_position_ = 0;
        audio_fmt_desc_ = nil;
        int nchannels = 2;
        // Describe the raw samples being fed in: 44.1 kHz interleaved
        // stereo 32-bit float LPCM.
        AudioStreamBasicDescription audioFormat;
        bzero(&audioFormat, sizeof(audioFormat));
        audioFormat.mSampleRate = 44100;
        audioFormat.mFormatID   = kAudioFormatLinearPCM;
        audioFormat.mFramesPerPacket = 1;
        audioFormat.mChannelsPerFrame = nchannels;
        int bytes_per_sample = sizeof(float);
        audioFormat.mFormatFlags = kAudioFormatFlagIsFloat | kAudioFormatFlagIsAlignedHigh;
        audioFormat.mBitsPerChannel = bytes_per_sample * 8;
        audioFormat.mBytesPerPacket = bytes_per_sample * nchannels;
        audioFormat.mBytesPerFrame = bytes_per_sample * nchannels;
        
        // Wrap the ASBD in a CMAudioFormatDescription for use when creating
        // the audio sample buffers later.
        CMAudioFormatDescriptionCreate(kCFAllocatorDefault,
                                       &audioFormat,
                                       0,
                                       NULL,
                                       0,
                                       NULL,
                                       NULL,
                                       &audio_fmt_desc_
        );
    }
    
    return self;
}

- (BOOL)beginRecordingSession {
    NSError* error = nil;
    
    isAborted = false;
    abortCode = No_Abort;
    
    // Allocate the video writer object.
    videoWriter = [[AVAssetWriter alloc] initWithURL:[self getVideoFileURLAndRemoveExisting:recordingPath]
                                            fileType:AVFileTypeMPEG4
                                               error:&error];
    
    if (error) {
        NSLog(@"Start recording error: %@", error);
    }
    
    // Configure video compression settings.
    NSDictionary* videoCompressionProps = [NSDictionary dictionaryWithObjectsAndKeys:
                                           [NSNumber numberWithDouble:1024.0 * 1024.0], AVVideoAverageBitRateKey,
                                           [NSNumber numberWithInt:10],AVVideoMaxKeyFrameIntervalKey,
                                           nil];
    
    // Configure video settings.
    NSDictionary* videoSettings = [NSDictionary dictionaryWithObjectsAndKeys:
                                   AVVideoCodecH264, AVVideoCodecKey,
                                   [NSNumber numberWithInt:frameSize.width], AVVideoWidthKey,
                                   [NSNumber numberWithInt:frameSize.height], AVVideoHeightKey,
                                   videoCompressionProps, AVVideoCompressionPropertiesKey,
                                   nil];
    
    // Create the writer input used to append video frames to the output
    // stream being written by videoWriter.
    videoWriterInput = [[AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeVideo outputSettings:videoSettings] retain];
    //NSParameterAssert(videoWriterInput);
    videoWriterInput.expectsMediaDataInRealTime = YES;
    
    // Configure settings for the pixel buffer adaptor.
    NSDictionary* bufferAttributes = [NSDictionary dictionaryWithObjectsAndKeys:
                                      [NSNumber numberWithInt:kCVPixelFormatType_32ARGB], kCVPixelBufferPixelFormatTypeKey,
                                      nil];
    
    // Create the pixel buffer adaptor, used to convert the incoming video frames and
    // append them to videoWriterInput.
    avAdaptor = [[AVAssetWriterInputPixelBufferAdaptor assetWriterInputPixelBufferAdaptorWithAssetWriterInput:videoWriterInput sourcePixelBufferAttributes:bufferAttributes] retain];
    
    [videoWriter addInput:videoWriterInput];
    
    // <pb> Added audio input.
    sample_position_ = 0;
    AudioChannelLayout acl;
    bzero( &acl, sizeof(acl));
    acl.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
    
    NSDictionary* audioOutputSettings = [NSDictionary dictionaryWithObjectsAndKeys:
                                         [NSNumber numberWithInt:kAudioFormatMPEG4AAC], AVFormatIDKey,
                                         [NSNumber numberWithInt:2], AVNumberOfChannelsKey,
                                         [NSNumber numberWithFloat:44100.0], AVSampleRateKey,
                                         [NSNumber numberWithInt:64000], AVEncoderBitRateKey,
                                         [NSData dataWithBytes:&acl length:sizeof(acl)], AVChannelLayoutKey,
                                         nil];
    
    audioWriterInput = [[AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio
                                                           outputSettings:audioOutputSettings] retain];
    
    //audioWriterInput.expectsMediaDataInRealTime = YES;
    audioWriterInput.expectsMediaDataInRealTime = NO; // seems to work slightly better
    
    [videoWriter addInput:audioWriterInput];
    
    rateDenominator = 44100;
    rateMultiplier = rateDenominator / frameRate;
    
    // Start writing and begin the session at time zero.
    [videoWriter startWriting];
    [videoWriter startSessionAtSourceTime:CMTimeMake(0, rateDenominator)];
    
    isRecording = true;
    return YES;
}

- (int) writeAudioBuffer:(float *)samples sampleCount:(size_t)n channelCount:(size_t)nchans {
    if (![self waitForAudioWriterReadiness]) {
        NSLog(@"WARNING: writeAudioBuffer dropped frame after wait limit reached.");
        return 0;
    }
    
    //NSLog(@"writeAudioBuffer");
    OSStatus status;
    CMBlockBufferRef bbuf = NULL;
    CMSampleBufferRef sbuf = NULL;
    
    size_t buflen = n * nchans * sizeof(float);
    // Create sample buffer for adding to the audio input.
    status = CMBlockBufferCreateWithMemoryBlock(
        kCFAllocatorDefault,
        samples,
        buflen,
        kCFAllocatorNull,
        NULL,
        0,
        buflen,
        0,
        &bbuf);
    
    if (status != noErr) {
        NSLog(@"CMBlockBufferCreateWithMemoryBlock error");
        return -1;
    }
    
    CMTime timestamp = CMTimeMake(sample_position_, 44100);
    sample_position_ += n;
    
    status = CMAudioSampleBufferCreateWithPacketDescriptions(kCFAllocatorDefault, bbuf, TRUE, 0, NULL, audio_fmt_desc_, 1, timestamp, NULL, &sbuf);
    if (status != noErr) {
        NSLog(@"CMSampleBufferCreate error");
        return -1;
    }
    BOOL r = [audioWriterInput appendSampleBuffer:sbuf];
    if (!r) {
        NSLog(@"appendSampleBuffer error");
    }
    CFRelease(bbuf);
    CFRelease(sbuf);
    
    return 0;
}

Any ideas on what's going on?

Should I be creating/appending samples in a different way?

Is it something to do with the AAC compression? It doesn't work if I try to use uncompressed audio (it throws).
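
(For reference, my understanding is that an uncompressed attempt would need the full set of LPCM keys, roughly as sketched below, and that AVFileTypeMPEG4 may not accept LPCM at all, which might be why it throws. Treat this as a guess, not something I've verified:)

// A typical LPCM output settings dictionary for an AVAssetWriterInput
// (sketch only). PCM generally needs a QuickTime (.mov) container;
// AVFileTypeMPEG4 is unlikely to accept it.
NSDictionary* lpcmOutputSettings = [NSDictionary dictionaryWithObjectsAndKeys:
                                    [NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
                                    [NSNumber numberWithFloat:44100.0], AVSampleRateKey,
                                    [NSNumber numberWithInt:2], AVNumberOfChannelsKey,
                                    [NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
                                    [NSNumber numberWithBool:NO], AVLinearPCMIsFloatKey,
                                    [NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
                                    [NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
                                    [NSData dataWithBytes:&acl length:sizeof(acl)], AVChannelLayoutKey,
                                    nil];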

As far as I can tell, I'm calculating the PTS correctly. Why is this even required for the audio channel? Shouldn't the video be synced to the audio clock?


UPDATE

I've tried providing the audio in fixed blocks of 1024 samples, since this is the MDCT frame size used by the AAC encoder. It doesn't make any difference.

I've tried pushing all the blocks in one go before writing any video. Doesn't work.

I've tried using CMSampleBufferCreate for the remaining blocks and CMAudioSampleBufferCreateWithPacketDescriptions for the first block only. No change.

And I've tried combinations of these. Still not right.


SOLUTION

It appears that:

audioWriterInput.expectsMediaDataInRealTime = YES;

is essential, otherwise the asset writer gets confused. Perhaps this is because the video input was set up with this flag. Additionally, CMBlockBufferCreateWithMemoryBlock does NOT copy the sample data, even if you pass it the kCMBlockBufferAlwaysCopyDataFlag flag.

So a block buffer can be created with that call and then copied using CMBlockBufferCreateContiguous to ensure that you get a block buffer with its own copy of the audio data. Otherwise, the block buffer will keep referencing the memory you originally passed in, and things will get messed up.
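
In other words, the block-buffer creation in writeAudioBuffer: ends up looking roughly like this (a sketch of the fix, not the exact code I ended up with):

// Wrap the caller's samples without copying...
CMBlockBufferRef tmp = NULL;
status = CMBlockBufferCreateWithMemoryBlock(
    kCFAllocatorDefault,
    samples,              // caller-owned memory, NOT copied
    buflen,
    kCFAllocatorNull,     // don't let Ccore Media try to free it
    NULL,
    0,
    buflen,
    0,
    &tmp);

// ...then force a contiguous copy so the resulting block buffer owns its own
// data and no longer references 'samples'.
status = CMBlockBufferCreateContiguous(
    kCFAllocatorDefault,
    tmp,
    kCFAllocatorDefault,
    NULL,
    0,
    buflen,
    kCMBlockBufferAlwaysCopyDataFlag,
    &bbuf);
CFRelease(tmp);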


Solution

It looks ok, although I would use CMBlockBufferCreateWithMemoryBlock because it copies the samples. Is your code ok with not knowing when audioWriterInput has finished with them?

Shouldn't kAudioFormatFlagIsAlignedHigh be kAudioFormatFlagIsPacked?
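
i.e. something along these lines in the ASBD setup (my reading of the suggestion, not part of the original answer):

audioFormat.mFormatFlags = kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked;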

Other Tips

CMAudioSampleBufferCreateWithPacketDescriptions(kCFAllocatorDefault, bbuf, TRUE, 0, NULL, audio_fmt_desc_, 1, timestamp, NULL, &sbuf);

should be

CMAudioSampleBufferCreateWithPacketDescriptions(kCFAllocatorDefault, bbuf, TRUE, 0, NULL, audio_fmt_desc_, n, timestamp, NULL, &sbuf);

i.e. the numSamples argument should be the number of samples actually in the buffer, not 1. That's how I made it work.
Licensed under: CC-BY-SA with attribution