Question

I'm making some progress on taking a compressed (MP3) sound and saving it as PCM. In addition, I want to split the original file into chunks that are 2 seconds long, within the same process. It seems to work, but I am a little confused as to why.

As I read blocks of audio and write the files out, I check to see if I am about to write a chunk that would make my file exceed my 2 second limit. If so, I write enough to get to 2 seconds, close the file, and then open a new file and write the remainder into the new file, and then read more data. Something like this:

framesInTimedSegment += numFrames;
if ((framesInTimedSegment  > (2.0 * sampleRate)) && (j < 5)) {
    UInt32 newNumFrames = numFrames;
    numFrames = framesInTimedSegment - (2.0 * sampleRate);
    newNumFrames -= numFrames;
// Question A
    UInt32 segmentOffset = newNumFrames * numChannels * 2;
    error = ExtAudioFileWrite(segmentFile, newNumFrames, &fillBufList);
// Question B
    // handle this error!  We might have an interruption
    if (segmentFile) ExtAudioFileDispose(segmentFile);
    XThrowIfError(ExtAudioFileCreateWithURL(urlArray[++j], kAudioFileCAFType, &dstFormat, NULL, kAudioFileFlags_EraseFile, &segmentFile), "ExtAudioFileCreateWithURL failed! - segmentFile");
    size = sizeof(clientFormat);
    XThrowIfError(ExtAudioFileSetProperty(segmentFile, kExtAudioFileProperty_ClientDataFormat, size, &clientFormat), "couldn't set destination client format"); 
    fillBufList.mBuffers[0].mData = srcBuffer + segmentOffset;
    fillBufList.mBuffers[0].mDataByteSize = numFrames * fillBufList.mBuffers[0].mNumberChannels * 2;
    framesInTimedSegment = numFrames;
}
error = ExtAudioFileWrite(segmentFile, numFrames, &fillBufList);

Here are my questions (I have tried to label the relevant line):

A: Is there a better way to find the offset into my buffer so I don't erroneously hard code some value in there? For example, is there a blessed way to get the data offset from frame number?

B: If ExtAudioFileWrite is doing the conversion from compressed to decompressed, then the data I am writing hasn't yet been decompressed (right?), so shouldn't I have to worry about playing with frame numbers and offsets when I am dealing with compressed data? Should I instead be converting the file first, either to a PCM file or into memory, and then split that PCM?

Thanks!

-mahboud

ps.

The clientFormat is defined as follows:

        clientFormat = dstFormat;

and dstFormat:

        dstFormat.mFormatID = outputFormat;
        dstFormat.mChannelsPerFrame = srcFormat.NumberChannels();
        dstFormat.mBitsPerChannel = 16;
        dstFormat.mBytesPerPacket = dstFormat.mBytesPerFrame = 2 * dstFormat.mChannelsPerFrame;
        dstFormat.mFramesPerPacket = 1;
        dstFormat.mFormatFlags = kLinearPCMFormatFlagIsPacked | kLinearPCMFormatFlagIsSignedInteger; // little-endian

Solution

It's difficult to answer correctly without seeing a bit more code. But, assuming clientFormat is an interleaved PCM format:

B) ExtAudioFileWrite does not perform the conversion from compressed to decompressed; ExtAudioFileRead does, depending on the client format you have set. Assuming an MP3 source file and a "standard" 16-bit 44.1 kHz PCM client format, calls to ExtAudioFileRead convert the MP3 bytes to PCM data. This is done under the hood using the AudioFile and AudioConverter APIs. The buffer you are splitting therefore already holds PCM, so working with frame counts and byte offsets is valid and there is no need to convert the whole file first.
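Since the frames coming out of the read are already PCM, the split can be reasoned about purely in frames. Here is a minimal sketch of that arithmetic with the Core Audio calls left out. The helper name frames_for_current_segment and its parameters are made up for illustration, and it assumes each block read is smaller than one segment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a Core Audio function.
   framesInSegment: frames already written to the current segment file
   segmentLimit:    segment length in frames (e.g. 2 * sampleRate)
   numFrames:       frames in the block that was just read
   Returns how many frames of the block still belong to the current
   segment; the remaining (numFrames - result) frames start the next one. */
static uint32_t frames_for_current_segment(uint32_t framesInSegment,
                                           uint32_t segmentLimit,
                                           uint32_t numFrames)
{
    assert(segmentLimit > 0);

    /* Space left in the current segment, clamped at zero. */
    uint32_t room = (framesInSegment < segmentLimit)
                        ? segmentLimit - framesInSegment
                        : 0;

    /* Never hand out more frames than the block contains. */
    return (numFrames < room) ? numFrames : room;
}
```

For example, with a 2-second limit at 44.1 kHz (88200 frames), a segment that already holds 65536 frames, and a new block of 32768 frames, the helper returns 22664; the other 10104 frames go into the next file.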

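A) This is a bit hard to answer without seeing how srcBuffer is defined (I assume an array of int16_t). If you are working with PCM data, what you are doing looks OK. To avoid the hard-coded 2, you could use newNumFrames * clientFormat.mBytesPerFrame. For interleaved PCM, mBytesPerFrame already covers every channel of a frame (your dstFormat sets it to 2 * mChannelsPerFrame), so it should not be multiplied by mChannelsPerFrame again; and because mFramesPerPacket is 1, mBytesPerPacket == mBytesPerFrame. Note that the result is a byte offset, so it has to be added to a byte pointer: if srcBuffer is an int16_t *, cast it first or count in samples instead. If you were working with non-CBR data you would need to concern yourself with packet descriptions, but that doesn't seem to be the case.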
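As a rough sketch of deriving the offset from the format rather than a literal 2, assuming interleaved PCM where mBytesPerFrame covers all channels of one frame: PCMFormat below is a hypothetical stand-in for the two AudioStreamBasicDescription fields involved (so the snippet compiles without Core Audio), and both helper names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the relevant AudioStreamBasicDescription fields
   (assumption: interleaved PCM, one frame per packet). */
typedef struct {
    uint32_t mBytesPerFrame;     /* bytes for ALL channels of one frame */
    uint32_t mChannelsPerFrame;  /* kept for reference; not multiplied in */
} PCMFormat;

/* Byte offset of a given frame from the start of an interleaved buffer. */
static size_t byte_offset_for_frame(const PCMFormat *fmt, uint32_t frame)
{
    assert(fmt->mBytesPerFrame > 0);
    return (size_t)frame * fmt->mBytesPerFrame;
}

/* Pointer to a given frame. The cast to uint8_t * makes the addition
   count bytes regardless of how the buffer was originally declared. */
static uint8_t *frame_pointer(void *buffer, const PCMFormat *fmt,
                              uint32_t frame)
{
    return (uint8_t *)buffer + byte_offset_for_frame(fmt, frame);
}
```

With 16-bit stereo (mBytesPerFrame of 4), frame 100 sits 400 bytes into the buffer, which matches newNumFrames * numChannels * 2 from the question without hard-coding the sample size.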

Licensed under: CC-BY-SA with attribution