Best way to handle cue points in an audio file (from microphone)

https://stackoverflow.com/questions/5487595

14-11-2019

Question

I'd like to record audio from a microphone and let the user add cue points during the recording, so that when the file is loaded afterwards, they can easily jump to a specific position.

I don't really know which file format I should use or how to store the metadata (cue points).

I thought about:

encoding a WAV/MP3 and storing the metadata in a separate .cue file (easy to do, but two files to store)
storing XMP data in an MP3 file (seems a bit harder, but it is standard metadata embedded in one file)
FLV/F4V with audio only and standard cue points, but I'm not sure I can create them directly from the client
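For the first option, a .cue sheet is plain text, so writing one is simple. Below is a minimal Python sketch, assuming each user cue is stored as its own TRACK with an INDEX 01 entry; the function names are hypothetical. CUE sheet times use the MM:SS:FF format, where FF counts frames at 75 per second.

```python
from typing import Iterable


def seconds_to_msf(seconds: float) -> str:
    """Convert seconds to the MM:SS:FF format used by .cue sheets (75 frames/s)."""
    total_frames = int(round(seconds * 75))
    minutes, rem = divmod(total_frames, 75 * 60)
    secs, frames = divmod(rem, 75)
    return f"{minutes:02d}:{secs:02d}:{frames:02d}"


def build_cue_sheet(audio_filename: str, cue_seconds: Iterable[float]) -> str:
    """Return .cue sheet text with one TRACK per cue point (an assumed layout)."""
    lines = [f'FILE "{audio_filename}" WAVE']
    for number, seconds in enumerate(sorted(cue_seconds), start=1):
        lines.append(f"  TRACK {number:02d} AUDIO")
        lines.append(f"    INDEX 01 {seconds_to_msf(seconds)}")
    return "\n".join(lines) + "\n"
```

For cues at 0 s and 83.6 s this produces INDEX lines of 00:00:00 and 01:23:45. Note that CUE sheets were designed to split a file into tracks, so using tracks as bookmarks is a repurposing of the format.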

What do you think?

Solution

Did you know that the WAV file format supports embedded CUE chunks?

The spec says:

The <cue-ck> cue-points chunk identifies a series of positions in 
the waveform data stream. The <cue-ck> is defined as follows:
<cue-ck> -> cue( <dwCuePoints:DWORD> // Count of cue points
                 <cue-point>... ) // Cue-point table

<cue-point> -> struct {
    DWORD dwName;
    DWORD dwPosition;
    FOURCC fccChunk;
    DWORD dwChunkStart;
    DWORD dwBlockStart;
    DWORD dwSampleOffset;
}

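For a WAV file with a single data chunk (the standard case), fccChunk should be "data", dwChunkStart and dwBlockStart are 0, and dwSampleOffset is the sample you want to mark. dwName only needs to be a unique ID for each cue point, and when the file has no playlist, dwPosition can be set to the same value as dwSampleOffset.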
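As a rough illustration of the layout above, here is a Python sketch that packs a cue chunk with the standard `struct` module and appends it to an existing WAV file. It assumes a single data chunk and no playlist, and counts sample offsets in sample frames; `build_cue_chunk` and `append_cue_chunk` are hypothetical helper names.

```python
import struct

# One <cue-point> record: dwName, dwPosition, fccChunk, dwChunkStart,
# dwBlockStart, dwSampleOffset. RIFF data is little-endian ("<").
CUE_POINT = struct.Struct("<II4sIII")  # 24 bytes per record


def build_cue_chunk(sample_offsets):
    """Return a complete 'cue ' chunk: ID, size, dwCuePoints, then the records."""
    body = struct.pack("<I", len(sample_offsets))  # dwCuePoints
    for cue_id, offset in enumerate(sample_offsets, start=1):
        body += CUE_POINT.pack(
            cue_id,   # dwName: any ID unique within the file
            offset,   # dwPosition: equals the sample offset when there is no playlist
            b"data",  # fccChunk: the single data chunk
            0,        # dwChunkStart
            0,        # dwBlockStart
            offset,   # dwSampleOffset: the sample frame being marked
        )
    return b"cue " + struct.pack("<I", len(body)) + body


def append_cue_chunk(path, sample_offsets):
    """Append a 'cue ' chunk to an existing WAV file and patch the RIFF size."""
    with open(path, "r+b") as f:
        f.seek(0, 2)                # move to the end of the file
        if f.tell() % 2:            # chunks start on even offsets: pad the previous one
            f.write(b"\x00")
        f.write(build_cue_chunk(sample_offsets))
        riff_size = f.tell() - 8    # RIFF size excludes the 'RIFF' ID and size field
        f.seek(4)
        f.write(struct.pack("<I", riff_size))
```

The cue chunk body is always 4 + 24·n bytes, so it never needs a trailing pad byte of its own.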

This might be the most portable way to record your cues.
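To jump to a cue when the file is loaded again, the cues have to be read back. The following Python sketch walks the RIFF chunks, picks out the cue chunk, and converts each dwSampleOffset to seconds using the frame rate that the standard `wave` module reports. It assumes the single-data-chunk layout described above; `read_cue_seconds` is a hypothetical name.

```python
import struct
import wave


def read_cue_seconds(path):
    """Return the cue points of a WAV file as sorted times in seconds."""
    with wave.open(path, "rb") as w:
        framerate = w.getframerate()
    offsets = []
    with open(path, "rb") as f:
        f.seek(12)  # skip 'RIFF', the RIFF size and 'WAVE'
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # reached the end of the file
            chunk_id, size = struct.unpack("<4sI", header)
            if chunk_id == b"cue ":
                data = f.read(size)
                (count,) = struct.unpack_from("<I", data, 0)
                for i in range(count):
                    # dwSampleOffset is the last field of each 24-byte record
                    fields = struct.unpack_from("<II4sIII", data, 4 + 24 * i)
                    offsets.append(fields[5])
            else:
                f.seek(size + (size & 1), 1)  # skip the chunk and its pad byte
    return sorted(o / framerate for o in offsets)
```

In the Flash player, a value from this list multiplied by 1000 could be passed as the startTime (in milliseconds) of Sound.play().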

OTHER TIPS

In my opinion, the best way would be to save the SoundChannel.position of each user cue point into an XML file or something similar. I realize this will produce two files, but you could also make up your own proprietary file type and write the audio data plus the cue point info to a single file. I say this for several reasons. The first is that finding a good MP3 encoder for Flash is going to be a bit tricky. Most of them are based on Alchemy, which is a preview technology (not even beta, just plain slapped-together preview tech). Those encoders are also very prone to memory leaks and sluggish performance (Flash coders trying to write C/C++ generally mismanage or overload processes), and it's hard to find one that is fully feature-complete. I'd say the same most likely applies to FLV/F4V encoding on the Flash side too. If you want to go this route, this is about the best starting point out there: http://www.zeropointnine.com/blog/simpleflvwriteras-as3-class-to-create-flvs/
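The two-file approach needs only a small sidecar file. Here is a Python sketch using `xml.etree.ElementTree`, assuming positions are stored in milliseconds (the unit SoundChannel.position reports). The element and attribute names (`cues`, `cue`, `audio`, `position`) and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def save_cues_xml(path, audio_file, cue_ms):
    """Write cue positions (milliseconds) to a sidecar XML file."""
    root = ET.Element("cues", {"audio": audio_file})
    for ms in sorted(cue_ms):
        # str(float(...)) round-trips exactly through float() on load
        ET.SubElement(root, "cue", {"position": str(float(ms))})
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def load_cues_xml(path):
    """Return (audio file name, list of cue positions in milliseconds)."""
    root = ET.parse(path).getroot()
    return root.get("audio"), [float(c.get("position")) for c in root.iter("cue")]
```

Storing the audio file name inside the XML keeps the pair of files linked even if the sidecar is renamed.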

So if it really comes down to what will get your project done fastest and running most stably, go with the two-file solution. Don't get me wrong: if you want to, there are plenty of half-completed or basic MP3 encoding solutions out there (as well as FLV, as referenced above), but it's going to take time and effort to learn these file formats, learn how the existing classes/frameworks work, learn to extend them, and then stabilize and debug them. If you want to invest that time, then by all means dive right in: http://code.google.com/p/flash-kikko/.

One option is to use Base64 encoding to store the audio data as text inside an XML file along with the cue point data. This is a really inefficient way to store audio: Base64 alone inflates the data by about a third, so a long recording would produce a big file. Compressing the audio first with an MP3 (or other) encoder could help with that.
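A Python sketch of that single-file idea, using the standard `base64` and `xml.etree.ElementTree` modules. The document layout (`recording`, `cues`, `cue`, `audio` elements) and the function names are hypothetical; cue positions are assumed to be in milliseconds.

```python
import base64
import xml.etree.ElementTree as ET


def pack_recording_xml(audio_bytes, cue_ms):
    """Return one XML document holding Base64-encoded audio plus cue points."""
    root = ET.Element("recording")
    cues = ET.SubElement(root, "cues")
    for ms in sorted(cue_ms):
        ET.SubElement(cues, "cue", {"position": str(float(ms))})
    audio = ET.SubElement(root, "audio", {"encoding": "base64"})
    audio.text = base64.b64encode(audio_bytes).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def unpack_recording_xml(xml_text):
    """Inverse of pack_recording_xml: return (audio bytes, cue positions in ms)."""
    root = ET.fromstring(xml_text)
    audio = base64.b64decode(root.find("audio").text or "")
    cues = [float(c.get("position")) for c in root.iter("cue")]
    return audio, cues
```

Base64 turns every 3 bytes into 4 characters, so 3 minutes of 16-bit 44.1 kHz mono PCM (15,876,000 bytes) becomes 21,168,000 characters of text.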

By the way, is this the Florian F. that I used to work with?

Licensed under: CC-BY-SA with attribution
