Question

I am looking for a way to create a graphical waveform of MP3 files uploaded to a server. From a little research I believe the MP3 would need to be converted to a raw format first, but I have no idea how to create a .gif image of the waveform for the sound file.

90% of the sound files would be over 60 minutes in length.

I understand this couldn't be done right after the file is uploaded; it would need to be placed in a queue and processed.

I have looked at several libraries such as LAME, but none seem to be able to achieve what I am looking for.

Any pointers in the right direction would be greatly appreciated!

Thank you very much! Paul


Solution

You may find that FMOD or BASS can do this for you quite easily. I believe the FMOD download comes with a visualisation example, so you can extrapolate from that. Both libraries read MP3s and perform the necessary calculations.

OTHER TIPS

The first step is to decompress the MP3. Since you're doing this as a batch job, don't use LAME as a library; just use an existing command-line program to convert the MP3 to a temporary WAV file, which will be much easier. Then find a library to read WAV files. It's a relatively simple format and you should find lots of sample code online, or you could write your own reader in an afternoon.
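As a rough sketch of this step in Python: the function names below are made up for illustration, the decode step assumes the `lame` command-line tool is installed and on the PATH, and the reader assumes 16-bit PCM WAV data. It reads the file one block at a time so that a 60-minute file never has to sit in memory all at once.

```python
import struct
import subprocess
import wave


def decode_mp3_to_wav(mp3_path, wav_path):
    """Decode an MP3 to WAV with the external `lame` tool (assumed installed)."""
    subprocess.run(["lame", "--decode", mp3_path, wav_path], check=True)


def iter_wav_blocks(wav_path, frames_per_block):
    """Yield tuples of signed 16-bit samples, one tuple per block of frames.

    For stereo files the channels stay interleaved (L, R, L, R, ...).
    """
    with wave.open(wav_path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        while True:
            raw = wf.readframes(frames_per_block)
            if not raw:
                break
            # "<h" is a little-endian signed 16-bit integer
            yield struct.unpack("<%dh" % (len(raw) // 2), raw)
```

Each block yielded by `iter_wav_blocks` can then be reduced to a single value for one column of the image.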

Suppose your song is 60 minutes long: 60 minutes * 60 seconds/minute * 44100 samples/second = 158,760,000 samples. (Twice that if it's a stereo song.) If your image is 1000 pixels wide, each pixel column has to stand for 158,760 samples.
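The same arithmetic as a small Python helper (the function name is just for illustration), in case you want to plug in other durations or image widths:

```python
def samples_per_pixel(duration_seconds, sample_rate, image_width):
    """How many samples (per channel) one pixel column has to represent."""
    total_samples = duration_seconds * sample_rate
    return total_samples // image_width


total = 60 * 60 * 44100
print(total)                                      # 158760000
print(samples_per_pixel(60 * 60, 44100, 1000))    # 158760
```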

(As an aside, you won't see much detail at that resolution. Perhaps a better solution would be to show a waveform of just the first 5 minutes, or render a larger image that the user can scroll?)

Anyway, you want to read the audio in blocks of 158,760 samples (in this example) and render each block as a vertical line representing the strength of the signal over that portion of the audio. There are two ways to measure that strength:

  1. The maximum value over that region
  2. The root-mean-squared (RMS) value over that region

Maximum will show you peaks, while RMS will show you the overall perceived loudness. Both are easy to implement; try both and see which one looks best.
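A minimal sketch of both measures in Python, assuming the samples are already available as a sequence of signed integers; the function names are made up for illustration:

```python
import math


def block_peak(samples):
    """Largest absolute sample value in the block."""
    return max(abs(s) for s in samples)


def block_rms(samples):
    """Root-mean-square of the block: sqrt of the mean of the squares."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def summarize(samples, block_size, measure=block_peak):
    """Reduce a sample sequence to one value per block of block_size samples."""
    return [measure(samples[i:i + block_size])
            for i in range(0, len(samples), block_size)]
```

For example, `summarize([0, 3, -4, 0, 1, -1, 1, -1], 4)` gives `[4, 1]`, and passing `measure=block_rms` gives `[2.5, 1.0]`. For 16-bit audio you would divide each value by 32768 to scale it into the 0 to 1 range before drawing.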

Then you just need to turn the resulting image into a GIF. Since this is a batch job anyway, if I were you, I would write out a BMP file (a really easy file format) and then use a command-line program like ImageMagick's "convert" to turn that into a GIF.
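To give an idea of how little is involved, here is a sketch of an uncompressed 24-bit BMP writer in Python. It assumes one level per pixel column, already scaled to the range 0.0 to 1.0, and draws each column as a black bar centred vertically on a white background. The function names are made up for illustration, and the conversion step assumes ImageMagick's `convert` is installed and on the PATH.

```python
import struct
import subprocess


def write_waveform_bmp(path, levels, height=100):
    """Write levels (0.0..1.0, one per column) as a 24-bit BMP waveform."""
    width = len(levels)
    row_size = (width * 3 + 3) & ~3          # rows are padded to 4 bytes
    pixel_bytes = row_size * height
    # 14-byte file header: magic, file size, two reserved fields, data offset
    header = struct.pack("<2sIHHI", b"BM", 54 + pixel_bytes, 0, 0, 54)
    # 40-byte BITMAPINFOHEADER: 1 plane, 24 bits per pixel, no compression
    info = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24, 0,
                       pixel_bytes, 2835, 2835, 0, 0)
    mid = height / 2.0
    padding = b"\x00" * (row_size - width * 3)
    with open(path, "wb") as f:
        f.write(header)
        f.write(info)
        # BMP rows are stored bottom-up; the picture is symmetric, so the
        # order makes no visible difference here.
        for y in range(height):
            row = bytearray()
            for level in levels:
                inside = abs(y + 0.5 - mid) <= level * mid
                row += b"\x00\x00\x00" if inside else b"\xff\xff\xff"
            f.write(bytes(row) + padding)


def bmp_to_gif(bmp_path, gif_path):
    """Convert with ImageMagick's `convert` (assumed installed)."""
    subprocess.run(["convert", bmp_path, gif_path], check=True)
```

Typical use would be `write_waveform_bmp("waveform.bmp", levels)` followed by `bmp_to_gif("waveform.bmp", "waveform.gif")`.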

Finally, one last note: if you're really tricky, you could read the MP3 frames and extract the gain directly from the bitstream without decoding the whole thing. That's what I did here, and you're welcome to use it - but it's not for the faint of heart. It's roughly 100x faster than decoding the full MP3, but the waveform you get will be a crude approximation.

You can use the C# code here to generate the waveform image. Afterward, you can use the standard System.Drawing library to save it as a GIF file.

Licensed under: CC-BY-SA with attribution