Decode MP3 into PCM using JLayer to detect amplitude

https://stackoverflow.com/questions/18938622

29-06-2022

Question

Background: I am using JLayer to play an MP3 file. I am attempting to analyze the varying amplitude/audio levels in the MP3. With my analysis, I would like to determine the duration of the silence at the beginning and end of the MP3. In addition, as the MP3 is being played, I would like a graph to display the audio level (like a visual soundwave).

Problem: For effective analysis, I need to be able to analyze raw PCM data. Currently, I am analyzing the byte[] retrieved through AudioInputStream and sent to SourceDataLine. PCM is short[] not byte[], which means I am not getting the full data.

I am using Root-Mean Square (RMS) to determine volume levels.

The playback code where the byte[] is processed:

AudioInputStream in = null;
AudioFile af = null; //Custom class which holds some data about mp3.
SourceDataLine line = null;

// Set current audio file.
af = musicPlaylist.get(0);

line = (SourceDataLine) AudioSystem.getLine(af.getLineInfo());
line.open(af.getAudioFormat());
line.start();

in = getAudioInputStream(af.getAudioFormat(), af.getAudioStream());

int bR = playbackBufferSize;

final byte[] buffer = new byte[bR];
int n = 0;
while (playMedia) {
    if ((n = in.read(buffer, 0, buffer.length)) == -1) {
        break;
    }

    if (line != null) {
        line.write(buffer, 0, n);

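        // Divide by 32767.0 rather than 32767: if rmsAudioLevel returns an
        // integer type, integer division would truncate the ratio to 0.
        int amp = (int) Math
                .ceil((rmsAudioLevel(decode(buffer)) / 32767.0) * 100);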
        mainScreen.setAmpDisplayLevel(amp, String.valueOf(amp));
        mainScreen.updateGraph(amp);
    }
}

Essentially: How do I decode the PCM data on-the-spot as I play the MP3, so that I may show volume levels and therefore detect silence?

Solution

First off, you ARE getting all the PCM data in buffer[], but you probably have to assemble the bytes into samples yourself. Your audio format tells you how many bits per sample are being used: 16-bit is most common, but 24- or 32-bit data sometimes shows up. With 16-bit data, you combine two contiguous bytes to build a short, and the order of the two bytes depends on whether the format is little-endian or big-endian. The "Related" column on the original question page links to "how to get PCM data from a wav file"; that question, or a similar one, should give you an example of the code you will need.
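As a minimal sketch of that byte assembly, assuming 16-bit signed PCM: the class name PcmBytes and the method toSamples are made up for illustration and are not part of JLayer or the Java Sound API. The bigEndian flag would normally come from AudioFormat.isBigEndian(), and len should be the byte count returned by read() rather than buffer.length.

```java
final class PcmBytes {
    /**
     * Converts the first len bytes of 16-bit signed PCM into samples.
     * Hypothetical helper; a trailing odd byte, if any, is ignored.
     */
    static short[] toSamples(byte[] buffer, int len, boolean bigEndian) {
        short[] samples = new short[len / 2];
        for (int i = 0; i < samples.length; i++) {
            int b0 = buffer[2 * i];
            int b1 = buffer[2 * i + 1];
            // Pick which byte is the high half according to the byte order.
            int hi = bigEndian ? b0 : b1;
            int lo = bigEndian ? b1 : b0;
            // The high byte keeps its sign; the low byte is masked so its
            // sign extension does not overwrite the high bits.
            samples[i] = (short) ((hi << 8) | (lo & 0xFF));
        }
        return samples;
    }
}
```

In the playback loop this would be called as PcmBytes.toSamples(buffer, n, af.getAudioFormat().isBigEndian()), so that only the bytes actually read are converted. For stereo data the samples come out interleaved (left, right, left, right, ...).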

Second issue: I don't think doing RMS on separate buffer[] arrays is exactly correct, though I could be wrong on this. I'm thinking it's more like a moving average, where the calculation at the beginning of one buffer[] should include some of the data from the end of the previous buffer[]. Does the formula require that you "go back" or "average over" N frames? If so, you will want to keep the previous buffer[] handy for situations where the N frames span two buffers. And you will be iterating through the current buffer[] one "frame" at a time (or handing buffer[] to a subroutine that in effect does this).
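One way to sketch the moving-window idea, under the assumption that the samples have already been assembled into short values: keep the squares of the last N samples in a circular buffer that lives across read() calls, so the window can span two buffers without holding on to the previous byte[]. The class RollingRms and the window size are hypothetical choices for illustration, not something the question's code or JLayer provides.

```java
final class RollingRms {
    private final long[] squares; // circular buffer of squared samples
    private long sum;             // running sum of the squares in the window
    private int next;             // slot that will be overwritten next
    private int count;            // samples currently in the window

    RollingRms(int windowSize) {
        squares = new long[windowSize];
    }

    /** Adds one sample and returns the RMS of the most recent samples. */
    double add(short sample) {
        long sq = (long) sample * sample;
        // Swap the oldest square out of the running sum and the new one in.
        sum += sq - squares[next];
        squares[next] = sq;
        next = (next + 1) % squares.length;
        if (count < squares.length) {
            count++;
        }
        return Math.sqrt((double) sum / count);
    }
}
```

Create one RollingRms before the playback loop and feed it every sample of every buffer; the value returned after the last sample of a buffer can then drive the level display. Whether a moving window or a plain per-buffer RMS is the better fit depends on how smooth the graph should look.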

Licensed under: CC-BY-SA with attribution
