Question

I have inherited a code snippet which draws the audio waveform of a given file. But this waveform is a simple image built with Java vector graphics, without any labeling, axis information, etc. I would like to port it to JFreeChart to increase its informative value. My problem is that the code is cryptic, to say the least.

// imports needed by the snippet (they were not included in the original post)
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.font.FontRenderContext;
import java.awt.font.LineBreakMeasurer;
import java.awt.font.TextAttribute;
import java.awt.font.TextLayout;
import java.awt.geom.Line2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.text.AttributedCharacterIterator;
import java.text.AttributedString;
import java.util.Vector;
import javax.imageio.ImageIO;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;
import javax.swing.ImageIcon;
import javax.swing.JLabel;
import javax.swing.JOptionPane;

public class Plotter {
AudioInputStream audioInputStream;
Vector<Line2D.Double> lines = new Vector<Line2D.Double>();
String errStr;
Capture capture = new Capture();
double duration, seconds;
//File file;
String fileName = "out.png";
SamplingGraph samplingGraph;
String waveformFilename;
Color imageBackgroundColor = new Color(20,20,20);

public Plotter(URL url, String waveformFilename) throws Exception {
    if (url != null) {
        try {
            errStr = null;
            this.fileName = waveformFilename;
            audioInputStream = AudioSystem.getAudioInputStream(url);
            long milliseconds = (long)((audioInputStream.getFrameLength() * 1000) / audioInputStream.getFormat().getFrameRate());
            duration = milliseconds / 1000.0;
            samplingGraph = new SamplingGraph();
            samplingGraph.createWaveForm(null);     

        } catch (Exception ex) { 
            reportStatus(ex.toString());
            throw ex;
        }
    } else {
        reportStatus("Audio file required.");
    }
}
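
/**
 * Not part of the inherited snippet, but it is called above; a minimal
 * stand-in (assumed, modelled on the JavaSound demo) so the class compiles.
 */
private void reportStatus(String msg) {
    if ((errStr = msg) != null) {
        System.out.println(errStr);
    }
}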
/**
 * Render a WaveForm.
 */
class SamplingGraph implements Runnable {

    private Thread thread;
    private Font font10 = new Font("serif", Font.PLAIN, 10);
    private Font font12 = new Font("serif", Font.PLAIN, 12);
    Color jfcBlue = new Color(000, 000, 255);
    Color pink = new Color(255, 175, 175);


    public SamplingGraph() {
    }


    public void createWaveForm(byte[] audioBytes) {

        lines.removeAllElements();  // clear the old vector

        AudioFormat format = audioInputStream.getFormat();
        if (audioBytes == null) {
            try {
                audioBytes = new byte[
                    (int) (audioInputStream.getFrameLength() 
                    * format.getFrameSize())];
                audioInputStream.read(audioBytes);
            } catch (Exception ex) { 
                reportStatus(ex.getMessage());
                return; 
            }
        }
        int w = 500;
        int h = 200;
        int[] audioData = null;
        if (format.getSampleSizeInBits() == 16) {
             int nlengthInSamples = audioBytes.length / 2;
             audioData = new int[nlengthInSamples];
             if (format.isBigEndian()) {
                for (int i = 0; i < nlengthInSamples; i++) {
                     /* First byte is MSB (high order) */
                     int MSB = (int) audioBytes[2*i];
                     /* Second byte is LSB (low order) */
                     int LSB = (int) audioBytes[2*i+1];
                     audioData[i] = MSB << 8 | (255 & LSB);
                 }
             } else {
                 for (int i = 0; i < nlengthInSamples; i++) {
                     /* First byte is LSB (low order) */
                     int LSB = (int) audioBytes[2*i];
                     /* Second byte is MSB (high order) */
                     int MSB = (int) audioBytes[2*i+1];
                     audioData[i] = MSB << 8 | (255 & LSB);
                 }
             }
         } else if (format.getSampleSizeInBits() == 8) {
             int nlengthInSamples = audioBytes.length;
             audioData = new int[nlengthInSamples];
             if (format.getEncoding().toString().startsWith("PCM_SIGN")) {
                 for (int i = 0; i < audioBytes.length; i++) {
                     audioData[i] = audioBytes[i];
                 }
             } else {
                 for (int i = 0; i < audioBytes.length; i++) {
                     audioData[i] = audioBytes[i] - 128;
                 }
             }
        }

        int frames_per_pixel = audioBytes.length / format.getFrameSize()/w;
        byte my_byte = 0;
        double y_last = 0;
        int numChannels = format.getChannels();
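        // One vertical line is drawn per pixel column: each column covers
        // frames_per_pixel audio frames, of which only the first sample is
        // kept; amplitude 0 maps to mid-height (h / 2).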
        for (double x = 0; x < w && audioData != null; x++) {
            int idx = (int) (frames_per_pixel * numChannels * x);
            if (format.getSampleSizeInBits() == 8) {
                 my_byte = (byte) audioData[idx];
            } else {
                 my_byte = (byte) (128 * audioData[idx] / 32768 );
            }
            double y_new = (double) (h * (128 - my_byte) / 256);
            lines.add(new Line2D.Double(x, y_last, x, y_new));
            y_last = y_new;
        }
        saveToFile();
    }


    public void saveToFile() {            
        int w = 500;
        int h = 200;
        int INFOPAD = 15;

        BufferedImage bufferedImage = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g2 = bufferedImage.createGraphics();

        createSampleOnGraphicsContext(w, h, INFOPAD, g2);            
        g2.dispose();
        // Write generated image to a file
        try {
            // Save as PNG
            File file = new File(fileName);
            System.out.println(file.getAbsolutePath());
            ImageIO.write(bufferedImage, "png", file);
            JOptionPane.showMessageDialog(null, 
                    new JLabel(new ImageIcon(fileName)));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }


    private void createSampleOnGraphicsContext(int w, int h, int INFOPAD, Graphics2D g2) {            
        g2.setBackground(imageBackgroundColor);
        g2.clearRect(0, 0, w, h);
        g2.setColor(Color.white);
        g2.fillRect(0, h-INFOPAD, w, INFOPAD);

        if (errStr != null) {
            g2.setColor(jfcBlue);
            g2.setFont(new Font("serif", Font.BOLD, 18));
            g2.drawString("ERROR", 5, 20);
            AttributedString as = new AttributedString(errStr);
            as.addAttribute(TextAttribute.FONT, font12, 0, errStr.length());
            AttributedCharacterIterator aci = as.getIterator();
            FontRenderContext frc = g2.getFontRenderContext();
            LineBreakMeasurer lbm = new LineBreakMeasurer(aci, frc);
            float x = 5, y = 25;
            lbm.setPosition(0);
            while (lbm.getPosition() < errStr.length()) {
                TextLayout tl = lbm.nextLayout(w-x-5);
                if (!tl.isLeftToRight()) {
                    x = w - tl.getAdvance();
                }
                tl.draw(g2, x, y += tl.getAscent());
                y += tl.getDescent() + tl.getLeading();
            }
        } else if (capture.thread != null) {
            g2.setColor(Color.black);
            g2.setFont(font12);
            //g2.drawString("Length: " + String.valueOf(seconds), 3, h-4);
        } else {
            g2.setColor(Color.black);
            g2.setFont(font12);
            //g2.drawString("File: " + fileName + "  Length: " + String.valueOf(duration) + "  Position: " + String.valueOf(seconds), 3, h-4);

            if (audioInputStream != null) {
                // .. render sampling graph ..
                g2.setColor(jfcBlue);
                for (int i = 1; i < lines.size(); i++) {
                    g2.draw((Line2D) lines.get(i));
                }

                // .. draw current position ..
                if (seconds != 0) {
                    double loc = seconds/duration*w;
                    g2.setColor(pink);
                    g2.setStroke(new BasicStroke(3));
                    g2.draw(new Line2D.Double(loc, 0, loc, h-INFOPAD-2));
                }
            }
        }
    }

    public void start() {
        thread = new Thread(this);
        thread.setName("SamplingGraph");
        thread.start();
        seconds = 0;
    }

    public void stop() {
        if (thread != null) {
            thread.interrupt();
        }
        thread = null;
    }

    public void run() {
        seconds = 0;
        while (thread != null) {
            if ( (capture.line != null) && (capture.line.isActive()) ) {
                long milliseconds = (long)(capture.line.getMicrosecondPosition() / 1000);
                seconds =  milliseconds / 1000.0;
            }
            try { Thread.sleep(100); } catch (Exception e) { break; }
            while ((capture.line != null && !capture.line.isActive())) 
            {
                try { Thread.sleep(10); } catch (Exception e) { break; }
            }
        }
        seconds = 0;
    }
} // End class SamplingGraph

/** 
 * Reads data from the input channel and writes to the output stream
 */
class Capture implements Runnable {

    TargetDataLine line;
    Thread thread;

    public void start() {
        errStr = null;
        thread = new Thread(this);
        thread.setName("Capture");
        thread.start();
    }

    public void stop() {
        thread = null;
    }

    private void shutDown(String message) {
        if ((errStr = message) != null && thread != null) {
            thread = null;
            samplingGraph.stop();                
            System.err.println(errStr);
        }
    }

    public void run() {

        duration = 0;
        audioInputStream = null;

        // define the required attributes for our line,
        // and make sure a compatible line is supported.
        // (audioInputStream was just set to null above, so its format cannot be
        // queried here; a fixed capture format is assumed as a stand-in.)

        AudioFormat format = new AudioFormat(44100.0f, 16, 2, true, true);
        DataLine.Info info = new DataLine.Info(TargetDataLine.class,
            format);

        if (!AudioSystem.isLineSupported(info)) {
            shutDown("Line matching " + info + " not supported.");
            return;
        }

        // get and open the target data line for capture.

        try {
            line = (TargetDataLine) AudioSystem.getLine(info);
            line.open(format, line.getBufferSize());
        } catch (LineUnavailableException ex) { 
            shutDown("Unable to open the line: " + ex);
            return;
        } catch (SecurityException ex) { 
            shutDown(ex.toString());
            //JavaSound.showInfoDialog();
            return;
        } catch (Exception ex) { 
            shutDown(ex.toString());
            return;
        }

        // play back the captured audio data
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int frameSizeInBytes = format.getFrameSize();
        int bufferLengthInFrames = line.getBufferSize() / 8;
        int bufferLengthInBytes = bufferLengthInFrames * frameSizeInBytes;
        byte[] data = new byte[bufferLengthInBytes];
        int numBytesRead;

        line.start();

        while (thread != null) {
            if((numBytesRead = line.read(data, 0, bufferLengthInBytes)) == -1) {
                break;
            }
            out.write(data, 0, numBytesRead);
        }

        // we reached the end of the stream.  stop and close the line.
        line.stop();
        line.close();
        line = null;

        // stop and close the output stream
        try {
            out.flush();
            out.close();
        } catch (IOException ex) {
            ex.printStackTrace();
        }

        // load bytes into the audio input stream for playback

        byte audioBytes[] = out.toByteArray();
        ByteArrayInputStream bais = new ByteArrayInputStream(audioBytes);
        audioInputStream = new AudioInputStream(bais, format, audioBytes.length / frameSizeInBytes);

        long milliseconds = (long)((audioInputStream.getFrameLength() * 1000) / format.getFrameRate());
        duration = milliseconds / 1000.0;

        try {
            audioInputStream.reset();
        } catch (Exception ex) { 
            ex.printStackTrace(); 
            return;
        }

        samplingGraph.createWaveForm(audioBytes);
    }
} // End class Capture    

}

I have gone through it several times and know that the part below is where the audio values are calculated, but my problem is that I have no idea how to retrieve the time information at that point, i.e. which time interval each value belongs to.

int frames_per_pixel = audioBytes.length / format.getFrameSize() / w;
byte my_byte = 0;
double y_last = 0;
int numChannels = format.getChannels();
for (double x = 0; x < w && audioData != null; x++) {
    int idx = (int) (frames_per_pixel * numChannels * x);
    if (format.getSampleSizeInBits() == 8) {
        my_byte = (byte) audioData[idx];
    } else {
        my_byte = (byte) (128 * audioData[idx] / 32768);
    }
    double y_new = (double) (h * (128 - my_byte) / 256);
    lines.add(new Line2D.Double(x, y_last, x, y_new));
    y_last = y_new;
}

I would like to plot it using a JFreeChart XYSeries plot, but I am having trouble calculating the required values of x (time) and y (this is the amplitude, but is it y_new in this code)?

I understand this is probably a very easy thing, but I am new to this whole audio stuff. I understand the theory behind audio files, but this seems to be a simple problem with a tough solution.



Solution

The key thing to realize is that, in the provided code, the plot is expected to be at a much lower resolution than the actual audio data. For example, consider the following waveform:

[figure: example waveform]

The plotting code then represents the data as the blue boxes in the graph:

[figure: the same waveform overlaid with blue boxes]

When the boxes are 1 pixel wide, this corresponds to the lines with endpoints (x, y_last) and (x, y_new). As you can see, when the waveform is sufficiently smooth, the range of amplitudes from y_last to y_new is a fair approximation of the samples within the box.

Now this representation can be convenient when rendering the waveform in a pixel-by-pixel fashion (raster display). However, for XYPlot graphs (as found in JFreeChart) you only need to specify a sequence of (x, y) points, and the XYPlot takes care of drawing the segments between those points. This corresponds to the green line in the following graph:

[figure: the waveform approximated by a green line through the selected points]
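
For reference, feeding such a sequence of (x, y) points to JFreeChart is straightforward. The sketch below is only an illustration (the class name, series contents, axis labels, output file name and chart size are made up, and it assumes JFreeChart 1.0.x, where the PNG helper is called ChartUtilities):

import java.io.File;
import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartUtilities;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.data.xy.XYSeries;
import org.jfree.data.xy.XYSeriesCollection;

public class WaveformChart {
    // Renders a prepared series of (time, amplitude) points as a line chart
    // and writes it to a PNG file.
    public static void saveChart(XYSeries series, String pngName) throws Exception {
        XYSeriesCollection dataset = new XYSeriesCollection(series);
        JFreeChart chart = ChartFactory.createXYLineChart(
                "Waveform", "Time (s)", "Amplitude", dataset,
                PlotOrientation.VERTICAL, false, false, false);
        ChartUtilities.saveChartAsPNG(new File(pngName), chart, 500, 200);
    }
}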

In theory, you could just provide every single sample as-is to the XYPlot. However, unless you have few samples, this tends to be quite heavy to plot, so typically one would downsample the data first. If the waveform is sufficiently smooth, the downsampling reduces to a decimation, i.e. taking one out of every N samples, as sketched below. The decimation factor N then controls the tradeoff between rendering performance and waveform approximation accuracy. Note that if the decimation factor frames_per_pixel used in the provided code produces a good raster display (i.e. one where the waveform features you would like to see are not hidden by the blocky pixel look and which does not show aliasing artifacts), the same factor should still be sufficient for the XYPlot (in fact you may be able to downsample a bit more).
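
As an illustration, plain decimation could look like the following (Decimator/decimate are hypothetical names; no low-pass filtering is applied before dropping samples, which is only reasonable if the waveform is already smooth):

public final class Decimator {
    // Keep one sample out of every n.
    public static int[] decimate(int[] samples, int n) {
        int[] out = new int[samples.length / n];
        for (int i = 0; i < out.length; i++) {
            out[i] = samples[i * n];
        }
        return out;
    }
}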

As far as mapping the samples to time/amplitude axes is concerned, I would not use the x and y parameters as they are defined in the provided plotting code: they are just pixel indices applicable to a raster-type display (as is the blue-box representation above).

Rather, I would map the sample index (idx in the provided code) directly to the time axis by dividing it by the sampling rate (which you can get from format.getFrameRate()). Similarly, I would map the full-scale sample values to the [-1, +1] range by dividing the audioData[idx] samples by 128 for 8-bits-per-sample data or by 32768 for 16-bits-per-sample data, as in the sketch below.
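
Putting this together, a sketch of building the series from the audioData array that createWaveForm decodes could look like this (the class name, the method name toSeries and the maxPoints parameter are made up; the fractional step also covers the short-file case discussed next):

import javax.sound.sampled.AudioFormat;
import org.jfree.data.xy.XYSeries;

public class WaveformSeriesBuilder {
    // Maps decoded samples to (time in seconds, amplitude in [-1, +1]).
    // audioData is the int[] produced in createWaveForm(); maxPoints caps the
    // number of plotted points and therefore controls the decimation.
    public static XYSeries toSeries(int[] audioData, AudioFormat format, int maxPoints) {
        XYSeries series = new XYSeries("waveform");
        int channels = format.getChannels();
        float frameRate = format.getFrameRate();
        double fullScale = (format.getSampleSizeInBits() == 8) ? 128.0 : 32768.0;
        int totalFrames = audioData.length / channels;
        // fractional step, so files with fewer frames than maxPoints still work
        double framesPerPoint = Math.max(1.0, (double) totalFrames / maxPoints);
        for (double frame = 0; frame < totalFrames; frame += framesPerPoint) {
            int idx = (int) frame * channels;       // first channel of that frame
            series.add(frame / frameRate,           // time in seconds
                       audioData[idx] / fullScale); // amplitude in [-1, +1]
        }
        return series;
    }
}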

The main purpose of the w and h parameters would then be to configure the plotting area size; they would no longer be directly required to compute the XYPlot input (the XYPlot itself takes care of mapping time/amplitude values to pixel coordinates). The w parameter, on the other hand, also served the additional purpose of determining the number of points to draw. You may now want to control the number of points based on how much decimation the waveform can sustain without showing too much distortion, or you could keep it as-is to display the waveform at the maximum available plot resolution (with some performance cost). Note, however, that you may have to convert frames_per_pixel to a floating-point value if you expect to display waveforms with fewer than w samples.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow