Question

I have several GB of sample data captured 'in-the-field' at 48ksps using an NI Data Acquisition module. I would like to create a WAV file from this data.

I have done this previously using MATLAB to load the data, normalise it to the 16bit PCM range, and then write it out as a WAV file. However MATLAB baulks at the file size as it does everything 'in-memory'.

I would ideally do this in C++ or C, (C# is an option), or if there is an existing utility I'd use that. Is there a simple way (i.e. an existing library) to take a raw PCM buffer, specify the sample rate, bit depth, and package it into a WAV file?

To handle the large data set, it would need to be able to append data in chunks as it would not necessarily be possible to read the whole set into memory.

I understand that I could do this from scratch using the format specification, but I do not want to re-invent the wheel, or spend time fixing bugs on this if I can help it.


Solution

I think you can use libsox for this.

OTHER TIPS

Interesting: I seem to have found a bug in Stack Overflow's code formatting. It doesn't support the \ character at the end of a line, as you can see below. Sad.

// stolen from the Ogg Vorbis PCM-to-WAV conversion routines, sorry
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define VERSIONSTRING "OggDec 1.0\n"

static int quiet = 0;
static int bits = 16;       /* bits per sample */
static int endian = 0;
static int raw = 0;
static int sign = 1;
unsigned char headbuf[44];  /* The whole 44-byte WAV header */

/* Store x at buf as a little-endian 32-bit value */
#define WRITE_U32(buf, x) do { \
    *(buf)     = (unsigned char)((x)&0xff); \
    *((buf)+1) = (unsigned char)(((x)>>8)&0xff); \
    *((buf)+2) = (unsigned char)(((x)>>16)&0xff); \
    *((buf)+3) = (unsigned char)(((x)>>24)&0xff); \
  } while (0)

/* Store x at buf as a little-endian 16-bit value */
#define WRITE_U16(buf, x) do { \
    *(buf)     = (unsigned char)((x)&0xff); \
    *((buf)+1) = (unsigned char)(((x)>>8)&0xff); \
  } while (0)
/*
 * Some of this based on ao/src/ao_wav.c
 */
static int
write_prelim_header (FILE * out, int channels, int samplerate)
{

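  int knownlength = 0;  /* 0 = final length unknown; rewrite_header() patches it later */

  unsigned int size = 0x7fffffff;  /* placeholder size until the real length is known */
  /* For the data in the question, pass samplerate = 48000. */
  int bytespersec = channels * samplerate * bits / 8;
  int align = channels * bits / 8;  /* bytes per sample frame */
  int samplesize = bits;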

  if (knownlength)
    size = (unsigned int) knownlength;

  memcpy (headbuf, "RIFF", 4);
  WRITE_U32 (headbuf + 4, size - 8);
  memcpy (headbuf + 8, "WAVE", 4);
  memcpy (headbuf + 12, "fmt ", 4);
  WRITE_U32 (headbuf + 16, 16);
  WRITE_U16 (headbuf + 20, 1);  /* format */
  WRITE_U16 (headbuf + 22, channels);
  WRITE_U32 (headbuf + 24, samplerate);
  WRITE_U32 (headbuf + 28, bytespersec);
  WRITE_U16 (headbuf + 32, align);
  WRITE_U16 (headbuf + 34, samplesize);
  memcpy (headbuf + 36, "data", 4);
  WRITE_U32 (headbuf + 40, size - 44);

  if (fwrite (headbuf, 1, 44, out) != 44)
    {
      printf ("ERROR: Failed to write wav header: %s\n", strerror (errno));
      return 1;
    }

  return 0;
}

static int
rewrite_header (FILE * out, unsigned int written)
{
  unsigned int length = written;

  length += 44;

  WRITE_U32 (headbuf + 4, length - 8);
  WRITE_U32 (headbuf + 40, length - 44);
  if (fseek (out, 0, SEEK_SET) != 0)
    {
      printf ("ERROR: Failed to seek on seekable file: %s\n",
          strerror (errno));
      return 1;
    }

  if (fwrite (headbuf, 1, 44, out) != 44)
    {
      printf ("ERROR: Failed to write wav header: %s\n", strerror (errno));
      return 1;
    }
  return 0;
}
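
As a rough sketch of how routines like these fit the chunked workflow from the question, the following self-contained C example writes a placeholder header, copies the raw samples through a fixed-size buffer, and then seeks back to patch in the real sizes. The names put_le16, put_le32, write_wav_header and raw_to_wav are made up for this illustration and are not part of OggDec. It assumes the input is already interleaved little-endian PCM and that the output file is seekable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Store v at p in little-endian byte order, as RIFF requires. */
static void put_le16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)(v >> 8);
}

static void put_le32(unsigned char *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (unsigned char)((v >> (8 * i)) & 0xff);
}

/* Write the 44-byte PCM header for data_bytes bytes of sample data. */
static int write_wav_header(FILE *out, uint16_t channels, uint32_t rate,
                            uint16_t bits, uint32_t data_bytes)
{
    unsigned char h[44];
    uint16_t align = (uint16_t)(channels * (bits / 8)); /* bytes per frame */

    memcpy(h, "RIFF", 4);
    put_le32(h + 4, 36 + data_bytes);   /* file size minus 8 */
    memcpy(h + 8, "WAVE", 4);
    memcpy(h + 12, "fmt ", 4);
    put_le32(h + 16, 16);               /* size of the fmt chunk */
    put_le16(h + 20, 1);                /* format tag 1 = PCM */
    put_le16(h + 22, channels);
    put_le32(h + 24, rate);
    put_le32(h + 28, rate * align);     /* bytes per second */
    put_le16(h + 32, align);
    put_le16(h + 34, bits);
    memcpy(h + 36, "data", 4);
    put_le32(h + 40, data_bytes);
    return fwrite(h, 1, sizeof h, out) == sizeof h ? 0 : -1;
}

/* Copy raw PCM from in to out in 64 KiB chunks, then patch the header.
 * Returns 0 on success, -1 on I/O error or if the data exceeds the
 * 32-bit RIFF size limit. */
int raw_to_wav(FILE *in, FILE *out, uint16_t channels, uint32_t rate,
               uint16_t bits)
{
    static unsigned char chunk[1 << 16];
    uint32_t total = 0;
    size_t n;

    assert(channels > 0 && bits >= 8 && bits % 8 == 0);
    if (write_wav_header(out, channels, rate, bits, 0) != 0)
        return -1;                      /* placeholder header */
    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
        if (n > UINT32_MAX - 36u - total)
            return -1;                  /* would overflow the size field */
        if (fwrite(chunk, 1, n, out) != n)
            return -1;
        total += (uint32_t)n;
    }
    if (ferror(in) || fseek(out, 0, SEEK_SET) != 0)
        return -1;
    return write_wav_header(out, channels, rate, bits, total);
}
```

A call such as raw_to_wav(in, out, 1, 48000, 16), with both files opened in binary mode, would match the 48ksps, 16-bit case from the question. Only one 64 KiB buffer is ever held in memory, whatever the input size.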

I came across a function called WAVAPPEND on Mathworks' File Exchange site a while ago. I never got around to using it, so I'm not sure if it works or is appropriate for what you're trying to do, but perhaps it'll be useful to you.

Okay... I'm 5 years late here... but I just did this for myself and wanted to put the solution out there!

I had the same issue with running out of memory while writing large WAV files in MATLAB. I got around this by editing MATLAB's wavwrite function so that it pulls data from the hard drive using memmapfile instead of from variables held in RAM, and then saving it as a new function. This saves a lot of trouble: you don't have to deal with headers as you would when writing the WAV file from scratch, and you won't need any external applications.

1) Type edit wavwrite to see the code for the function, then save a copy of it as a new function.

2) I changed the y variable in the wavwrite function from an array containing the wav data to a cell array of strings, each giving the location of one channel's data on my hard drive. Use fwrite to store your wav data on the hard drive first, of course. At the beginning of the function I converted the file locations stored in y into memmapfile objects and defined the number of channels and samples like so:

replace these lines:

% If input is a vector, force it to be a column:
if ndims(y) > 2,
  error(message('MATLAB:audiovideo:wavwrite:invalidInputFormat'));
end
if size(y,1)==1,
   y = y(:);
end
[samples, channels] = size(y);

with this:

% get num of channels
channels = length(y);

%Convert y from strings pointing to wav data to memmapfile objects allowing access to the data
for i  = 1:length(y)
   y{i} = memmapfile(y{i},'Writable',false,'Format','int16');
end
samples = length(y{1}.Data);

3) Now you can edit the private function write_wavedat(fid,fmt). This is the function that writes the wav data. Turn it into a nested function so that it can read your y memmap variable directly from the enclosing function's workspace, instead of receiving a copy as an argument and eating up your RAM. Then you can make some changes like this:

replace the lines which write the wav data:

if (fwrite(fid, reshape(data',total_samples,1), dtype) ~= total_samples), error(message('MATLAB:audiovideo:wavewrite:failedToWriteSamples')); end

with this:

%Divide data into smaller packets for writing
       packetSize = 30*(5e5); %samples per channel per packet (as int16: 30 MB per channel)
       packets = ceil(samples/packetSize);

       % Write data to file one packet at a time, interleaving the channels
       % (WAV stores sample 1 of every channel, then sample 2 of every channel, ...)
       for j=1:packets
           first = (j-1)*packetSize + 1;
           last = min(j*packetSize, samples);
           block = zeros(channels, last-first+1, 'int16');
           for i=1:channels
               block(i,:) = y{i}.Data(first:last);
           end
           fwrite(fid, block(:), dtype); %column-major order interleaves the channels
           disp(['...' num2str(floor(100*j/packets)) '% done writing file...']);
       end

This will incrementally copy the data from each memmap variable into the wavfile

4) That should be it! You can leave the rest of the code as is, as it'll write the headers for you. Here's an example of how you'd write a large 2-channel wav file with this function:

wavwriteModified({'c:\wavFileinputCh1' 'c:\wavFileinputCh2'},44100,16,'c:\output2ChanWavFile');

I can verify this approach works, as I just wrote an 800 MB 4-channel wav file with my edited wavwrite function, whereas MATLAB usually throws an out-of-memory error for me when writing wav files larger than 200 MB.
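
For anyone who would rather do the same thing outside MATLAB, here is a minimal C sketch of the idea: read one raw file per channel in small blocks and append the samples to the output as interleaved frames, so memory use stays constant. interleave_channels, BLOCK_FRAMES and MAX_CHANNELS are hypothetical names chosen for this example. The sketch assumes 16-bit samples on a little-endian machine and writes only the sample data, not the WAV header.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BLOCK_FRAMES 4096   /* frames processed per iteration (assumption) */
#define MAX_CHANNELS 8      /* upper bound for this sketch (assumption) */

/* Read int16 samples from one file per channel and append them to out as
 * interleaved frames (ch1, ch2, ..., ch1, ch2, ...).
 * Stops at the end of the shortest input.
 * Returns the number of frames written, or -1 on a write error. */
long interleave_channels(FILE *const *in, int channels, FILE *out)
{
    static int16_t plane[BLOCK_FRAMES];
    static int16_t frames[BLOCK_FRAMES * MAX_CHANNELS];
    long total = 0;

    assert(channels >= 1 && channels <= MAX_CHANNELS);
    for (;;) {
        size_t n = BLOCK_FRAMES;    /* frames available from every channel */
        for (int c = 0; c < channels; c++) {
            size_t got = fread(plane, sizeof plane[0], BLOCK_FRAMES, in[c]);
            if (got < n)
                n = got;
            /* Scatter this channel's samples into its slot of each frame. */
            for (size_t i = 0; i < got; i++)
                frames[i * (size_t)channels + (size_t)c] = plane[i];
        }
        size_t count = n * (size_t)channels;
        if (fwrite(frames, sizeof frames[0], count, out) != count)
            return -1;
        total += (long)n;
        if (n < BLOCK_FRAMES)
            break;                  /* at least one input is exhausted */
    }
    return total;
}
```

The returned frame count, multiplied by the frame size in bytes, is the value that would go into the data chunk size of the header.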

C# would be a good choice for this. FileStreams are easy to work with, and could be used for reading and writing the data in chunks. Also, reading WAV file headers is a relatively complicated task (you have to search for RIFF chunks and so on), but writing them is cake (you just fill out a header structure and write it at the beginning of the file).

There are a number of libraries that do conversions like this, but I'm not sure they can handle the huge data sizes you're talking about. Even if they do, you would probably still have to do some programming work to feed smaller chunks of raw data to these libraries.

For writing your own method, normalization isn't difficult, and even resampling from 48ksps to 44.1ksps is relatively simple (assuming you don't mind linear interpolation). You would also presumably have greater control over the output, so it would be easier to create a set of smaller WAV files, instead of one gigantic one.
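
To make that concrete, here is a small C sketch of both steps. It assumes the captured samples are available as double-precision values, which may not match the actual NI file format, and the function names update_peak, normalise_to_int16 and resample_linear are invented for this example. Normalising a data set that does not fit in memory takes two passes: one over all chunks to find the peak, and a second to scale each chunk into the 16-bit range. The resampler works on a single buffer and does no low-pass filtering, so it is only suitable if the quality of plain linear interpolation is acceptable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pass 1: fold one chunk of samples into the running peak magnitude. */
double update_peak(const double *x, size_t n, double peak)
{
    for (size_t i = 0; i < n; i++) {
        double a = x[i] < 0.0 ? -x[i] : x[i];
        if (a > peak)
            peak = a;
    }
    return peak;
}

/* Pass 2: scale one chunk so that +/-peak maps to +/-32767, rounding to
 * the nearest integer. A peak of zero (silence) yields all zeros. */
void normalise_to_int16(const double *x, size_t n, double peak, int16_t *out)
{
    double gain = peak > 0.0 ? 32767.0 / peak : 0.0;
    for (size_t i = 0; i < n; i++) {
        double v = x[i] * gain;
        out[i] = (int16_t)(v < 0.0 ? v - 0.5 : v + 0.5);
    }
}

/* Resample in[0..n_in) from rate_in to rate_out by linear interpolation
 * between neighbouring input samples. Writes at most out_cap samples and
 * returns the number written. */
size_t resample_linear(const double *in, size_t n_in, double rate_in,
                       double rate_out, double *out, size_t out_cap)
{
    assert(rate_in > 0.0 && rate_out > 0.0);
    double step = rate_in / rate_out;   /* input samples per output sample */
    size_t k = 0;

    for (; k < out_cap; k++) {
        double pos = (double)k * step;  /* fractional position in the input */
        size_t i = (size_t)pos;
        if (i + 1 >= n_in)
            break;                      /* need two neighbours to interpolate */
        double frac = pos - (double)i;
        out[k] = in[i] + frac * (in[i + 1] - in[i]);
    }
    return k;
}
```

For the 48ksps to 44.1ksps case, rate_in would be 48000.0 and rate_out 44100.0. Resampling a stream chunk by chunk would additionally need the last sample of each chunk carried over to the next, which this sketch leaves out.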

The current Windows SDK audio capture samples capture data from the microphone and save the captured data to a .WAV file. The code is far from optimal but it should work.

Note that RIFF files (.WAV files are RIFF files) are limited to 4 GB in size, because the chunk sizes in the header are stored as 32-bit values.
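
A quick way to see what that limit means in practice is to work out how many sample frames fit in one file and how many files a recording would need. The C sketch below does that arithmetic, assuming the plain 44-byte PCM header; WAV_MAX_DATA_BYTES, wav_max_frames and wav_files_needed are names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* The RIFF size field holds (file size - 8) as an unsigned 32-bit value.
 * With a 44-byte header, 36 of those bytes are header, the rest is data. */
#define WAV_MAX_DATA_BYTES ((uint64_t)UINT32_MAX - 36u)

/* Largest number of whole sample frames that fit in one WAV file. */
uint64_t wav_max_frames(unsigned channels, unsigned bits)
{
    assert(channels > 0 && bits >= 8);
    uint64_t frame_bytes = (uint64_t)channels * (bits / 8);
    return WAV_MAX_DATA_BYTES / frame_bytes;
}

/* Number of WAV files needed to hold total_frames frames (rounded up). */
uint64_t wav_files_needed(uint64_t total_frames, unsigned channels,
                          unsigned bits)
{
    uint64_t per_file = wav_max_frames(channels, bits);
    return (total_frames + per_file - 1) / per_file;
}
```

For mono 16-bit audio, wav_max_frames(1, 16) gives 2147483629 frames, which at 48ksps is roughly 12.4 hours per file.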

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow