Differences in WAV format (JS / NodeJS)

https://stackoverflow.com/questions/21764529

11-10-2022
|

Question

I'm trying to use WebRTC to record audio and then store it on the server side. My server is made using NodeJS and express, and I'm using POST to transmit the data from the client to the server.

On the client I'm translating the data from the wav BLOB to base64, transfer that, and on the server side, read it, translate it to binary, and then write it in a file. Should be fine, right?

There's just one problem : I'm getting some really bad inconsistencies between what you can download from the client, and what gets sent to the server. Sometimes it's added bytes, other times it's just deleted chunks of data. If it were just bytes added, that would mean a charset problem (translating from one to another, and then another, etc), but at some points I had 280 bytes added for example.

I've added a picture here of a hex diff : http://i.stack.imgur.com/psqf4.png (sorry, I don't have enough reputation so far to post an image directly)

Also, running file with these gives me the following : (uuid.wav is the server one, while output (1).wav is the client one)

9F2B75D3-4C34-4C8F-935E-FC7637D7A054.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 4 bit, stereo 11321924 Hz

output (1).wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, stereo 44100 Hz

... so clearly something is going wrong here. Also, trying to fix the headers, or convert the WAV gives me an error that goes along the lines of : could not find data chunk / data chunk has size 0.

Any ideas what might be causing this?

Solution

This looks suspiciously like some layer of code is attempting to convert the binary data to Unicode. 0x44 0xAC (which is 0xAC44 in little endian, which is 44100, which indicates the 44.1 kHz sample rate) is turning into 0x44 0xC2 0xAC. This gets byte-swapped to 0x00ACC244, which is 11321924 Hz, which reconciles with what you saw in the corrupted file.

Those 0xC2 additions definitely look like Unicode (UTF-8) artifacts. I don't know exactly which data types and functions you are using, but you will need to audit the steps to make sure none of them attempt to do implicit Unicode conversions.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow