Python Wave byte data

Question 1

If you want to understand what the 'frame' is you will have to read the standard of the wave file format. For instance: https://web.archive.org/web/20140221054954/http://home.roadrunner.com/~jgglatt/tech/wave.htm

From that document:

The sample points that are meant to be "played" ie, sent to a Digital to Analog Converter(DAC) simultaneously are collectively called a sample frame. In the example of our stereo waveform, every two sample points makes up another sample frame. This is illustrated below for that stereo example.

sample       sample              sample
frame 0      frame 1             frame N
 _____ _____ _____ _____         _____ _____
| ch1 | ch2 | ch1 | ch2 | . . . | ch1 | ch2 |
|_____|_____|_____|_____|       |_____|_____|
 _____
|     | = one sample point
|_____|

To convert to mono you could do something like this,

import wave

def stereo_to_mono(hex1, hex2):
    """average two hex string samples"""
    return hex((ord(hex1) + ord(hex2))/2)

wr = wave.open('piano2.wav','r')

nchannels, sampwidth, framerate, nframes, comptype, compname =  wr.getparams()

ww = wave.open('piano_mono.wav','wb')
ww.setparams((1,sampwidth,framerate,nframes,comptype,compname))

frames = wr.readframes(wr.getnframes()-1)

new_frames = ''

for (s1, s2) in zip(frames[0::2],frames[1::2]):
    new_frames += stereo_to_mono(s1,s2)[2:].zfill(2).decode('hex')

ww.writeframes(new_frames)

There is no clear-cut way to go from stereo to mono. You could just drop one channel. Above, I am averaging the channels. It all depends on your application.

Question 2

For wav file IO I prefer to use scipy. It is perhaps overkill for reading a wav file, but generally after reading the wav it is easier to do downstream processing.

import scipy.io.wavfile
fs1, y1 = scipy.io.wavfile.read(filename)

From here the data y1, will be N samples long, and will have Z columns where each column corresponds to a channel. To convert to a mono wav file you don't say how you'd like to do that conversion. You can take the average, or whatever else you'd like. For average use

monoChannel = y1.mean(axis=1)

Question 3

As a direct answer to your question: two bytes make one 16-bit integer value in the "usual" way, given by the explicit formula: value = ord(data[0]) + 256 * ord(data[1]). But using the struct module is a better way to decode (and later reencode) such multibyte integers:

import struct
print(struct.unpack("HH", b"\x00\x00\x00\x00"))
# -> gives a 2-tuple of integers, here (0, 0)

or, if we want a signed 16-bit integer (which I think is the case in .wav files), use "hh" instead of "HH". (I leave to you the task of figuring out how exactly two bytes can encode an integer value from -32768 to 32767 :-)

Question 4

Another way to convert 2 bytes into an int16, use numpy.fromstring(). Here's an example: audio_sample is from a wav file.

>>> audio_sample[0:8]
b'\x8b\xff\xe1\xff\x92\xffn\xff'

>>> x = np.fromstring(audio_sample, np.int16) 

>>> x[0:4]
array([-117,  -31, -110, -146], dtype=int16)

You can use np.tobytes to convert back to bytes