Question

I'm doing voice recording in javascript, and storing the recording as an array of signed floats. What would I need to determine (and ultimately, adjust) pitch on the array? I've seen various algorithms for C++, but they don't seem to be very helpful in my situation. I even downloaded and tried this one to see if I could convert parts of it to javascript:

http://voicerecorder.codeplex.com/SourceControl/latest

But all that actually did was make the recording louder, regardless of the settings I chose.


Solution

I'm not going to try to provide an exhaustive answer here, but rather describe my own findings discovered on my journey of wrestling with similar issues in audio programming.

Pitch Detection

If your sound is monophonic (as it sounds like it is, based on your comment to jeff), autocorrelation-style techniques are a reasonable place to start. That is what I used for pitch detection, mostly because it's relatively simple compared to other pitch detection algorithms.

The idea, if you're unfamiliar, is as follows:

  1. Slide a copy of the signal over itself (using a window of predetermined size, in 1-sample increments).
  2. At each offset, sum the absolute differences between the original samples and the shifted window.
    • as you slide the window, keep a record of the score calculated in (2) for each offset
    • when the shifted wave lines up with the original, the score hits a minimum; the offset (lag) of the first pronounced minimum after zero gives the signal's period, and the pitch is the sample rate divided by that lag.
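As a rough illustration of the steps above, here is a minimal JavaScript sketch. The function name `estimatePitch`, the 60 to 500 Hz search range, and the 10% threshold used to avoid picking a multiple of the true period are all assumptions made for this example, not part of my original implementation.

```javascript
// Estimate the fundamental frequency (in Hz) of a monophonic signal by
// sliding it over itself and scoring each offset with the sum of absolute
// differences. Returns null if no clear period is found.
// minHz / maxHz are hypothetical search bounds suited to a speaking voice.
function estimatePitch(samples, sampleRate, minHz = 60, maxHz = 500) {
  const minLag = Math.max(1, Math.floor(sampleRate / maxHz));
  const maxLag = Math.ceil(sampleRate / minHz);
  const windowSize = samples.length - maxLag;
  if (windowSize <= 0) return null; // recording too short for this range

  // Steps 1 and 2: score every candidate lag.
  const scores = new Float64Array(maxLag + 1);
  let minScore = Infinity;
  let maxScore = -Infinity;
  for (let lag = minLag; lag <= maxLag; lag++) {
    let score = 0;
    for (let i = 0; i < windowSize; i++) {
      score += Math.abs(samples[i] - samples[i + lag]);
    }
    scores[lag] = score;
    if (score < minScore) minScore = score;
    if (score > maxScore) maxScore = score;
  }

  // Take the first lag whose score dips close to the global minimum, then
  // walk down to the bottom of that dip. Taking the global minimum directly
  // can land on two or three periods instead of one.
  const threshold = minScore + 0.1 * (maxScore - minScore); // assumed 10%
  for (let lag = minLag; lag <= maxLag; lag++) {
    if (scores[lag] < threshold) {
      while (lag < maxLag && scores[lag + 1] < scores[lag]) lag++;
      return sampleRate / lag;
    }
  }
  return null; // e.g. silence: every lag scores the same
}

// Usage: a synthetic 100 Hz sine sampled at 8 kHz.
const sampleRate = 8000;
const tone = new Float32Array(2048);
for (let i = 0; i < tone.length; i++) {
  tone[i] = Math.sin((2 * Math.PI * 100 * i) / sampleRate);
}
console.log(estimatePitch(tone, sampleRate)); // should be close to 100
```

The resolution is limited to whole-sample lags, so the estimate gets coarser as the pitch rises. For a real recording you would run this on short frames (a few tens of milliseconds) rather than on the whole array.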

In my implementation, this was the only algorithm that worked well, though I only fed it samples of my own voice and didn't try a wider variety of input.

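That was a crude explanation of how autocorrelation-style detection works. (Strictly speaking, scoring by absolute difference is the average magnitude difference function, a close cousin of autocorrelation, which multiplies the samples instead and looks for a maximum.) This article provides a very nice comparison of different pitch-detection algorithms: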

https://ccrma.stanford.edu/~pdelac/154/m154paper.htm

Pitch Shifting

Of course, you could get really cheap pitch shifting by just resampling, but that changes the duration along with the pitch, so it sounds like a record being played too fast (or too slow), which is not acceptable in many circumstances.
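For reference, the cheap resampling approach might look something like the sketch below. The function name `resamplePitchShift` and the choice of linear interpolation are assumptions for this example; it is only meant to show why pitch and duration change together.

```javascript
// Naive pitch shift by resampling with linear interpolation.
// ratio > 1 raises the pitch and shortens the clip;
// ratio < 1 lowers the pitch and lengthens it.
function resamplePitchShift(samples, ratio) {
  if (!(ratio > 0)) throw new RangeError("ratio must be positive");
  const lastIndex = samples.length - 1;
  const outLength = Math.floor(samples.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;                        // read position in the input
    const i0 = Math.min(Math.floor(pos), lastIndex);
    const i1 = Math.min(i0 + 1, lastIndex);       // clamp at the final sample
    const frac = pos - i0;
    out[i] = samples[i0] * (1 - frac) + samples[i1] * frac;
  }
  return out;
}
```

Calling `resamplePitchShift(recording, 2)` returns a clip half as long that plays an octave higher at the original sample rate. There is no low-pass filtering here, so shifting upward can introduce aliasing.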

As far as pitch shifting goes, I haven't gotten that far yet in my implementation, but when I last left off I was looking at phase vocoders as a possible solution. The hard part is finding a decent explanation of how these algorithms work that gives some intuition for why they work the way they do, instead of providing solely abstract mathematical equations.

Licensed under: CC-BY-SA with attribution