Question

I've been struggling with a phase vocoder for a few weeks. The ultimate goal is to time-stretch a signal. I've made a lot of progress, but I still have two issues to solve.

Issue 1: Do I need a synthesis window?
I take overlapping frames from the input signal (a sine wave) with an arbitrary hop size (e.g. N/2, where N is the number of samples per frame). I apply a Hann window to each frame and feed the result to the FFT. To achieve time stretching, I perform the iFFT and overlap-add the output frames using a hop size different from the one used during analysis.
The problem is that with an output hop factor of 0.5 (hop size = N/2) the output is smooth, but for larger hop sizes I can hear 'vibrations'. The image shows the output of 8 frames with a hop factor of 1 (zero overlap), and it is evident why the sound is vibrating: each frame fades in and out with the window, and nothing fills the gaps. For small hop sizes the frames overlap much more and the sound is smoother. I've read a lot about phase vocoding, but I still don't understand how to get a smooth output for large hop sizes. What am I missing?

[Image: output of 8 overlap-added frames with hop factor = 1 (zero overlap)]

Issue 2: Phase correction.
Currently the output sounds worse with phase correction, but I'll leave that for another post.

Thanks in advance for taking the time.


Solution

I'm an amateur at this, but wouldn't you get a better result if you started with a much bigger overlap, e.g. a "hop size" of N/10 or something like that? Then you'd have more freedom to adjust it on output while still keeping a substantial overlap.
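To see why the amount of overlap matters, here is a minimal sketch in Python with NumPy (my choice of language; the function names `ola_envelope` and `ripple` are made up for illustration). It assumes a periodic Hann window and only looks at the amplitude envelope that overlap-adding applies to the output, ignoring phase entirely. A ripple near 0 means a flat envelope; a ripple near 1 means the output drops to silence between frames.

```python
import numpy as np

def ola_envelope(N, hop, num_frames=32):
    """Sum of shifted periodic Hann windows: the gain overlap-add applies."""
    n = np.arange(N)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)  # periodic Hann
    env = np.zeros(hop * (num_frames - 1) + N)
    for k in range(num_frames):
        env[k * hop : k * hop + N] += window
    # Drop one frame length at each end, where fewer frames overlap.
    return env[N:-N]

def ripple(N, hop):
    """Relative peak-to-trough variation of the overlap-add envelope."""
    env = ola_envelope(N, hop)
    return (env.max() - env.min()) / env.max()

if __name__ == "__main__":
    N = 1024
    for hop in (N // 4, N // 2, 3 * N // 4, N):
        print(f"hop = {hop:4d}  ripple = {ripple(N, hop):.3f}")
```

With this window the envelope is flat for hops of N/2 and N/4, but not for 3N/4 or N, which matches the 'vibrations' described in the question. Starting from a small hop leaves room to change the output hop while staying among the values that overlap heavily.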

Also, it might pay to adjust the steepness of the window depending on how much you're expanding/compressing time.
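One way to read "adjusting the steepness" is sketched below, again in Python with NumPy and with hypothetical helper names (`tapered_window`, `overlap_add_gain`). The assumption is a flat-top window whose raised-cosine tapers are exactly as long as the overlap between neighbouring output frames, so the fade-out of one frame and the fade-in of the next always sum to 1. This is only an illustration of the amplitude side of the problem, not a complete phase vocoder.

```python
import numpy as np

def tapered_window(N, hop):
    """Flat-top window whose cosine tapers span exactly the overlap N - hop."""
    taper = N - hop
    if not 0 < taper <= N // 2:
        raise ValueError("this sketch assumes N/2 <= hop < N")
    rise = 0.5 - 0.5 * np.cos(np.pi * np.arange(taper) / taper)
    window = np.ones(N)
    window[:taper] = rise              # fade in
    window[N - taper:] = 1.0 - rise    # fade out, complementary to fade in
    return window

def overlap_add_gain(window, hop, num_frames=16):
    """Gain that overlap-adding `window` at spacing `hop` applies."""
    N = len(window)
    gain = np.zeros(hop * (num_frames - 1) + N)
    for k in range(num_frames):
        gain[k * hop : k * hop + N] += window
    # Drop one frame length at each end, where the overlap is incomplete.
    return gain[N:-N]

if __name__ == "__main__":
    N, hop = 1024, 768
    gain = overlap_add_gain(tapered_window(N, hop), hop)
    print(gain.min(), gain.max())
```

For hop = N/2 this construction reduces to the ordinary periodic Hann window. As the hop approaches N the tapers get shorter and steeper, which keeps the envelope flat but gives a window with more spectral leakage, so there is a trade-off.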

Licensed under: CC-BY-SA with attribution