Question

I can use the FFT to get the frequencies, phases, and magnitudes of a loaded 1-second audio file of a person saying "ahhhh", and recreate the signal from them. What I'm trying to do now is find out where each of those frequencies begins and ends within the 1-second file, and place that data into an array.

Example: 100 Hz starts at 0.23 seconds and ends at 0.34 seconds; 104.34 Hz starts at 0.35 seconds and ends at 0.37 seconds.

Can FFTs do this, or do I need to shift my whole program to use wavelets? Also, are there any wavelet examples in Octave that show how to do what I'm trying to accomplish?

I'm using Ubuntu Linux 12.04 and Octave 3.2.4 from the repos.

Thanks Rick


Solution

The FFT is an algorithm for computing the Discrete Fourier Transform (DFT), which gives the frequency content of your audio signal (magnitude and phase, as you mention). The result is one magnitude/phase pair per discrete frequency bin, and each bin maps to a frequency in Hz through the bin index, the number of FFT points, and the sampling frequency of your signal.
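To make the bin-to-frequency mapping concrete: bin k of an N-point FFT at sampling rate fs corresponds to k * fs / N Hz. Here is a minimal sketch in Python with NumPy. The 8000 Hz sampling rate and the synthetic 100 Hz tone are assumptions standing in for your recording.

```python
import numpy as np

fs = 8000                          # assumed sampling rate in Hz
t = np.arange(fs) / fs             # one second of sample times
x = np.sin(2 * np.pi * 100 * t)    # synthetic 100 Hz tone (stand-in for the audio)

N = len(x)                         # number of FFT points
spectrum = np.fft.rfft(x)          # bins 0 .. N/2 of the DFT of a real signal
k = int(np.argmax(np.abs(spectrum)))   # index of the strongest bin

bin_width_hz = fs / N              # spacing between adjacent bins
peak_hz = k * fs / N               # bin index mapped to a frequency in Hz
peak_phase = float(np.angle(spectrum[k]))  # phase of that bin in radians

print(f"strongest bin {k} -> {peak_hz:.2f} Hz (bin width {bin_width_hz:.2f} Hz)")
```

With one second of audio the bin spacing is 1 Hz, so a component such as 104.34 Hz falls between bins and is spread over its neighbours.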

The DFT (computed via the FFT) is a global transform, though: moving to the frequency domain loses all notion of time. What you need is the Short-Time Fourier Transform (STFT), i.e. an FFT on short time frames (windows) of the signal. Its output is a time-frequency representation that gives the frequency content per analysis window, and thus per short time interval.
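A small illustration of why the whole-file FFT cannot answer "when": two signals that contain the same tones in opposite order have the same magnitude spectrum. This is only a sketch. The tone frequencies and the 8000 Hz sampling rate are assumed values.

```python
import numpy as np

fs = 8000                              # assumed sampling rate in Hz
half = fs // 2                         # half a second of samples
t = np.arange(half) / fs

low = np.sin(2 * np.pi * 100 * t)      # 100 Hz for half a second
high = np.sin(2 * np.pi * 200 * t)     # 200 Hz for half a second

low_then_high = np.concatenate([low, high])
high_then_low = np.concatenate([high, low])

# Magnitude spectra of the two one-second signals
mag_a = np.abs(np.fft.rfft(low_then_high))
mag_b = np.abs(np.fft.rfft(high_then_low))

same_magnitudes = bool(np.allclose(mag_a, mag_b))
print("identical magnitude spectra:", same_magnitudes)
```

The ordering is still encoded in the phases, but not in a form you can read off as start and end times, which is why a windowed analysis is the usual route.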

Approach sketch: choose a window length and a window shift (hop) based on the time resolution you need, run the STFT, and then apply peak picking (local-maxima estimation) to the Fourier magnitude in each window. This gives the dominant frequencies per window, which you can then track across time to find where each one starts and stops (onsets, offsets, etc.).
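The approach above could look roughly like this in Python with NumPy. Treat it as a sketch under assumptions, not a finished tracker: the function name `dominant_frequency_segments`, the 50 ms Hann window, the 12.5 ms hop, the relative threshold, and the synthetic test tones are all illustrative choices. It also keeps only the single strongest peak per window, whereas a voiced "ahhhh" has several harmonics at once.

```python
import numpy as np

def dominant_frequency_segments(x, fs, frame_len=400, hop=100, rel_threshold=0.5):
    """Return a list of (freq_hz, start_s, end_s) runs of the strongest frequency."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    peak_bins = np.zeros(n_frames, dtype=int)
    peak_mags = np.zeros(n_frames)

    # STFT: windowed FFT of each frame, keeping only the strongest bin
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        peak_bins[i] = np.argmax(mag)
        peak_mags[i] = mag[peak_bins[i]]

    # A frame counts as active if its peak is strong relative to the loudest frame
    active = peak_mags > rel_threshold * peak_mags.max()
    centres = (np.arange(n_frames) * hop + frame_len / 2) / fs  # frame centre times

    # Group consecutive active frames that share the same peak bin
    segments = []
    i = 0
    while i < n_frames:
        if not active[i]:
            i += 1
            continue
        j = i
        while j + 1 < n_frames and active[j + 1] and peak_bins[j + 1] == peak_bins[i]:
            j += 1
        freq_hz = float(peak_bins[i] * fs / frame_len)
        segments.append((freq_hz, float(centres[i]), float(centres[j])))
        i = j + 1
    return segments

# Synthetic test signal (assumed): 100 Hz from 0.23 s to 0.34 s, 200 Hz from 0.52 s to 0.72 s
fs = 8000
t = np.arange(fs) / fs
x = np.where((t >= 0.23) & (t < 0.34), np.sin(2 * np.pi * 100 * t), 0.0)
x = x + np.where((t >= 0.52) & (t < 0.72), np.sin(2 * np.pi * 200 * t), 0.0)

segments = dominant_frequency_segments(x, fs)
for freq_hz, start_s, end_s in segments:
    print(f"{freq_hz:.1f} Hz from about {start_s:.3f} s to {end_s:.3f} s")
```

Note the trade-off in the window length: a 50 ms window at 8000 Hz gives 20 Hz bin spacing, and start and end times are only known to within roughly one hop. Separating 100 Hz from 104.34 Hz needs a window of a few hundred milliseconds, so events as short as the 20 ms in your example cannot be resolved at that frequency precision with a plain STFT.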

In MATLAB, check spectrogram for an implementation of the STFT to get you started. In Octave, the signal package provides specgram for the same purpose.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow