FFT Algorithm - How to convert samples to complex structs?

https://stackoverflow.com/questions/11762941

24-06-2021

Question

The short version of my question:

How do I go from an array of audio samples (as Int16) to Complex structs used by common FFT libraries such as AForge?

The long version of my question:

I am new to audio processing and am looking to analyze audio in music (locating beats, tempo etc). After a couple of days reading up on this, the Fast Fourier Transform (FFT) algorithm seems to be a step in the right direction.

To get going and not having to implement the algorithms from scratch, I've downloaded a couple of open source libraries, Exocortex and AForge.net.

I am decoding the audio to a memory buffer. On each cycle for my 16-bit, 48 kHz stereo audio track, I get 48000 bytes or 24000 samples. These are currently copied into an array of shorts (Int16). I now need to convert my array to Complex structs (in the case of AForge). Each of these is in turn initialized with a "real" and an "imaginary" double value. But what exactly are these, and how do I go from my array to these two double values? Also, do I need to split up the left/right channels before passing the data on?

Unfortunately, I am terrible at reading math formulas unless they are presented as code. On nearly all sites I've visited so far, Greek symbols and complex math formulas quickly show up to help explain the algorithm. As a result, I get lost in translation right away. Believe me, I tried hard to find an "FFT algorithms for dummies". ;)

Note to moderators: This is not a duplicate of "Convert Audio samples from bytes to complex numbers?" even though the question is similar.

Solution

You need to either split the channels and process them separately, or average them to a single mono channel - which is best depends on what you ultimately are trying to accomplish.
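Both options can be sketched in a few lines. This is only an illustration in Python, and it assumes the decoded buffer is interleaved as left, right, left, right, ... (a common PCM layout, but not something stated in the question). The function names `split_channels` and `mix_to_mono` are made up for this example.

```python
def split_channels(interleaved):
    """Split interleaved stereo samples [L0, R0, L1, R1, ...] into two lists.

    Assumption: the buffer is interleaved left-first; check your decoder.
    """
    left = interleaved[0::2]   # every even index
    right = interleaved[1::2]  # every odd index
    return left, right


def mix_to_mono(left, right):
    """Average the two channels sample by sample into one mono channel."""
    return [(l + r) / 2 for l, r in zip(left, right)]


left, right = split_channels([100, 300, -200, 0, 50, 50])
mono = mix_to_mono(left, right)
```

With 24000 interleaved samples per cycle, either approach would leave 12000 samples per channel to feed to the FFT.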

As for converting data types - every real number is also a complex number whose "imaginary" part is 0. The conversion is therefore essentially to create an array of complex numbers with each sample in the real part and zero in the imaginary part. Preferably normalize the samples first: for 16-bit audio, divide by 32768 so the values fall in the range [-1, +1).
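As a rough sketch of that conversion, here is the same idea in Python, with the built-in `complex` type standing in for the library's Complex struct. The function name `to_complex` and the `full_scale` parameter are made up for this example; in .NET you would build each Complex from the same two double values.

```python
def to_complex(samples, full_scale=32768.0):
    """Turn 16-bit samples into complex numbers for an FFT.

    Each sample is normalized by dividing by full_scale and stored
    in the real part; the imaginary part is always zero.
    """
    return [complex(s / full_scale, 0.0) for s in samples]


spectrum_input = to_complex([0, 16384, -32768])
# real parts are 0.0, 0.5 and -1.0; every imaginary part is 0.0
```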

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow