How to convert a speech spectrum to time domain

https://stackoverflow.com/questions/22890919

28-06-2023

Question

I am doing speech analysis. I recorded 5 seconds of sound, applied DC offset removal, normalisation and a Hamming window, and then took the spectrum using fft. I want to hear how much the sound has changed. Is there a way to convert the FFT output back to the time domain?

clc, clear;
% Record your voice for 5 seconds (96 kHz, 16-bit, mono).
%recObj = audiorecorder;
recObj = audiorecorder(96000, 16, 1);
disp('Start speaking.')
recordblocking(recObj, 5);
disp('End of Recording.');

% Play back the recording.
play(recObj);

% Store data in double-precision array.
myRecording = getaudiodata(recObj);

% Save the unprocessed audio.
% audiowrite replaces wavwrite, which was removed from newer MATLAB releases.
audiowrite('C:/Users/naveen/Desktop/unprocessed.wav', myRecording, 96000);

% Plot the samples.
figure,plot(myRecording),title('Original Sound');
%Offset Elimination
 a = myRecording;
 a=double(a);
 D = a-mean(a);
 figure,plot(D),title('Sound after Offset Elimination');

 %normalizing
 w = D/max(abs(D));
 figure,plot(w),title('Normalized  Sound');

 %      hamming window 
 a1=double(w);
 %a1=a1';
 N=length(w);
 hmw = hamming(N);
 temp = a1.*hmw;
 a1 = temp;

 %Fast Fourier Transform (zero-padded to the next power of two)
 N = length(a1);
 nz = 2^nextpow2(N);
 fs = 96000;
 X = fft(a1, nz);               % complex spectrum: magnitude and phase
 x1 = abs(X);                   % magnitude only; the phase is discarded here
 wq = (0:nz-1)'*(fs/nz);        % frequency axis in Hz
 figure,stem(wq,x1),title('Spectrum');
 xlabel('Frequency (Hz)');
 ylabel('Magnitude of FFT Coefficients');

 % Keep the first half (0 to fs/2); for real input the second half mirrors it.
 nz1 = nz/2;
 x2 = x1(1:nz1);
 w1 = wq(1:nz1);
 figure,plot(w1,x2);
 title('Half Length Spectrum of Sound');

Solution

Just as you apply fft, you can apply ifft, which is the inverse of the Fourier transform (http://www.mathworks.es/es/help/matlab/ref/ifft.html). Apply it to the complex output of fft (X in your code), not to abs(X).
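
A minimal sketch of the round trip, assuming a synthetic tone as a stand-in for the recorded speech. The sample rate and test signal are illustrative choices, not values from the question; the zero-padding follows the question's approach.

```matlab
% Minimal sketch: FFT followed by IFFT returns the original samples.
fs = 8000;                 % assumed sample rate (Hz), for illustration only
t  = (0:fs-1)'/fs;         % one second of time stamps (column vector)
x  = sin(2*pi*440*t);      % synthetic stand-in for the recorded speech

N  = length(x);
nz = 2^nextpow2(N);        % zero-padded FFT length, as in the question
X  = fft(x, nz);           % full complex spectrum (no abs here)

y  = real(ifft(X));        % back to the time domain; real() drops round-off imaginary parts
y  = y(1:N);               % discard the zero padding

maxErr = max(abs(y - x));  % expected to be on the order of floating-point round-off
% sound(y, fs);            % uncomment to listen to the result
```

If a window was applied before the FFT, the recovered signal is the windowed signal, so it will fade in and out at the edges compared with the raw recording.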

OTHER TIPS

Using the abs() function on complex data is a lossy operation that throws away all phase information. The phase encodes the waveform shapes as well as the timing of any transients within the FFT window. Since that information has been discarded, a magnitude spectrum or spectrogram alone can't be turned back into audio that sounds like the original speech.
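
A small sketch of this effect, using a short made-up test vector rather than real speech: inverting the complex spectrum recovers the signal, while inverting only the magnitude gives a different (zero-phase, symmetric) waveform.

```matlab
% Sketch: inverting the magnitude alone does not recover the signal.
x = [1; 2; 3; 4; 0; 0; 0; 0];   % short illustrative test signal

X     = fft(x);
yFull = real(ifft(X));           % uses magnitude and phase
yMag  = real(ifft(abs(X)));      % phase discarded, i.e. forced to zero

errFull = max(abs(yFull - x));   % round-off only
errMag  = max(abs(yMag  - x));   % large: the waveform shape is lost

disp([x yFull yMag]);            % compare the three columns side by side
```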

But if you keep the full complex result of the FFT (magnitude and phase), a complex IFFT can turn it back into a time-domain signal, for example as part of some sort of resynthesis process.
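
One possible form of such a resynthesis, sketched under assumptions: the magnitude is modified while the phase is kept, and the two are recombined before the IFFT. The test signal, the cutoff `fc`, and the brick-wall low-pass mask are all hypothetical choices for illustration, not something prescribed by the answer.

```matlab
% Sketch: change the magnitude, keep the phase, then resynthesise with ifft.
fs = 8000;                           % assumed sample rate (Hz)
N  = 8000;                           % one second of samples
t  = (0:N-1)'/fs;
low = sin(2*pi*200*t);               % component we want to keep
x   = low + 0.5*sin(2*pi*3000*t);    % synthetic stand-in for speech

X   = fft(x);
mag = abs(X);                        % magnitude spectrum
ph  = angle(X);                      % phase spectrum (left untouched)

fc   = 1000;                         % hypothetical cutoff frequency (Hz)
f    = (0:N-1)'*fs/N;                % frequency of each FFT bin
keep = (f <= fc) | (f >= fs - fc);   % mirrored mask keeps the spectrum conjugate-symmetric
mag(~keep) = 0;                      % zero every bin above the cutoff

Y = mag .* exp(1i*ph);               % recombine magnitude and phase
y = real(ifft(Y));                   % resynthesised time-domain signal

maxErr = max(abs(y - low));          % small here: only the 200 Hz tone remains
% sound(y, fs);                      % uncomment to listen
```

The mask is applied to both halves of the spectrum so that the result of ifft stays real; changing only the first half would leave an imaginary part behind.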

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow