How to convert a speech spectrum to time domain

https://stackoverflow.com/questions/22890919

28-06-2023

Question

I am doing speech analysis. I recorded 5 seconds of sound, removed the DC offset, normalized the signal, applied a Hamming window, and took the spectrum with fft. I want to hear how much the sound has changed. Is there a way to convert the FFT back to the time domain?

clc,clear;
% Record your voice for 5 seconds.
%recObj = audiorecorder;
recObj = audiorecorder(96000, 16, 1);
disp('Start speaking.')
recordblocking(recObj,5);

disp('End of Recording.');
% Play back the recording.
play(recObj);
get(recObj);
myspeech = getaudiodata(recObj);
% Note: newer MATLAB releases removed wavwrite; audiowrite(filename, y, Fs) replaces it.
wavwrite(double(myspeech),96000,'C://Users//naveen//Desktop//unprocessed')

% Store data in double-precision array.
myRecording = getaudiodata(recObj);

% Plot the samples.
figure,plot(myRecording),title('Original Sound');
%Offset Elimination
 a = myRecording;
 a=double(a);
 D = a-mean(a);
 figure,plot(D),title('Sound after Offset Elimination');

 %normalizing
 w = D/max(abs(D));
 figure,plot(w),title('Normalized  Sound');

 %      hamming window 
 a1=double(w);
 %a1=a1';
 N=length(w);
 hmw = hamming(N);
 temp = a1.*hmw;
 a1 = temp;

 %Fast Fourier Transform 
 a2=double(a1);
 N=length(a1);
 n=ceil(log2(N));
 nz=2^n;
 fs = 96000;
 x_z=0*[1:nz];
 x_z(1:N)=a2;
 X=fft(x_z);
 x1=abs(X);
 wq=double(0:nz-1)*(fs/nz);
 figure,stem(wq,x1),title('Spectrum');    
 xlabel('Frequency (Hz)');
 ylabel('Magnitude of FFT Coefficients');

 nz1=round(nz/2);
 x2=x1(1:nz1);
 w1=wq(1:nz1);
 figure,plot(w1,x2);
 title('Half Length Spectrum of Sound');
 nz2=nz1*10;

Solution

Just as you applied fft, you can apply ifft, which computes the inverse Fourier transform (http://www.mathworks.es/es/help/matlab/ref/ifft.html).
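
As a rough illustration of that round trip, here is a minimal sketch. It assumes a synthetic 440 Hz tone in place of the recording (the tone, its length, and the variable names x, X, y are made up for the example) and zero-pads to a power of two the way the question's code does. It is not the asker's full pipeline, only the fft/ifft pair.

```matlab
% Synthetic stand-in for the recording (assumption: 440 Hz tone, 0.1 s at 96 kHz).
fs = 96000;
t  = (0:round(0.1*fs)-1)' / fs;
x  = 0.5 * sin(2*pi*440*t);

% Forward transform, zero-padded to the next power of two as in the question.
N  = length(x);
nz = 2^nextpow2(N);
X  = fft(x, nz);            % complex spectrum: keep this, not abs(X)

% Inverse transform, then drop the zero-padding to get back N samples.
y = real(ifft(X));          % real() removes round-off imaginary residue
y = y(1:N);

% The round trip should reproduce the input up to floating-point error.
fprintf('max reconstruction error: %g\n', max(abs(x - y)));

% sound(y, fs);             % uncomment to listen
```

Note that ifft is applied to the complex X here. In the question's code that corresponds to X, not to x1 = abs(X).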

Other answers

Using the abs() function on complex data is a lossy operation that throws away all phase information. The phase encodes the waveform shape as well as the timing of any transients in the FFT window. Since that information has been discarded, a magnitude spectrum or spectrogram alone can't be turned back into audio that sounds like the original speech.
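
A small sketch can make the loss of timing concrete. It assumes a single click at sample 101 as a stand-in for a speech transient (the signal and the names y_full, y_mag, k_full, k_mag are invented for this example) and compares inverting the full spectrum with inverting the magnitude only.

```matlab
% Assumption: a single click at sample 101 stands in for a speech transient.
N = 1024;
x = zeros(N, 1);
x(101) = 1;

X = fft(x);
y_full = real(ifft(X));          % magnitude and phase kept
y_mag  = real(ifft(abs(X)));     % phase thrown away by abs()

% Find where the click ends up in each reconstruction.
[~, k_full] = max(abs(y_full));
[~, k_mag]  = max(abs(y_mag));
fprintf('click position, full spectrum:  %d\n', k_full);   % prints 101
fprintf('click position, magnitude only: %d\n', k_mag);    % prints 1
```

Both signals have the same magnitude spectrum, but without the phase the click moves to the start of the window. With speech, every transient in the frame is displaced in a similar way.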

But if you keep the full complex results of the FFT, then a complex IFFT might be used in some sort of resynthesis process.
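
One possible form of such a resynthesis, sketched under assumptions: edit the magnitude, recombine it with the original phase, and invert. The two-tone test signal, the 1 kHz cutoff, and the brick-wall edit below are all chosen for illustration and are not taken from the question or from this answer.

```matlab
fs = 8000;                          % assumed sample rate for this toy signal
N  = 4096;
t  = (0:N-1)' / fs;
x  = sin(2*pi*250*t) + 0.5*sin(2*pi*3000*t);   % low tone plus high tone

X     = fft(x);
mag   = abs(X);                     % what the question plots
phase = angle(X);                   % what abs() alone would have discarded

% Example edit on the magnitude: silence everything above 1 kHz.
f = (0:N-1)' * (fs/N);
f = min(f, fs - f);                 % fold so mirrored bins get the same frequency
mag(f > 1000) = 0;

% Recombine the edited magnitude with the original phase and invert.
Y = mag .* exp(1i * phase);
y = real(ifft(Y));                  % y now contains only the 250 Hz tone

% sound(y, fs);                     % uncomment to listen
```

Applying the same edit to both halves of the spectrum keeps it conjugate-symmetric, so the inverse transform stays real. For long recordings this is usually done frame by frame with overlapping windows rather than on one 5-second FFT.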

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow