Question

I am working on a speech emotion recognition system for live recordings, using the OpenSMILE library for feature extraction. I have collected a set of audio files containing different classes of speech, extract features from them, and train an SVM-based classifier for emotion recognition. However, this completely fails when tested on live speech, because the signal, and hence the feature distributions (MFCCs, LSP, pitch, intensity, F0), differ considerably between live speech and the files. OpenSMILE uses PortAudio to access the audio signal from the microphone.
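For reference, my training step is essentially the standard scikit-learn pattern; the sketch below is illustrative only (the feature export file name, label column, and SVM parameters are placeholders, not my exact setup):

```python
# Sketch: train an SVM emotion classifier on features exported by OpenSMILE.
# Assumes a CSV where each row is one utterance: feature columns plus a "label" column.
import pandas as pd
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

df = pd.read_csv("opensmile_features.csv")      # hypothetical export path
X = df.drop(columns=["label"]).values
y = df["label"].values

# Scaling matters: SVMs are sensitive to feature ranges (MFCC vs. F0 vs. intensity).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
print(cross_val_score(clf, X, y, cv=5).mean())   # cross-validation on file-based data
clf.fit(X, y)
```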

I have tried playing a file (f_original) over the air and record it through the microphone then have OpenSMILE save it (f_distorted). I found that f_original and f_distorted do not sound very different to the human ear when played. However the audio signals when visualized in audacity differ quite a bit and the features extracted from f_original and f_distorted differ significantly. The file f_original is at 16000Hz and I upsample it to 44100Hz before feature extraction. The microphone records at 44100Hz.
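For completeness, the upsampling step is just a standard band-limited resample; a minimal sketch using librosa (file names are placeholders):

```python
# Sketch: upsample a 16 kHz file to 44.1 kHz before feature extraction.
import librosa
import soundfile as sf

y, sr = librosa.load("f_original.wav", sr=16000)             # load at native 16 kHz
y_44k = librosa.resample(y, orig_sr=16000, target_sr=44100)  # band-limited resampling
sf.write("f_original_44k.wav", y_44k, 44100)
```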

While I do expect some distortion when recording through the microphone, the amount of distortion that I see is extreme.

Has anyone else faced similar problems? Any pointers on how to fix this?

Thanks!


Solution

This will depend a great deal on the environmental factors of the recording: the room itself, the frequency response of the speaker/microphone combination, and their type and position within the room. Software may be able to help you clean this up, but getting a clean recording will be the single most important factor affecting how well your system can profile the speech.

Assuming your recording levels are set correctly, and your microphone and speakers have a relatively flat frequency response, you will still be transforming the frequency profile of the sound according to the environment.

This effect may not be immediately obvious on playback, but a number of elements of the sound will be affected detrimentally. This has been used by composers to great effect.

See Alvin Lucier's I am sitting in a room at http://www.ubu.com/sound/lucier.html for a beautiful example of this type of composition.

Many of the transient-smearing effects you hear in that recording will affect speech profiling dramatically, so the set-up of your recording will need to be considered in great detail. It's probably best to speak to a sound engineer for tips on the recording setup, as this seems to be the part you are struggling with; for example, you don't mention the acoustic properties of the room you are using, or the audio set-up.

You could also measure an impulse response of the room/mic/speaker set-up you intend to use, and then deconvolve the recorded speech with that impulse response, which should theoretically reduce the recording to a near-perfect representation of the original signal. This is tricky, but can produce some jaw-dropping results.
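A rough sketch of that idea in Python, using simple regularized frequency-domain division (a Wiener-style deconvolution). In practice you would measure the impulse response properly, e.g. with a sine sweep, so treat the file names and the regularization constant as placeholders:

```python
# Sketch: deconvolve a recorded signal with a measured room/mic/speaker impulse response.
import numpy as np
import soundfile as sf

recorded, sr = sf.read("f_distorted.wav")   # speech captured through the mic
ir, sr_ir = sf.read("room_impulse.wav")     # measured impulse response (same sample rate)
assert sr == sr_ir

# Mix down to mono if needed.
if recorded.ndim > 1:
    recorded = recorded.mean(axis=1)
if ir.ndim > 1:
    ir = ir.mean(axis=1)

# Zero-pad both to a common length and divide in the frequency domain.
n = len(recorded) + len(ir) - 1
R = np.fft.rfft(recorded, n)
H = np.fft.rfft(ir, n)

eps = 1e-3 * np.max(np.abs(H))              # regularization to avoid dividing by ~0
estimate = np.fft.irfft(R * np.conj(H) / (np.abs(H) ** 2 + eps ** 2), n)

sf.write("f_recovered.wav", estimate[:len(recorded)], sr)
```

The regularization term trades off how aggressively the room response is inverted against how much noise gets amplified at frequencies where the impulse response has little energy.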

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow