If you want to recreate the sound of the iPhone speaker/mic, ideally you need to find the impulse response of the system.
What you are doing wrong: taking the FFT of the recorded sine sweep on its own is not meaningful, because the sweep already has a spectrum of its own (determined by whether its frequency changes linearly, exponentially, or otherwise) before the system imposes its frequency response on top of it. As Paul R suggested above, taking FFTs of white noise makes more sense: white noise has a statistically flat spectrum, so averaging the FFT magnitudes of many frames converges to the actual magnitude frequency response of the system.
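As a rough illustration of the white-noise approach, here is a minimal sketch using only the Python standard library. The two-tap filter `h_true` is a made-up stand-in for the iPhone speaker/mic chain, and the helper names (`dft`, `fir_filter`, `estimate_magnitude_response`) are hypothetical. In practice you would probably use an FFT library rather than the naive DFT shown here.

```python
import cmath
import math
import random

def dft(x):
    """Naive O(N^2) DFT; acceptable for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fir_filter(x, h):
    """Stand-in for the device under test: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def estimate_magnitude_response(x, y, frame_len):
    """Average |X|^2 and |Y|^2 over frames, then take sqrt of the ratio per bin."""
    pxx = [0.0] * frame_len
    pyy = [0.0] * frame_len
    for start in range(0, len(x) - frame_len + 1, frame_len):
        X = dft(x[start:start + frame_len])
        Y = dft(y[start:start + frame_len])
        for k in range(frame_len):
            pxx[k] += abs(X[k]) ** 2
            pyy[k] += abs(Y[k]) ** 2
    return [math.sqrt(pyy[k] / pxx[k]) for k in range(frame_len)]

random.seed(0)                      # fixed seed so the sketch is repeatable
FRAME_LEN = 64
NUM_FRAMES = 100
h_true = [1.0, 0.5]                 # hypothetical system, NOT a real iPhone response

noise = [random.gauss(0.0, 1.0) for _ in range(FRAME_LEN * NUM_FRAMES)]
recorded = fir_filter(noise, h_true)

mag_est = estimate_magnitude_response(noise, recorded, FRAME_LEN)
mag_true = [abs(v) for v in dft(h_true + [0.0] * (FRAME_LEN - len(h_true)))]
max_err = max(abs(a - b) for a, b in zip(mag_est, mag_true))
print("largest deviation from the true magnitude response:", max_err)
```

Note that `mag_est` contains magnitudes only, so this estimate says nothing about the phase of the system.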
However, if your goal is to recreate the sound of the system, you also need to take care of phase, which is not being done in either of the above methods. The 'ideal' way to do it would be to capture the response of the iPhone speaker/mic system to an 'impulse' in a perfectly quiet and dry (no reflections) environment. There are 3 ways to do so:
1. Use a balloon pop sound, or a synthetically created impulse sound.
2. Use Golay codes, which are a simpler way of averaging many impulse response measurements.
3. Use sine sweeps, but then use correlation to find the impulse response.
Reference: https://ccrma.stanford.edu/realsimple/imp_meas/imp_meas.pdf
Once you obtain the impulse response measurement, either convolve it with the signal you are trying to 'color', or take the FFT of both signals (zero-padded to a common length of at least the sum of their lengths minus one), multiply them in the frequency domain, and then take the inverse FFT to get the colored signal.
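The two routes give the same result, which the following minimal sketch tries to show with the standard library only. The signal `dry` and the three-tap `impulse_response` are made-up values for illustration, and the function names are hypothetical. For real audio you would use an FFT-based convolution from a numerical library instead of this naive DFT.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive DFT / inverse DFT, good enough for a short demonstration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def convolve_direct(signal, ir):
    """Time-domain linear convolution."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, g in enumerate(ir):
            out[i + j] += s * g
    return out

def convolve_via_dft(signal, ir):
    """Multiply spectra, then invert; zero-padding avoids circular wrap-around."""
    n = len(signal) + len(ir) - 1
    S = dft(signal + [0.0] * (n - len(signal)))
    H = dft(ir + [0.0] * (n - len(ir)))
    y = dft([a * b for a, b in zip(S, H)], inverse=True)
    return [v.real for v in y]

dry = [1.0, 0.0, -0.5, 0.25, 0.0, 0.75]    # made-up input signal
impulse_response = [1.0, 0.6, 0.2]         # made-up measured IR

wet_time = convolve_direct(dry, impulse_response)
wet_freq = convolve_via_dft(dry, impulse_response)
print(max(abs(a - b) for a, b in zip(wet_time, wet_freq)))
```

Without the zero-padding, the frequency-domain product would correspond to circular convolution, and the tail of the output would wrap around onto its beginning.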
Explanation: I'll try to explain this to the best of my knowledge.

When you compute the FR from an impulse response, you take the magnitude of its FFT and throw away the phase data. As a result, many filters (systems) with the same magnitude FR can give you radically different outputs. A case in point is allpass filters: they all have a flat FR, but if you put an impulse through one, you can get back something like a sine sweep, depending on the filter parameters. So although you can always go from an IR to a FR, going in the opposite direction means making an arbitrary choice of phase. Hence you cannot throw away the phase, even for rough estimates. The fact that we mostly cannot hear phase means that we can look at the FR for information about the system, but it does not allow us to disregard phase when modeling the system. I hope that makes sense?

To use a sine sweep, do the following. Let s(t) = sin(A(t)), where A(t) = integral[0 to t] (w(t) dt) and w(t) is the instantaneous frequency of the sweep. Correlating the sweep with a weighted copy of itself, e(t) = corr(s(t), v(t) * s(t)) with v(t) = 2 * abs(dw/dt), produces an (approximate, band-limited) impulse. Therefore, if you replace the unweighted sweep s(t) in that correlation with the measured signal, you should obtain the system's impulse response.

Hope that helps! Sorry for it being so math-y.
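Here is a minimal sketch of the sine sweep recipe, under several assumptions of mine: a linear sweep over the full band (0 to pi radians per sample), the correlation read as the sweep against a copy of itself weighted by v(t) = 2 * abs(dw/dt), and a made-up three-tap filter `h_true` standing in for the measured system. All function names are hypothetical. The result is normalized by the zero-lag peak of the sweep's own correlation, so only the shape of the impulse response is recovered, and only approximately.

```python
import math

def linear_sweep(num_samples, w_start, w_end):
    """Return s[n] = sin(A[n]) and dw/dt, with w rising linearly (rad/sample)."""
    rate = (w_end - w_start) / num_samples           # dw/dt, constant here
    phase = [w_start * n + 0.5 * rate * n * n        # A(t) = integral of w(t)
             for n in range(num_samples)]
    return [math.sin(p) for p in phase], rate

def correlate(measured, reference, lags):
    """e[lag] = sum_n measured[n + lag] * reference[n] over overlapping samples."""
    out = {}
    for lag in lags:
        lo = max(0, -lag)
        hi = min(len(reference), len(measured) - lag)
        out[lag] = sum(measured[n + lag] * reference[n] for n in range(lo, hi))
    return out

def fir_filter(x, h):
    """Stand-in for playing the sweep through the system and recording it."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

N = 4096
sweep, rate = linear_sweep(N, 0.0, math.pi)
weighted = [2.0 * abs(rate) * s for s in sweep]      # v(t) * s(t), v = 2*|dw/dt|

# Correlating the sweep with its weighted copy should give a sharp peak at lag 0.
e_ref = correlate(sweep, weighted, range(-8, 9))

# Replace the sweep with the "measured" signal to estimate the impulse response.
h_true = [1.0, 0.5, 0.25]                            # hypothetical system
recorded = fir_filter(sweep, h_true)
e_meas = correlate(recorded, weighted, range(len(h_true)))
ir_est = [e_meas[k] / e_ref[0] for k in range(len(h_true))]
print(ir_est)
```

The estimate is not exact, because a finite sweep only approximates an ideal impulse. A longer sweep should reduce the residual sidelobes.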