(numpy) Wrong amplitude(?) of FFT'd array?

Question 1

Henry is right on the non-normalization part, but there is a little more to it, because you are using rfft, not fft. The following is consistent with his answer:

>>> x = np.linspace(0, 2 * np.pi, 128)
>>> y = 1 - np.sin(x)
>>> fft = np.fft.fft(y)
>>> np.mean((fft * fft.conj()).real)
191.49999999999991
>>> np.mean(y**2)
1.4960937500000004
>>> fft = fft / np.sqrt(len(fft))
>>> np.mean((fft * fft.conj()).real)
1.4960937499999991

But if you now try the same with rfft, things don't quite work out:

>>> rfft = np.fft.rfft(y)
>>> np.mean((rfft * rfft.conj()).real)
314.58462009358772
>>> len(rfft)
65
>>> rfft /= np.sqrt(len(rfft))
>>> np.mean((rfft * rfft.conj()).real)
4.8397633860551954

The following does work properly, though:

>>> rfft = np.fft.rfft(y)  # start again from the unnormalized transform
>>> (rfft[0] * rfft[0].conj() +
...  2 * np.sum(rfft[1:] * rfft[1:].conj())).real / len(y)**2
1.4960937873636722

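When you use rfft, what you get is not the full DFT of your data but only the non-negative-frequency half, since for real input the negative-frequency half is just the complex conjugate of it. To compute the mean, you need to count every value other than the DC component twice, which is what the last line of code does. Strictly speaking, for an even-length input the final (Nyquist) bin should also be counted only once; counting it twice is why the result differs from 1.49609375 in the eighth decimal place.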
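The weighting described above can be wrapped in a small helper. This is a minimal sketch, not something numpy provides: the name `mean_power_from_rfft` is made up for illustration, and it assumes a real 1-D input. It treats the Nyquist bin of an even-length input like the DC bin, so the result agrees with np.mean(y**2) to rounding error.

```python
import numpy as np

def mean_power_from_rfft(y):
    """Mean of y**2 computed from np.fft.rfft(y) (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    power = np.abs(np.fft.rfft(y)) ** 2
    # Every bin stands for itself and its mirrored negative frequency...
    weights = np.full(len(power), 2.0)
    # ...except DC, which has no mirror image.
    weights[0] = 1.0
    # For even n the last bin is the Nyquist frequency, also unpaired.
    if n % 2 == 0:
        weights[-1] = 1.0
    # Parseval: sum(|X|**2) == n * sum(y**2), so divide by n**2 for the mean.
    return np.sum(weights * power) / n**2

x = np.linspace(0, 2 * np.pi, 128)
y = 1 - np.sin(x)
print(np.isclose(mean_power_from_rfft(y), np.mean(y**2)))  # True
```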

Question 2

In most FFT libraries, the various DFT flavours are not unitary (the complex counterpart of orthogonal) by default. The numpy.fft library applies the necessary normalization only during the inverse transform.

Consider the Wikipedia description of the DFT: the inverse DFT has a 1/N factor that the forward DFT does not have (N is the length of the transform). To make an orthogonal (strictly, unitary) version of the DFT, you need to scale the result of the un-normalised DFT by 1/sqrt(N); the inverse then carries the same 1/sqrt(N) factor. In this case, the transform is unitary: if we define the scaled DFT matrix as F, then the inverse DFT is the conjugate (Hermitian) transpose of F.
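The 1/sqrt(N) scaling can be checked numerically. The sketch below builds the DFT matrix by transforming the identity matrix; the length N = 8 and the random test vector are arbitrary choices for illustration. It also compares against the norm="ortho" option of np.fft.fft, which applies the same scaling.

```python
import numpy as np

N = 8  # arbitrary small transform length, chosen for illustration
# Applying the FFT to the identity matrix yields the DFT matrix itself.
F = np.fft.fft(np.eye(N)) / np.sqrt(N)

# Unitary: the conjugate transpose of F is its inverse.
print(np.allclose(F.conj().T @ F, np.eye(N)))  # True

# numpy's norm="ortho" option applies the same 1/sqrt(N) scaling.
a = np.random.default_rng(0).standard_normal(N)
print(np.allclose(F @ a, np.fft.fft(a, norm="ortho")))  # True

# With that scaling the sum of squares is preserved by the transform.
print(np.isclose(np.sum(np.abs(F @ a) ** 2), np.sum(a**2)))  # True
```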

In your case, you can get the correct answer by simply scaling aft by 1.0/sqrt(len(a)) (note that N is found from the length of the transform; the real FFT just throws about half the values away, so it's the length of a that is important).

I suspect that the reason for leaving the normalization until the end is that in most situations, it doesn't matter and you therefore save the computational cost of doing the normalization twice. Indeed, the very quick FFTW library doesn't do any normalization in either direction, and leaves it entirely up to the user to deal with.
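To see what leaving out the normalization means in practice, the sketch below emulates an unnormalized inverse transform using only np.fft.fft and complex conjugation. It is an illustration with made-up example data, not FFTW's API. The round trip comes back scaled by N, and dividing by N, as np.fft.ifft does by default, recovers the input.

```python
import numpy as np

a = np.array([1.0, 2.0, 0.5, -1.0])  # arbitrary example data
N = len(a)

A = np.fft.fft(a)  # forward transform, no scaling applied
# Unnormalized inverse DFT via the conjugation identity:
# ifft(A) == conj(fft(conj(A))) / N
unscaled = np.fft.fft(A.conj()).conj()

print(np.allclose(unscaled, N * a))    # True: the round trip gains a factor N
print(np.allclose(unscaled / N, a))    # True once the 1/N is applied
print(np.allclose(np.fft.ifft(A), a))  # True: ifft includes the 1/N
```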

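Edit: Just to be clear, the explanation above is not quite correct. The correct answer will not be arrived at with that simple scaling, because the real FFT returns only half of the spectrum: the DC component is counted once, as it should be, but every other bin (apart from the Nyquist bin of an even-length input) needs to be counted twice. That said, 1.0/sqrt(len(a)) is still the correct scaling to produce the unitary transform.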