fft
"splits" a signal into frequency "bins", where the maximum frequency you can observe is the Nyquist frequency or half your sampling frequency. This means that for:
Y = fft(X,N); % (1D case)
the frequencies corresponding to fft-values in Y(1:N/2+1)
will be:
f = [fs/2*linspace(0,1,N/2+1)]; % where fs is your sampling frequency
The other half of Y is just mirrored and comes from the intrinsics of the Fourier transform. If you don't want to understand it fully, I would say that there is no need to bother with it other than what you can find on wikipedia. But for the sake of curiosity you can check out a nice intuitive illustration of the origin of positive and negative frequencies: https://dsp.stackexchange.com/questions/431/what-is-the-physical-significance-of-negative-frequencies/449#449.
Some key differences for your 2D-image case:
Using fftshift you've moved the 0-frequencies to the center of the matrix, rather than having them at the edges as in the above 1D example. So you would actually get
f = fs/2 * linspace(-1,1,N)
(once again, don't mind the negative frequencies)The next issue is to obtain the sampling frequency. Spatial frequencies are typically measured in [mm^-1], so in order to obtain it you actually need to know the physical distance between pixel centres (pixel pitch). But you could of course consider the spatial frequencies in [pixels^-1] in which case you are ready to go.