Question

Several times over the years I have wanted to work with frequency lists (character, word, n-gram, etc.) of varying quality, but I never figured out how to use them together.

At the time I intuited that lists with just rank and no other data should still be useful. Since then I have learned of Zipf's law and power laws, though I'm not great at maths, so I don't fully understand everything.

I've found some questions on Stack Overflow and Cross Validated that seem like they could be related, but I either don't understand them at the right level or they lack useful answers.

What I want is a way to normalize a list with full frequency data and a list with only rank data so that I can use the two lists together.

For instance a word list with frequency data:

word  per million
的    50155.13
我    50147.83
你    39629.27
是    28253.52
了    28210.53
不    20543.44
在    12811.05
他    11853.78
我们  11080.02
...
...
...   00000.01

And a word list with only rank data:

word  rank
的    1
一    2
是    3
有    4
在    5
人    6
不    7
大    8
中    9
...
...
...   100,000

How can I normalize both the frequency data and the rank data into the same kind of value, so that it can then be used in comparisons, etc.?

(The example lists in this question are just examples. Assume much longer lists obtained from external sources over which the programmer has no control.)

Solution

It should be obvious that you can determine a rank when you have a complete list with frequencies (order the list by frequency in descending order and assign ranks incrementally), but not the other way round: how would you know how often a word occurs, given only the information that it is ranked in 3rd position? You can only deduce that it occurs with equal or lower frequency than the word in 2nd position, and with equal or higher frequency than the word in 4th position.
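
As a minimal sketch of that first direction (this is my illustration, not part of the original answer), here is how a frequency list could be turned into ranks; the variable names and the three example entries are hypothetical:

```python
# Derive ranks from a frequency list: sort by frequency (descending)
# and assign 1-based ranks. `freq_list` stands in for the (word, per-million)
# pairs loaded from the frequency file.
freq_list = [("的", 50155.13), ("你", 39629.27), ("我", 50147.83)]  # illustrative data

ranked = sorted(freq_list, key=lambda item: item[1], reverse=True)
rank_of = {word: rank for rank, (word, _) in enumerate(ranked, start=1)}

print(rank_of["你"])  # -> 3
```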

Applying Zipf's law, you could map the rank back to a rough frequency estimate. But I'm not sure how well this generalizes across different languages.
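
A rough sketch of that mapping (my own illustration, under the classic Zipf assumption): the relative frequency of the word at a given rank is approximately (1/rank) divided by the N-th harmonic number, so the estimates over the whole list sum to 1:

```python
# Rough Zipf-based frequency estimate from rank alone:
# f(rank) ~ (1 / rank) / H_N, where H_N is the N-th harmonic number,
# so that the estimates over the whole list sum to 1.
def zipf_estimate(rank: int, n_words: int) -> float:
    harmonic_n = sum(1.0 / k for k in range(1, n_words + 1))
    return (1.0 / rank) / harmonic_n

# e.g. estimated relative frequency of the 3rd-ranked word in a 100,000-word list
print(zipf_estimate(3, 100_000))
```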

[edit] You really caught my attention now :) I came across this application of Zipf's Law on Wolfram MathWorld. I'll run some small experiments with an English term corpus which I created a while ago and come back with the results; just a little patience.

[edit2] I have now taken a frequency list from Word Frequencies in Written and Spoken English: based on the British National Corpus (this one, to be exact; it only contains the top 5,000 words or so, but that should be enough for this quick look) and applied a simple 1/rank to estimate the frequencies. I did the experiment as a KNIME workflow (using the JFreeChart nodes for the chart and the Palladian nodes [disclaimer: I'm the author of the Palladian nodes] for the RMSE calculation), which looks as follows:
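
For readers without KNIME, here is a small plain-Python reconstruction of the same idea (mine, not the original workflow, and with made-up numbers): scale the actual frequencies relative to the top-ranked word, estimate each word's relative frequency as 1/rank, and measure the gap with RMSE:

```python
import math

# Plain-Python reconstruction of the experiment (illustrative only; the
# original used a BNC-derived frequency list inside a KNIME workflow).
# Per Zipf's law, frequency relative to the top-ranked word ~ 1/rank.
per_million = [61847.0, 29448.0, 26158.0, 21261.0, 18013.0]  # hypothetical values

actual = [f / per_million[0] for f in per_million]                   # relative to rank 1
estimated = [1.0 / rank for rank in range(1, len(per_million) + 1)]  # simple 1/rank estimate

rmse = math.sqrt(
    sum((a - e) ** 2 for a, e in zip(actual, estimated)) / len(per_million)
)
print(f"RMSE: {rmse:.4f}")
```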

[Image: KNIME workflow]

The graph below shows the actual frequencies and the estimates derived from rank (the rank axis is log scaled; sorry for not providing proper axis captions; the blue line is the estimate, the red line is the actual value from the dataset):

[Image: Frequency estimation]

So, while there are some outliers at the top ranks (e.g. 2, 3, 4), the frequency estimation should still be perfectly decent when used in conjunction with TF-IDF or something like that. (The RMSE is ~0.004 in this case, which is of course largely due to the minimal deviation in the long tail.)

Here's a snippet with some actual values:

[Image: Frequency estimation list]

By the way, also have a look at this section of the Wikipedia article on Zipf's law, which shows similar results.
