Question

I was asked to write a program that encrypts and decrypts "normal English" text based on letter frequencies.

The question is: where can I find text samples whose letter frequencies match the official ones?

So far I have tried "War and Peace" by Lev Tolstoy, but it didn't work well.

Later edit: I don't need just a list of words; I need a text sample to do some processing on.
Later edit 2: The goal is to guess 20 of the 26 letters correctly in a 2000-character text.


Solution

You're searching for English text corpora, e.g. http://faculty.washington.edu/ebender/corpora/corpora.html#modern. Out of what's listed there, I know that Project Gutenberg is free; many of the others might not be.

I'm not sure what you mean by the "official" frequencies. The point of a frequency table is to match what you find in the wild; if it doesn't, that's the frequency table's problem.
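
To see how closely a given sample tracks a reference table, you can count its letters and rank them. The following is only a rough sketch, not anything from the question: the names letter_frequencies, rank_letters and guess_key are made up for illustration, and ENGLISH_ORDER is one commonly cited most-to-least-frequent ordering (other sources order the rarer letters differently). It assumes a simple substitution cipher and guesses the key purely by rank.

```python
from collections import Counter
import string

# Assumption: one commonly cited ordering of English letters, most frequent
# first. Published tables disagree on the exact order of the rarer letters.
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"


def letter_frequencies(text):
    """Return {letter: relative frequency} for a-z, ignoring everything else."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    return {c: (counts[c] / total if total else 0.0)
            for c in string.ascii_lowercase}


def rank_letters(text):
    """Return all 26 letters sorted from most to least frequent in text."""
    freqs = letter_frequencies(text)
    # Descending frequency; ties are broken alphabetically so the
    # result is deterministic.
    return "".join(sorted(freqs, key=lambda c: (-freqs[c], c)))


def guess_key(ciphertext, reference_order=ENGLISH_ORDER):
    """Map each ciphertext letter to the reference letter of the same rank."""
    return dict(zip(rank_letters(ciphertext), reference_order))
```

Running rank_letters on a plain-text sample and comparing the result with ENGLISH_ORDER shows how many positions agree for that sample. With only 2000 characters, the rare letters in particular are likely to swap places, so a pure rank match may need refining by other means.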

OTHER TIPS

Check out infochimps; they have a bunch of freely available datasets that may be useful.

Licensed under: CC-BY-SA with attribution