Calculating size of a theoretical text file

Question 1

No, it is not possible to estimate the size of a compressed version of a file based purely on its character count. Strings of the same length can compress to very different sizes: a string made of a single repeated character compresses far better than a string of randomly generated characters.
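A quick way to see this for yourself is the minimal sketch below. It uses Python's standard zlib module as a stand-in for whatever compressor you actually use; the helper name compressed_size and the 100,000-byte length are just choices made for illustration.

```python
import os
import zlib


def compressed_size(data: bytes, level: int = 9) -> int:
    """Return the number of bytes zlib needs to store `data`."""
    return len(zlib.compress(data, level))


n = 100_000  # both inputs have exactly the same length

repetitive = b"a" * n          # one character repeated n times
random_bytes = os.urandom(n)   # n bytes from the OS random source

print("repetitive:", compressed_size(repetitive), "bytes")
print("random:    ", compressed_size(random_bytes), "bytes")
```

On a typical run the repetitive input shrinks to a tiny fraction of its size, while the random input does not shrink at all (it usually grows by a few bytes of overhead), even though both have the same character count.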

In information theory, there is a concept of Kolmogorov complexity, which is (more or less) the smallest amount of information necessary to reconstruct a string. Not all strings can be compressed into smaller strings, and it is impossible to build a general algorithm to find the Kolmogorov complexity of an arbitrary string. Moreover, it's impossible to prove that you have found the optimal encoding for a string once the string gets sufficiently long.
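The claim that not every string can be compressed follows from a simple counting argument, sketched below for binary strings. The function names are made up for this illustration. There are 2^n strings of exactly n bits, but only 2^n - 1 strings that are strictly shorter (counting the empty string), so a lossless compressor cannot map every n-bit string to a distinct shorter one.

```python
def count_strings(length: int) -> int:
    """Number of distinct binary strings of exactly `length` bits."""
    return 2 ** length


def count_shorter_strings(length: int) -> int:
    """Number of binary strings with fewer than `length` bits (empty string included)."""
    return sum(2 ** k for k in range(length))


for n in (1, 8, 16):
    # There is always exactly one fewer shorter string than strings of length n.
    print(n, count_strings(n), count_shorter_strings(n))
```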

Hope this helps!

Question 2

If you want to say it fits on a 1.44 MB floppy, then just prove it with a better compressor. Try 7-Zip or xz (depending on your platform). You are close enough that I'm sure that will do the trick. (Did you use gzip -9?)

By the way, I'm not sure about the utility of this, since many people will have no clue what in the world you're talking about when you describe this "floppy disk" thing to them.

As already noted, it is not possible to calculate the theoretical best compression. Just use the best available compressors to get an estimate; whatever size they achieve is an upper bound on the best possible size.
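If you'd rather script the comparison, here is a rough sketch using the compressors in Python's standard library (zlib, bz2, and lzma, the last of which produces the same .xz format as the xz tool). The helper names and the FLOPPY_BYTES constant are my own; the constant is the raw capacity of a "1.44 MB" disk (2880 sectors of 512 bytes), and a formatted disk holds somewhat less because of filesystem overhead.

```python
import bz2
import lzma
import zlib
from typing import Tuple

# Assumption: raw capacity of a "1.44 MB" floppy, 2880 sectors * 512 bytes.
FLOPPY_BYTES = 2880 * 512  # 1,474,560


def best_compressed_size(data: bytes) -> Tuple[str, int]:
    """Try several compressors and return (name, size) of the smallest result."""
    sizes = {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data, preset=9)),
    }
    name = min(sizes, key=sizes.get)
    return name, sizes[name]


def fits_on_floppy(data: bytes, capacity: int = FLOPPY_BYTES) -> bool:
    """True if the best compressed size found is within `capacity` bytes."""
    return best_compressed_size(data)[1] <= capacity
```

To use it, read your file with open(path, "rb").read() and pass the bytes to fits_on_floppy. A True result proves the file fits; a False result only means these particular compressors did not get it small enough.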

Update:

Downloaded it. xz compressed it to 1177180 bytes. So yes, it fits.