Decrypting text using frequency analysis in C#.

https://stackoverflow.com/questions/8922444

17-04-2021

Question

I've been tasked with decrypting a text file using frequency analysis. This isn't a "do it for me" question, but I have absolutely no idea what to do next. What I have so far reads the text in from a file and counts the frequency of each letter. If someone could point me in the right direction on swapping letters according to their frequency, it would be much appreciated.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace freqanaly
{
    class Program
    {
        static void Main()
        {
            // Read the ciphertext and echo it so it can be inspected.
            string text = File.ReadAllText("c:\\task_2.txt");
            Console.Write(text);

            // counts[i] holds the number of occurrences of the letter 'A' + i.
            int[] counts = new int[26];

            foreach (char c in text)
            {
                // Only uppercase 'A'..'Z' are counted; lowercase letters are skipped.
                if (c >= 'A' && c <= 'Z')
                {
                    counts[c - 'A']++;
                }
            }

            Console.ReadKey();

            // Print one "letter = count" line per letter.
            for (int x = 0; x < 26; x++)
            {
                Console.WriteLine((char)('A' + x) + " = " + counts[x]);
            }
            Console.ReadKey();
        }
    }
}

Solution

This IS encrypted data, just using a simple substitution cipher (I assume). See the definition of encoding/encrypting: http://www.perlmonks.org/index.pl?node_id=66249

Regardless, as Sergey suggested, get a letter frequency table and match frequencies. You will have to allow for some deviation, since there is no guarantee that exactly 8.167% of the letters in the document are 'A's (in this document the share of 'A's might be 8.78% or 7.65%). Also, be sure to count every occurrence of A and not differentiate 'a' from 'A'. This can be handled with a simple ToUpper or ToLower transform on the character; just be consistent.
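As a rough illustration of matching with some deviation allowed, here is a minimal sketch. The class name LetterStats, its method names, and the tolerance parameter are made up for this example; the percentages are the commonly cited English letter frequencies (the same table that gives 8.167% for 'A'), and your text will not match them exactly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the question's code.
static class LetterStats
{
    // Commonly cited English letter frequencies in percent, indexed A..Z.
    public static readonly double[] English =
    {
        8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
        0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
        6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074
    };

    // Percentage of each letter A..Z in the text, ignoring case and non-letters.
    public static double[] Percentages(string text)
    {
        var counts = new int[26];
        int total = 0;
        foreach (char raw in text)
        {
            char c = char.ToUpperInvariant(raw);   // treat 'a' and 'A' alike
            if (c >= 'A' && c <= 'Z')
            {
                counts[c - 'A']++;
                total++;
            }
        }

        var result = new double[26];
        if (total == 0) return result;             // no letters at all
        for (int i = 0; i < 26; i++)
            result[i] = 100.0 * counts[i] / total;
        return result;
    }

    // English letters whose expected frequency lies within `tolerance`
    // percentage points of the observed value, closest match first.
    public static List<char> Candidates(double observedPercent, double tolerance)
    {
        return Enumerable.Range(0, 26)
            .Where(i => Math.Abs(English[i] - observedPercent) <= tolerance)
            .OrderBy(i => Math.Abs(English[i] - observedPercent))
            .Select(i => (char)('A' + i))
            .ToList();
    }
}
```

For a cipher letter that makes up 8.78% of the text, LetterStats.Candidates(8.78, 1.0) returns both 'T' and 'A', which is exactly the kind of ambiguity you then have to resolve by looking at the words.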

Also, once you get into the less common but still popular letters, frequencies alone will not settle the matter. C, F, G, W, and M are all around the 2% mark, so you will need to play with the decrypted text until each letter fits the word it appears in, and every other word in the document where the same substitution applies. This is similar to fitting numbers into a Sudoku grid. Luckily, once you find where a letter should go, it cascades throughout the document and you can start to see the decrypted plain text emerge.

As an example, '(F)it' and '(W)it' are both valid words, but if substituting an 'F' gives you '(F)hen' elsewhere in the document, you can make a good guess that this character should be a 'W' instead. (T)here and (W)here is another example, and a word like ()hen won't provide any guidance by itself, since both (W)hen and (T)hen are valid words. Here you have to use contextual clues to decide which word makes sense: "Then is a good time to start our attack?" doesn't make as much sense as "When is a good time to start our attack?".
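The trial-and-error step can be sketched as follows, assuming you keep your current guesses in a dictionary from cipher letter to plaintext letter. KeyTools, Apply, and SwapPlain are hypothetical names chosen for this example, and the output is written in upper case for simplicity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper for experimenting with a partial key.
static class KeyTools
{
    // Decrypts with the current key. Letters are matched case-insensitively;
    // anything not in the key (spaces, punctuation, unguessed letters)
    // is copied through unchanged.
    public static string Apply(Dictionary<char, char> key, string cipherText)
    {
        var sb = new StringBuilder(cipherText.Length);
        foreach (char raw in cipherText)
        {
            char c = char.ToUpperInvariant(raw);
            sb.Append(key.TryGetValue(c, out char plain) ? plain : raw);
        }
        return sb.ToString();
    }

    // Exchanges two plaintext guesses across the whole key, e.g.
    // SwapPlain(key, 'F', 'W') when "FHEN" shows up and "WHEN" was meant.
    public static void SwapPlain(Dictionary<char, char> key, char a, char b)
    {
        foreach (char cipher in key.Keys.ToList())
        {
            if (key[cipher] == a) key[cipher] = b;
            else if (key[cipher] == b) key[cipher] = a;
        }
    }
}
```

With a made-up key in which Q maps to F, K to H, Z to E, and M to N, Apply(key, "QKZM") gives "FHEN". After SwapPlain(key, 'F', 'W') the same call gives "WHEN", and every other Q in the document changes with it, which is the cascading effect described above.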

All of this assumes you are dealing with a monoalphabetic substitution. A polyalphabetic substitution is more difficult, and you may need to look at examples of cracking the Vigenère cipher to find a way around this problem.
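One common way to check that assumption, which is not something the question requires, is the index of coincidence: the probability that two letters picked at random from the text are the same. A monoalphabetic substitution leaves this value unchanged, so English ciphertext tends to score roughly 0.065 to 0.068, while a polyalphabetic cipher pushes it down toward 1/26, about 0.038. A minimal sketch, with the class name Coincidence made up for this example:

```csharp
using System;

// Hypothetical helper; the thresholds you compare against are rules of thumb.
static class Coincidence
{
    // Index of coincidence: sum of n_i * (n_i - 1) over all letters,
    // divided by N * (N - 1), where n_i is the count of letter i and
    // N is the total number of letters. Case and non-letters are ignored.
    public static double Index(string text)
    {
        var counts = new long[26];
        long n = 0;
        foreach (char raw in text)
        {
            char c = char.ToUpperInvariant(raw);
            if (c >= 'A' && c <= 'Z')
            {
                counts[c - 'A']++;
                n++;
            }
        }

        if (n < 2) return 0.0;   // not enough letters to form a pair

        long sum = 0;
        foreach (long k in counts) sum += k * (k - 1);
        return (double)sum / (n * (n - 1));
    }
}
```

Short texts give noisy values, so treat the result as a hint rather than proof.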

I suggest reading "The Code Book" by S. Singh. It is a very interesting read, and it makes the historical ciphers and how they were cracked easy to digest.

http://www.google.com/products/catalog?q=the+code+book&rls=com.microsoft:en-us:IE-SearchBox&oe=&um=1&ie=UTF-8&tbm=shop&cid=5361323398438876518&sa=X&ei=hpR0T-HyObSK2QWvgvH-Dg&ved=0CFoQ8wIwBQ#

OTHER TIPS

Next, you should grab one of the publicly available English letter frequency lists (from Wikipedia, for example) and compare it with the frequency table you actually got, in order to find the replacements for the letters.
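A simple way to turn that comparison into a first guess is to rank the cipher letters by how often they occur and pair them, in order, with the English letters ranked the same way. This is only a starting point, since letters with similar frequencies will often come out in the wrong order. In the sketch below, RankGuess and InitialKey are hypothetical names, and the ordering string follows the commonly cited English frequency table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that builds a first-guess key by frequency rank.
static class RankGuess
{
    // English letters from most to least frequent.
    public const string EnglishOrder = "ETAOINSHRDLCUMWFGYPBVKJXQZ";

    // Returns a mapping from cipher letter to guessed plaintext letter.
    public static Dictionary<char, char> InitialKey(string cipherText)
    {
        var counts = new int[26];
        foreach (char raw in cipherText)
        {
            char c = char.ToUpperInvariant(raw);   // ignore case
            if (c >= 'A' && c <= 'Z') counts[c - 'A']++;
        }

        // Most frequent cipher letter first; ties are broken alphabetically.
        var ranked = Enumerable.Range(0, 26)
            .OrderByDescending(i => counts[i])
            .ThenBy(i => i)
            .Select(i => (char)('A' + i))
            .ToList();

        var key = new Dictionary<char, char>();
        for (int rank = 0; rank < 26; rank++)
            key[ranked[rank]] = EnglishOrder[rank];
        return key;
    }
}
```

The most frequent cipher letter is guessed to be 'E', the next 'T', and so on; from there you correct the key by hand as the words start to take shape.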

Licensed under: CC-BY-SA with attribution