Question

My understanding is that for most encryption algorithms there is always an output, regardless of the key. A wrong key will of course produce a wrong output. So when using brute force to decrypt encrypted data, how do hackers know when the key was correct? Is there a way other than analyzing the output data?

If this is the only way, I have a thought. When encrypting text, wouldn't it be safer to encrypt at the word level using a dictionary rather than at the bit level, as is done today? Then the output would always consist of words, and hackers would need complicated, slow algorithms that check the grammar of the output to decide whether it could be the real written text.


Solution

To answer the first part, I will simply restate my earlier answer to the Super User question "How does Truecrypt know it has the correct password?"

It knows the correct password because within that encrypted container there is a known header.

When TrueCrypt decrypts a blob of data and the header matches what it expects, it reports that the decryption was successful. If you use an incorrect password, it will still "decrypt" the data, but the header decrypts into gibberish and fails the check.
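A minimal sketch of the idea, assuming a toy XOR stream cipher built from SHA-256 (hypothetical and NOT secure; TrueCrypt itself uses real block ciphers and a key derivation function, and its marker sits at offset 64 rather than at the start). Decrypting with any key always returns bytes, but only the right key reproduces the known marker:

```python
import hashlib
from typing import Optional

MAGIC = b"TRUE"  # known plaintext placed in every header (simplified: at offset 0)


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || counter) blocks. Illustration only, NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypts or decrypts: XOR with the keystream is its own inverse."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


def try_decrypt(key: bytes, blob: bytes) -> Optional[bytes]:
    """Any key yields output; success is reported only if the known header appears."""
    plain = xor_crypt(key, blob)
    if plain.startswith(MAGIC):
        return plain[len(MAGIC):]
    return None


blob = xor_crypt(b"correct password", MAGIC + b"secret payload")
print(try_decrypt(b"correct password", blob))  # b'secret payload'
print(try_decrypt(b"wrong password", blob))    # almost certainly None (2**-32 chance of a false match)
```

The wrong key still produces 18 bytes of "plaintext"; the header comparison is the only thing that tells the two cases apart.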

The volume format specification lists many conditions that must hold for a header to be valid (for example, bytes 64-67 must always be the ASCII string "TRUE" after decryption, bytes 132-251 must all be zeros, etc.). If you decrypt a blob of data and it does not match that header format, you know the decryption failed.
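The two header conditions quoted above come down to a couple of slice comparisons. This sketch assumes the input is an already decrypted 512-byte header and checks only those two conditions; the real format has further fields (such as version numbers and CRC-32 checksums) that a full implementation would also verify. The helper name passes_header_checks is hypothetical.

```python
import os

HEADER_SIZE = 512  # size of a TrueCrypt volume header in bytes


def passes_header_checks(decrypted: bytes) -> bool:
    """Checks only the two conditions quoted in the answer (hypothetical helper)."""
    if len(decrypted) < HEADER_SIZE:
        return False
    # Bytes 64-67 (inclusive) must be the ASCII string "TRUE".
    if decrypted[64:68] != b"TRUE":
        return False
    # Bytes 132-251 (inclusive) are reserved and must all be zero.
    if any(decrypted[132:252]):
        return False
    return True


good = bytearray(HEADER_SIZE)
good[64:68] = b"TRUE"
print(passes_header_checks(bytes(good)))              # True
print(passes_header_checks(os.urandom(HEADER_SIZE)))  # almost certainly False
```

Random bytes would have to hit 4 exact bytes and 120 zero bytes by chance, so a wrong password practically never passes.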

So TrueCrypt already does what you were suggesting with "checking the grammar": it attempts to decrypt the data, and if the result "has proper grammar" (the data follows the spec of the encrypted file format), the decryption succeeded.
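For data without a fixed header, the same brute-force loop works with a weaker plausibility test. This sketch assumes a deliberately tiny key space (a single-byte XOR key, trivially breakable) and uses "every byte is printable ASCII" as a crude stand-in for a grammar check; the cipher and all function names are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def xor_single_byte(key: int, data: bytes) -> bytes:
    """Toy cipher with only 256 possible keys (trivially breakable)."""
    return bytes(b ^ key for b in data)


def is_plausible_text(data: bytes) -> bool:
    """Crude 'grammar check': every byte is printable ASCII, tab, LF or CR."""
    return all(32 <= b < 127 or b in (9, 10, 13) for b in data)


def brute_force(
    ciphertext: bytes,
    keys: Iterable[int],
    looks_right: Callable[[bytes], bool],
) -> List[Tuple[int, bytes]]:
    """Decrypt with every key and keep the candidates that pass the check."""
    candidates = []
    for key in keys:
        plain = xor_single_byte(key, ciphertext)
        if looks_right(plain):
            candidates.append((key, plain))
    return candidates


ciphertext = xor_single_byte(0x5A, b"We attack tomorrow!")
for key, plain in brute_force(ciphertext, range(256), is_plausible_text):
    print(hex(key), plain)
```

A check this weak may let a few wrong keys through as well; a stricter test (a known header, a checksum, a language model) narrows the candidates down further.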

For your 2nd part of "using a dictionary" there are a few important issues.

First, this would only work on plain, unformatted text: no binary data or text metadata allowed. Second, and more importantly, how do you "create" this dictionary? If you create it on the fly from the words in the document, tell me what the dictionary would be for the following message:

We attack tomorrow!
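To make the problem concrete, here is a sketch of the hypothetical on-the-fly scheme, assuming the dictionary is simply the sorted set of whitespace-separated words (so punctuation stays attached) and the message is sent as indexes into it. Whoever sees the dictionary already sees every word; only the order is hidden.

```python
from itertools import permutations
from typing import List


def build_dictionary(message: str) -> List[str]:
    """Hypothetical scheme: the dictionary is the sorted set of words in the message."""
    return sorted(set(message.split()))


def encode(message: str, dictionary: List[str]) -> List[int]:
    """Replace each word by its position in the dictionary."""
    index = {word: i for i, word in enumerate(dictionary)}
    return [index[word] for word in message.split()]


def candidate_messages(dictionary: List[str]) -> List[str]:
    """Every ordering an attacker holding the dictionary would need to consider
    (assuming, as here, each word is used exactly once)."""
    return [" ".join(p) for p in permutations(dictionary)]


message = "We attack tomorrow!"
dictionary = build_dictionary(message)
print(dictionary)                           # ['We', 'attack', 'tomorrow!']
print(encode(message, dictionary))          # [0, 1, 2]
print(len(candidate_messages(dictionary)))  # 6
```

For this three-word message there are only 3! = 6 orderings to try, and the dictionary alone already gives the plan away.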

You could pad the dictionary with extra words, but how do you choose the padding? If you use an existing fixed dictionary, what do you do when a word is not in it? What about misspellings?
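The fixed-dictionary variant fails in a different way. In this sketch the tiny word list and the name encode_fixed are hypothetical; note that it also has to lowercase words and strip punctuation, which already throws formatting away.

```python
from typing import List

# Hypothetical, deliberately tiny fixed dictionary shared by sender and receiver.
FIXED_DICTIONARY = ["attack", "dawn", "retreat", "today", "tomorrow", "we"]
INDEX = {word: i for i, word in enumerate(FIXED_DICTIONARY)}


def encode_fixed(message: str) -> List[int]:
    """Map each word to its dictionary index; capitalization and punctuation are lost."""
    codes = []
    for raw in message.split():
        word = raw.lower().strip("!?.,;:")
        if word not in INDEX:
            raise ValueError(f"word not in dictionary: {raw!r}")
        codes.append(INDEX[word])
    return codes


print(encode_fixed("We attack tomorrow!"))  # [5, 0, 4]
try:
    encode_fixed("We atack tomorrow!")      # misspelled word
except ValueError as err:
    print(err)                              # word not in dictionary: 'atack'
```

A single typo or an unusual name makes the message impossible to encode unless the scheme adds some fallback, and every fallback is another design problem.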

I have not even begun to touch on how likely this method is to leak information. As you said, English has grammatical rules: some words appear more often near the end of a sentence and others more often near the start. By looking at the numbers used as indexes, an attacker could run a statistical analysis and rule out part of the dictionary as "unlikely" to have been used.
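The statistical leak is easy to demonstrate. This sketch assumes a hypothetical "word-level cipher" that maps each word to an index through a key-dependent shuffle of the dictionary (using Python's random module, which is not a cryptographic generator). Because the mapping is deterministic, repeated words produce repeated indexes, so the frequency profile of the ciphertext matches that of the plaintext.

```python
import random
from collections import Counter
from typing import Dict, Iterable


def keyed_word_table(vocabulary: Iterable[str], key: int) -> Dict[str, int]:
    """Hypothetical word-level cipher: a key-dependent shuffle of the dictionary.
    random.Random is NOT a cryptographic generator; this is for illustration only."""
    order = sorted(vocabulary)  # sort first so the shuffle depends only on the key
    random.Random(key).shuffle(order)
    return {word: i for i, word in enumerate(order)}


text = "the enemy will attack the bridge and the enemy will hold the town"
words = text.split()
table = keyed_word_table(set(words), key=12345)
ciphertext = [table[w] for w in words]

# An attacker who sees only the indexes can still count them.
print(sorted(Counter(ciphertext).values()) == sorted(Counter(words).values()))  # True
top_index, top_count = Counter(ciphertext).most_common(1)[0]
print(top_count, top_index == table["the"])  # 4 True
```

Without knowing the key, the attacker can guess that the most frequent index stands for "the", which is exactly the kind of foothold frequency analysis exploits.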

I am sure there are many other problems with this, but I am only a beginner in cryptography and cannot think of any others off the top of my head.

There is an adage in cryptography: "It is easy for you to create a cypher that you yourself cannot break; it is quite hard for you to make a cypher that other people cannot break."

Licensed under: CC-BY-SA with attribution