Is it possible to crack SHA256, when you know the answer is a coordinate?

Question 1

Producing all possible coordinates between 00.000000 and 99.999999 is easy enough:

from itertools import product
import hashlib

digits = '0123456789'

for combo in product(digits, repeat=16):
    coords = '{}.{} {}.{}'.format(
        ''.join(combo[:2]), ''.join(combo[2:8]),
        ''.join(combo[8:10]), ''.join(combo[10:]))
    hash = hashlib.sha256(coords).hexdigest()
    if hash == '3f1c756daec9ebced7ff403acb10430659c13b328c676c4510773dc315784e4e':
        print coords
        break

This'll brute-force all 10**16 (a big number) combinations. Sit back and relax, this'll take a while.

And by 'a while', we really mean not in your lifetime, or anyone else's. Just iterating over all possible combinations produced by product() takes a huge amount of time, as each added digit to try increases the time required by a factor of 10:

>>> from collections import deque
>>> from itertools import product
>>> from timeit import timeit
>>> digits = '0123456789'
>>> timeit(lambda: deque(product(digits, repeat=8), 0), number=5)
3.014396679995116
>>> timeit(lambda: deque(product(digits, repeat=9), 0), number=5)
30.99540744899423

If producing all possible combinations of 8 digits takes .8 seconds (4s divided by 5 repetitions), 9 digits takes 8 seconds, you can extrapolate from that that 10 digits takes almost 1.5 minutes, etc. Just producing all possible combinations of 16 digits takes 1 million (10 ** 6) times as much time as 10 digits, so 963 days or just onder 3 years to run those in a loop. You could split this task up across 2000 different processes on a large number of computers with enough cores in total to run those processes in parallel, to reduce that to under 12 hours.

Then the loop body itself takes about 2.4 seconds per million iterations:

>>> from random import choice
>>> combo = tuple(choice(digits) for _ in range(16))  # random combination for testing
>>> timeit("""\
... coords = '{}.{} {}.{}'.format(
...     ''.join(combo[:2]), ''.join(combo[2:8]),
...     ''.join(combo[8:10]), ''.join(combo[10:]))
... hash = hashlib.sha256(coords).hexdigest()
... if hash == '3f1c756daec9ebced7ff403acb10430659c13b328c676c4510773dc315784e4e': pass
... """, 'from __main__ import combo; import hashlib')
2.3429908752441406

But you have 10 ** 10 (10 thousand million) more times work than that, totaling roughly 743 years of computation work. Even being able to run 20 thousand parallel processes won't reduce that to a reasonable number (that's still about 13.5 years of work).

Python is just not fast enough for this task. Using GPUs it should be possible to reach 500 million hashes per second (0.5 Gigahash / s), at which point you could run the above brute-force operation and find a solution in about 230 days on such a system. At a cost, of course, because such a rig would cost about $3000-$4000 a month to run! But with enough dedicated hardware you can certainly 'crack' the hash in 'humane' timeline.

Question 2

One of the common claims about hashes is that they discard information, therefore they cannot be reversed. There's infinite messages that have the same hash. You can't know which of the infinite messages that give the same hash is correct.

Of course in practice, a brute force attack often works - either because your search strategy is likely to find the real original message first (most messages with hash collisions are obviously wrong in some trivial way - e.g. the wrong format - and won't turn up in the search because of that) or because your attack needs a different message with the same hash anyway.

In your case, what you know about the message means there's less information in the message than (apparently) in the hash. Of course hashing doesn't create new information, so that means many hashes cannot occur for any co-ordinate strings. You have (with very high probability for good hash algorithms) a 1:1 relationship between possible hashes and possible messages. In principle, you have an encrypted form of your message which can be decrypted.

Many people would call me an idiot for saying this, of course. After all, you still have to find all the hashes for all the possible messages. That may be faster than some people think, but it's a long way from trivial.

It has already been pointed out that there are 10^16 possible combinations based on your co-ordinate format. One thing to check is whether all values for all those digits are possible (and equally probable). Using floating point arithmetic internally shouldn't be a problem - double precision floats aren't 8-digit decimals, but 53 bits of mantissa should be plenty to ensure all those decimal digits are fully in use. However, it may be worth checking that there's no other limitation that reduces the number of cases to check - the obvious one being the precision in how those co-ordinates are measured.

Even if certain digit values are less probable than others, that means ordering the search to check more likely values first will save a lot of time for crackers.