Question

This is a cry for help from all you cryptologists out there.

Scenario: I have a Windows application (likely built with VC++ or VB and subsequently moved to .Net) that saves some passwords in an XML file. Given a password A0123456789abcDEFGH, the resulting "encrypted" value is 04077040940409304092040910409004089040880408704086040850404504044040430407404073040720407104070

Looking at the string, I've figured out that this is just character shifting: '04' delimits actual character values, which are decimal; if I then subtract these values from 142, I get back the original ASCII code. In Jython (2.2), my decryption routine looks like this (EDITED thanks to suggestions in comments):

blocks = [pwd[i:i+5] for i in range(0, len(pwd), 5)]
# now a block looks like '04093'
decrypted = [chr(142 - int(block[3:].lstrip('0'))) for block in blocks]

This is fine for ASCII values (codes up to 127) and a handful of accented letters, but 8-bit charsets have another 128 characters; limiting accepted values to 142 doesn't make sense from a decimal perspective.

EDIT: I've gone rummaging through our systems and found three non-ASCII chars:

è 03910
Ø 03926
Õ 03929

From these values, it looks like subtracting the 4-digit block from 4142 (leaving only a single '0' as separator) actually gives me the correct character.
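A quick sanity check of that theory in the same Jython interpreter (relying on the built-in unichr) bears it out:

print unichr(4142 - int('03910'))  # è
print unichr(4142 - int('03926'))  # Ø
print unichr(4142 - int('03929'))  # Õ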

So my question is:

  • is anybody familiar with this sort of obfuscation scheme in the Windows world? Could this be the product of a standard library function? I'm not very familiar with Win32 and .Net development, to be honest, so I might be missing something very simple.

  • If it's not a library function, can you think of a better method to de-obfuscate these values without resorting to the magic 142 number, i.e. a scheme that can actually be applied on non-ASCII characters without special-casing them? I'm crap at bit shifting and all that, so again I might be missing something obvious to the trained eye.


Solution

is anybody familiar with this sort of obfuscation scheme in the Windows world?

Once you understand it correctly, it's just a trivial cipher of the ROT13 variety: every code unit is subtracted from a fixed constant, modulo 65536.

Why would anyone use this?

Well, in general, this is very common. Let's say you have some data that you need to obfuscate. But the decryption algorithm and key have to be embedded in software that the viewers have. There's no point using something fancy like AES, because someone can always just dig the algorithm and key out of your code instead of cracking AES. An encryption scheme that's even marginally harder to crack than finding the hidden key is just as good as a perfect encryption scheme—that is, good enough to deter casual viewers, and useless against serious attackers. (Often you aren't even really worried about stopping attacks, but about proving after the fact that your attacker must have acted in bad faith for contractual/legal reasons.) So, you use either a simple rotation cipher, or a simple xor cipher—it's fast, it's hard to get wrong and easy to debug, and if worst comes to worst you can even decrypt it manually to recover corrupted data.

As for the particulars:

If you want to handle non-ASCII characters, you pretty much have to use Unicode. If you used some fixed 8-bit charset, or the local system's OEM charset, you wouldn't be able to handle passwords from other machines.

A Python script would almost certainly handle Unicode characters natively, because in Python you deal either in bytes via a str or in Unicode characters via a unicode. But a Windows C or .NET app would be much more likely to use UTF-16, because the native Windows APIs deal in UTF-16-LE code units in a WCHAR * (i.e., a string of 16-bit words).
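You can see those 16-bit code units directly from Python 2 (a quick illustration; str here is a byte string):

print repr(u'\u00e8'.encode('utf-16-be'))      # '\x00\xe8' -- è is a single code unit
print repr(u'\U00010300'.encode('utf-16-be'))  # '\xd8\x00\xdf\x00' -- a surrogate pair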

So, why 4142? Well, it really doesn't matter what the key is. I'm guessing some programmer suggested 42. His manager then said "That doesn't sound very secure." He sighed and said, "I already explained why no key is going to be any more secure than… you know what, forget it, what about 4142?" The manager said, "Ooh, that sounds like a really secure number!" So that's why 4142.


If it's not a library function, can you think of a better method to de-obfuscate these values without resorting to the magic 142 number?

You do need to resort to the magic 4142, but you can make this a lot simpler:

import struct

def decrypt_block(block):
    # undo the subtraction with unsigned-short wraparound, then pack the
    # result as one big-endian 16-bit value
    return struct.pack('>H', (4142 - int(block, 10)) % 65536)

So, each block of 5 characters is the decimal representation of a UTF-16 code unit, subtracted from 4142, using C unsigned-short wraparound rules.
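For reference, the forward (obfuscating) direction implied by that description would look something like this; a sketch only, since the original app's encoder isn't shown, and encrypt_block is my own name for it:

def encrypt_block(code_unit):
    # one UTF-16 code unit in, one 5-digit decimal block out
    return '%05d' % ((4142 - code_unit) % 65536)

print encrypt_block(ord(u'A'))       # 04077
print encrypt_block(ord(u'\u00e8'))  # 03910, the è from the question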

This would be trivial to implement in native Windows C, but it's slightly harder in Python. The best full decryption routine I can come up with is:

def decrypt(pwd):
    # carve the string into 5-character blocks, reverse each one, and
    # decode the concatenated bytes as UTF-16-BE
    blocks = [pwd[i:i+5] for i in range(0, len(pwd), 5)]
    return ''.join(map(decrypt_block, blocks)).decode('utf-16-be')
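Fed the sample value from the question, this round-trips to the original password:

pwd = '04077040940409304092040910409004089040880408704086040850404504044040430407404073040720407104070'
print decrypt(pwd)  # A0123456789abcDEFGH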

This would be a lot more trivial in C or C#, which is probably what they implemented things in, so let me explain what I'm doing.

You already know how to transform the string into a sequence of 5-character blocks.

My int(block, 10) serves the same purpose as the .lstrip('0') in your code: making sure that a leading '0' can't get the block treated as an octal numeral instead of decimal, just more explicitly. I don't think the precaution is actually necessary in Jython 2.2 (it definitely isn't in more modern Python/Jython), but I left it in just in case.
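To see why the precaution is redundant (Python/Jython 2.x):

print int('04093')      # 4093 -- int() on a string defaults to base 10
print int('04093', 10)  # 4093 -- identical, just explicit
# only int('04093', 0) would apply literal rules, and it raises
# ValueError, since octal literals can't contain the digit 9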

Next, in C, you'd just write unsigned short x = 4142U - y;, which would wrap around appropriately on its own. Python doesn't have unsigned short values, just arbitrary-precision signed integers, so we have to do the wraparound manually with % 65536. (Because Python uses floored division and remainder, the sign of the result always matches the divisor's, so this always lands in the range 0-65535; that wouldn't be true of C's %, at least not in C99 and most platforms' C89.)
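The wraparound in action, using the CJK example from the end of this answer:

print (4142 - 0x7000) % 65536  # 41006 -- the negative difference rolls over exactly like a C unsigned short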

Then, in C, we'd just cast the unsigned short to a 16-bit "wide character"; Python doesn't have any way to do that, so we have to use struct.pack. (Note that I'm converting it to big-endian, because I think that makes this easier to debug; in C you'd convert to native-endian, and since this is Windows, that would be little-endian.)
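For instance:

import struct
print repr(struct.pack('>H', 65))  # '\x00A' -- U+0041 ('A') as two big-endian bytes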

So now we've got a sequence of 2-byte UTF-16-BE code units. I just join them into one big byte string, then decode it as UTF-16-BE.


If you really want to test that I've got this right, you'll need to find characters that aren't just non-ASCII, but non-Western. In particular, you need:

  • A character that's > U+4142 but < U+10000. Most CJK ideographs, like U+7000 (瀀), fit the bill. This should appear as '41006', because that's 4142-0x7000 rolled over as an unsigned short.
  • A character that's >= U+10000. This includes uncommon CJK characters, specialized mathematical characters, characters from ancient scripts, etc. For example, the Old Italic character U+10300 (𐌀) encodes to the surrogate pair (0xd800, 0xdf00); 4142-0xd800=14382, and 4142-0xdf00=12590, so you'd get '1438212590'.

The first will be hard to find—even most Chinese- and Japanese-native programmers I've dealt with use ASCII passwords. And the second, even more so; nobody but a historical linguistics professor is likely to even think of using archaic scripts in their passwords. By Murphy's Law, if you write the correct code, it will never be used, but if you don't, it's guaranteed to show up as soon as you ship your code.
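Fortunately, you don't have to find such a user just to exercise the code; the decrypt() above can check both predictions directly (Python 2 semantics, as before):

print decrypt('41006') == u'\u7000'           # True
print decrypt('1438212590') == u'\U00010300'  # True -- the surrogate pair recombines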

Licensed under: CC-BY-SA with attribution