Figuring out how to decode obfuscated URL parameters

https://stackoverflow.com//questions/9566736

06-12-2019

Question

I have a web-based system that uses encrypted GET parameters. I need to figure out what encryption is used and create a PHP function to recreate it. Any ideas?

Example URL: ...&watermark=ISpQICAK&width=IypcOysK&height=IypcLykK&...

Solution

You haven't provided nearly enough sample data for us to reliably guess even the alphabet used to encode it, much less what structure it might have.

What I can tell, from the three sample values you've provided, is:

There is quite a lot of redundancy in the data — compare e.g. width=IypcOysK and height=IypcLykK (and even watermark=ISpQICAK, though that might be just coincidence). This suggests that the data is neither random nor securely encrypted (which would make it look random).
The alphabet contains a fairly broad range of upper- and lowercase letters, from A to S and from c to y. Assuming that the alphabet consists of contiguous letter ranges, that means a palette of between 42 and 52 possible letters. Of course, we can't tell with any certainty from the samples whether other characters might also be used, so we can't even entirely rule out Base64.
This is not the output of PHP's base_convert function, as I first guessed it might be: that function only handles bases up to 36, and doesn't output uppercase letters.
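The letter-range estimate above can be checked mechanically. This is a minimal Python sketch that uses only the three sample values from the question; the variable and function names are illustrative and not part of the original system.

```python
# Sample values copied from the example URL in the question.
samples = ["ISpQICAK", "IypcOysK", "IypcLykK"]

chars = set("".join(samples))
upper = sorted(c for c in chars if c.isupper())
lower = sorted(c for c in chars if c.islower())

def span(first, last):
    """Size of a contiguous letter range, counting both end points."""
    return ord(last) - ord(first) + 1

# Lower bound on the alphabet size if the letters form contiguous ranges.
min_alphabet = span(upper[0], upper[-1]) + span(lower[0], lower[-1])
print(upper[0], upper[-1], lower[0], lower[-1], min_alphabet)
```

The upper bound of 52 is simply the full set of upper- and lowercase letters.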

That, however, is just about all. It would help to see some more data samples, ideally with the plaintext values they correspond to.

Edit: The id parameters you give in the comments are definitely in Base64. Besides the distinctive trailing = signs, they both decode to simple strings of nine printable ASCII characters followed by a line feed (hex 0A):

_Base64___________Hex____________________________ASCII_____
JiJQPjNfT0MtCg==  26 22 50 3e 33 5f 4f 43 2d 0a  &"P>3_OC-.
JikwPClUPENICg==  26 29 30 3c 29 54 3c 43 48 0a  &)0<)T<CH.
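
The table above can be reproduced with a few lines of Python. This is only a sketch: the helper names hexdump and printable are made up for illustration, and the two input strings are the id values quoted above.

```python
import base64

# The two id values quoted in the table above.
ids = ["JiJQPjNfT0MtCg==", "JikwPClUPENICg=="]

def hexdump(data: bytes) -> str:
    """Space-separated lowercase hex, one pair per byte."""
    return " ".join(f"{b:02x}" for b in data)

def printable(data: bytes) -> str:
    """Replace bytes outside the printable ASCII range with a dot."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)

for s in ids:
    raw = base64.b64decode(s)
    print(f"{s}  {hexdump(raw)}  {printable(raw)}")
```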

(I've replaced non-printable characters with a . in the ASCII column above.) On the assumption that all the other parameters are Base64 too, let's see what they decode to:

_Base64___Hex________________ASCII_
ISpQICAK  21 2a 50 20 20 0a  !*P  .
IypcOysK  23 2a 5c 3b 2b 0a  #*\;+.
IypcLykK  23 2a 5c 2f 29 0a  #*\/).

ISNAICAK  21 23 40 20 20 0a  !#@  .
IyNAPjIK  23 23 40 3e 32 0a  ##@>2.
IyNAKjAK  23 23 40 2a 30 0a  ##@*0.

ISggICAK  21 28 20 20 20 0a  !(   .
IikwICAK  22 29 30 20 20 0a  ")0  .
IilAPCAK  22 29 40 3c 20 0a  ")@< .

So there's definitely another encoding layer involved, but we can already see some patterns:

All decoded values consist of a fixed number of printable ASCII characters (nine for the id values, five for the other parameters) followed by a trailing line feed character. This cannot be a coincidence.
Most of the characters are on the low end of the printable ASCII range (hex 20 – 7E). In particular, the lowest printable ASCII character, space = hex 20, is particularly common, especially in the watermark strings.
The strings in each URL resemble each other more than they resemble the corresponding strings from other URLs. (But there are resemblances between URLs too: for example, all the decoded watermark values begin with ! = hex 21.)

In fact, the highest numbered character that occurs in any of the strings is _ = hex 5F, while the lowest (excluding the line feeds) is space = hex 20. Their difference is hex 3F = decimal 63. Coincidence? I think not. I'll guess that the second encoding layer is similar to uuencoding: the data is split into 6-bit groups (as in Base64), and each group is mapped to an ASCII character simply by adding hex 20 to it.
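
The range observation can be verified over all eleven Base64 samples shown so far. A short Python sketch, assuming the trailing line feed is framing rather than payload and is therefore stripped first:

```python
import base64

# Every Base64 value from the tables above (ids first, then the size parameters).
encoded = [
    "JiJQPjNfT0MtCg==", "JikwPClUPENICg==",
    "ISpQICAK", "IypcOysK", "IypcLykK",
    "ISNAICAK", "IyNAPjIK", "IyNAKjAK",
    "ISggICAK", "IikwICAK", "IilAPCAK",
]

# Drop the trailing line feed from each value before pooling the bytes.
payload = b"".join(base64.b64decode(s).rstrip(b"\n") for s in encoded)

lowest, highest = min(payload), max(payload)
print(hex(lowest), hex(highest), highest - lowest)
```

A spread of 63 is what a 6-bit value offset by a constant would produce, which is what motivates the uuencode guess.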

Looking more closely, the second layer might be actual uuencoding: the first byte of each string has the right value to be a uuencode length indicator (hex 20 plus the number of encoded bytes). Let's see what we get if we try to decode them:

_Base64___________UUEnc______Hex________________ASCII___re-UUE____
JiJQPjNfT0MtCg==  &"P>3_OC-  0b 07 93 fe f8 cd  ......  &"P>3_OC-
JikwPClUPENICg==  &)0<)T<CH  25 07 09 d1 c8 e8  %.....  &)0<)T<CH

_Base64___UUEnc__Hex_______ASC__re-UUE____
ISpQICAK  !*P    2b        +    !*P``
IypcOysK  #*\;+  2b c6 cb  +..  #*\;+
IypcLykK  #*\/)  2b c3 c9  +..  #*\/)

ISNAICAK  !#@    0e        .    !#@``
IyNAPjIK  ##@>2  0e 07 92  ...  ##@>2
IyNAKjAK  ##@*0  0e 02 90  ...  ##@*0

ISggICAK  !(     20             !(```
IikwICAK  ")0    25 00     %.   ")0``
IilAPCAK  ")@<   26 07     &.   ")@<`

This is looking good:

Uudecoding and re-encoding the data (using Perl's unpack "u" and pack "u") produces the original string, except that trailing spaces are replaced with ` characters (which falls within acceptable variation between encoders).
The decoded strings are no longer printable ASCII, which suggests that we might be closer to the real data.
The watermark strings are now single bytes. In two cases out of three, they're prefixes of the corresponding width and height strings. (In the third case, which looks a bit different, the watermark might perhaps have been added to the other values.)
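
The tables above were produced with Perl's unpack "u". For readers who want to follow the decoding step by step, here is a hand-written Python equivalent for a single uuencoded line. It is a sketch for illustration, not the code used in this answer, and the function name uudecode_line is invented here.

```python
import base64

def uudecode_line(line: bytes) -> bytes:
    """Decode one uuencoded line: a length character, then 4 chars per 3 bytes."""
    line = line.rstrip(b"\n")
    length = (line[0] - 0x20) & 0x3F        # first character encodes the byte count
    body = line[1:]
    body += b" " * (-len(body) % 4)         # pad to whole 4-character groups
    out = bytearray()
    for i in range(0, len(body), 4):
        # Masking with 0x3F also maps the backtick (hex 60) to zero, like space.
        a, b, c, d = ((ch - 0x20) & 0x3F for ch in body[i:i + 4])
        out.append((a << 2 | b >> 4) & 0xFF)
        out.append((b << 4 | c >> 2) & 0xFF)
        out.append((c << 6 | d) & 0xFF)
    return bytes(out[:length])

for s in ["JiJQPjNfT0MtCg==", "ISpQICAK", "IypcOysK", "IikwICAK"]:
    print(s, uudecode_line(base64.b64decode(s)).hex(" "))
```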

One more piece of the puzzle — comparing the ID strings and corresponding numeric values you give in the comments, we see that:

The numbers all have six digits. The first two digits of each number are the same.
The uudecoded strings all have six bytes. The first two bytes of each string are the same.

Coincidence? Again, I think not. Let's see what we get if we write the numbers out as ASCII strings, and XOR them with the uudecoded strings:

_Num_____ASCII_hex___________UUDecoded_ID________XOR______________
406747   34 30 36 37 34 37   25 07 09 d1 c8 e8   11 37 3f e6 fc df
405174   34 30 35 31 37 34   25 07 0a d7 cb eb   11 37 3f e6 fc df
405273   34 30 35 32 37 33   25 07 0a d4 cb ec   11 37 3f e6 fc df

What is this 11 37 3f e6 fc df string? I have no idea — it's mostly not printable ASCII — but XORing the uudecoded ID with it yields the corresponding ID number in three cases out of three.
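
Putting the three layers together, here is a Python sketch that derives the XOR mask from the table above and then re-encodes an ID number. It is an assumption-laden illustration: the mask is only known to fit these three samples, the function names are invented, and zero sextets are written as spaces because that is what the sample data shows.

```python
import base64

# (ID number, uudecoded ID bytes as hex) pairs from the table above.
known = [
    ("406747", "250709d1c8e8"),
    ("405174", "25070ad7cbeb"),
    ("405273", "25070ad4cbec"),
]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Every pair should give the same mask; fail loudly if they disagree.
masks = {xor_bytes(num.encode("ascii"), bytes.fromhex(h)) for num, h in known}
assert len(masks) == 1
mask = masks.pop()

def uuencode_line(data: bytes) -> bytes:
    """Encode one uuencode line, using space (not backtick) for zero sextets."""
    out = bytearray([0x20 + len(data)])
    padded = data + b"\0" * (-len(data) % 3)
    for i in range(0, len(padded), 3):
        a, b, c = padded[i:i + 3]
        for sextet in (a >> 2, (a & 3) << 4 | b >> 4, (b & 15) << 2 | c >> 6, c & 63):
            out.append(0x20 + sextet)
    return bytes(out) + b"\n"

def encode_id(num: str) -> str:
    """XOR with the derived mask, uuencode, then Base64 (hypothetical helper)."""
    masked = xor_bytes(num.encode("ascii"), mask)
    return base64.b64encode(uuencode_line(masked)).decode("ascii")

print(mask.hex(" "), encode_id("406747"))
```

For 406747 this reproduces the id string JikwPClUPENICg== from the earlier table, which is some evidence that the layering is right for that URL. It says nothing about how the mask itself is chosen.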

More to think about: you've provided two different ID strings for the value 405174: JiJQPjNfT0MtCg== and JikwPCpVXE9LCg==. These decode to 0b 07 93 fe f8 cd and 25 07 0a d7 cb eb respectively, and their XOR is 2e 00 99 29 33 26. The two URLs from which these ID strings came have decoded watermarks of 0e and 20 respectively, whose XOR is 2e, which accounts for the first byte (and the second byte is the same in both, anyway). Where the differences in the remaining four bytes come from is still a mystery to me.
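
The comparison of the two encodings of 405174 is plain byte-wise XOR. A tiny Python check, using the decoded values quoted above (the variable names are illustrative):

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# The two uudecoded forms of ID 405174, taken from two different URLs.
id_a = bytes.fromhex("0b0793fef8cd")
id_b = bytes.fromhex("25070ad7cbeb")

diff = xor_bytes(id_a, id_b)

# Decoded watermark bytes of the same two URLs.
watermark_diff = 0x0E ^ 0x20

print(diff.hex(" "), hex(watermark_diff))
```

Only the first byte of the difference is explained by the watermark; the last four remain unexplained.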

OTHER TIPS

That's going to be difficult. Even if you find the encryption method and keys, the original data is likely salted and the salt is probably varied with each record.

That's the point of encryption.

Licensed under: CC-BY-SA with attribution