Question

I was checking the results of a security contest involving XSS (link) and found some wonderful and scary JS XSS payloas. The winner (@kinugawamasato) used a javascript compression technique that seems completely other worldly to me:

Compressed payload:

https://cure53.de/xmas2013/?xss=<scriPt>document.write(unescape(escape(location)
.replace(/u(..)/g,'$1%')))<\/scriPt>㱯扪散琠楤㵥⁣污獳楤㵣汳楤㨳㌳䌷䉃㐭㐶うⴱㅄ〭
䉃〴ⴰ〸ぃ㜰㔵䄸㌠潮牯睥湴敲㵡汥牴⠯繷⸪ℱ⼮數散⡲散潲摳整⠰⤩⤾㱳癧湬潡搽攮摡瑡畲氽慬汛攮
牯睤敬業㴳㍝⬧㽳慮瑡㵀Ⅱ汬潷彤潭慩湳㴧⭤潭慩渻攮捨慲獥琽❵瑦ⴷ✾

What really happened:

<object id=e classid=clsid:333C7BC4-460F-11D0-BC04-0080C7055A83 onrowenter=alert(/~w.*!1/.exec(recordset(0)))><svg onload=e.dataurl=all[e.rowdelim=33]+'?santa=@!allow_domains='+domain;e.charset='utf-7'>

Is this technique already documented somewhere so I can study it? How exacly this thing works? Is there already some javascript compressor that does that in an automated way? How would a WAF react to such a payload like that?

You can see more examples here.

Was it helpful?

Solution

I am using lz-string library for JS compression whenever placing any data into localStorage. I am just a user of this library - not the compression expert. But this is information which could be found around that tool...

The lz-string Goal:

lz-string was designed to fulfill the need of storing large amounts of data in localStorage, specifically on mobile devices. localStorage being usually limited to 5MB, all you can compress is that much more data you can store.

... I (note: "I" means, Pieroxy, author of the lz-string) started out from an LZW implementation (no more patents on that), which is very simple...

So, the fundament, the base of this implemntation is LZW, which is mentioned here Javascript client-data compression by Andy E. Let me point out

An extract from Wikipedia - Algorithm:

The scenario described by Welch's 1984 encodes sequences of 8-bit data as fixed-length 12-bit codes. The codes from 0 to 255 represent 1-character sequences consisting of the corresponding 8-bit character, and the codes 256 through 4095 are created in a dictionary for sequences encountered in the data as it is encoded. At each stage in compression, input bytes are gathered into a sequence until the next character would make a sequence for which there is no code yet in the dictionary. The code for the sequence (without that character) is added to the output, and a new code (for the sequence with that character) is added to the dictionary.

Wikipedia - Encoding:

A high level view of the encoding algorithm is shown here:

  1. Initialize the dictionary to contain all strings of length one.
  2. Find the longest string W in the dictionary that matches the current input.
  3. Emit the dictionary index for W to output and remove W from the input.
  4. Add W followed by the next symbol in the input to the dictionary.
  5. Go to Step 2.

How it works in case of the lz-string we can observer here:

Let me cite few steps from the already mentioned lz-string source:

What I did was:

  • localStorage can only contain JavaScript strings. Strings in JavaScript are stored internally in UTF-16, meaning every character weight 16 bits. I modified the implementation to work with a 16bit-wide token space.
  • I had to remove the default dictionary initialization, totally useless on a 16bit-wide token space.
  • I initialize the dictionary with three tokens:
    • An entry that produces a 16-bit token.
    • An entry that produces an 8-bit token, because most of what I will store is in the iso-latin-1 space, meaning tokens below 256.
    • An entry that mark the end of the stream.
  • The output is processed by a bit stream that stores effectively 16 bits per character in the output string.
  • Each token is stored with just as many bits that are needed according to the size of the dictionary. Hence, the first token takes 2 bits, the second to 7th three bits, etc....

Well, now we know, that by these compression techniques we get 16 bits information. We can test it in this demo: http://pieroxy.net/blog/pages/lz-string/demo.html (or/and another here)

Which converts the: Hello, world. into

85 04 36 30 f6 60 40 03 0e 04 01 e9 80 39 03 26
00 a2

So we need the final step, let me cite again:

Well, this lib produces stuff that isn't really a string. By using all 16 bits of the UTF-16 bitspace, those strings aren't exactly valid UTF-16. By version 1.3.0, I added two helper encoders to produce stuff that we can do something with:

  • compress produces invalid UTF-16 strings. Those can be stored in localStorage only on webkit browsers (Tested on Android, Chrome, Safari). Can be decompressed with decompress

to continue our example, the Hello, world. would be converted into

҅〶惶̀Ў㦀☃ꈀ

And that's finally it. We can see, that the set of all the ...other then latin characters... comes from the final conversion into UTF-16. Hope, this will give some hints...

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top