Question

I have a data structure with 6,000 elements, and for each element I need to store 7 bits of information. If I naively store it as an array of 6,000 numbers, it takes up around 22 KB. I am trying to reduce the size of the page: what is the best way to store 6,000 × 7 bits of information (it should fit in around 5 KB)? I want a "bitstream"-like data structure. I thought about encoding it into a string, or even an image, but I'm not sure exactly how. The reason I did not encode it as a string is that I cannot mathematically guarantee that none of the characters will be unprintable ASCII characters (e.g. ASCII 1-25).

Solution

Let's consider two solutions.

Base 32

For fun, let's consider using base-32 numbers. Yes, you can do that in JavaScript.

First pack four 7-bit values into one integer. Shifting by 7 (not 8) keeps the packed value to 28 bits, which is what lets four values fit in six base-32 digits:

function pack(a1, a2, a3, a4){
    return ((a1 << 7 | a2) << 7 | a3) << 7 | a4;
}

Now, convert to base 32.

function encode(n){
    // Left-pad with zeros and keep the *last* six digits.
    var str = "000000" + n.toString(32);
    return str.slice(-6);
}

Since 32^6 = 2^30 is larger than 2^28, the result is never more than six digits; the padding makes sure it's exactly six.

Going the other direction:

function decode(s){
    return parseInt(s, 32);
}

function unpack(x){
    // Shift first, then mask: the four 7-bit fields sit at bit offsets 21, 14, 7, and 0.
    var a1 = (x >> 21) & 0x7f,
        a2 = (x >> 14) & 0x7f,
        a3 = (x >> 7) & 0x7f,
        a4 = x & 0x7f;
    return [a1, a2, a3, a4];
}

All that remains is to wrap the logic around this to handle the 6000 elements. To compress:

function compress(elts){
    var str = '';
    for (var i = 0; i < elts.length; i += 4) {
        str += encode(pack(elts[i], elts[i+1], elts[i+2], elts[i+3]));
    }
    return str;
}

And to uncompress:

function uncompress(str){
    var elts = [];
    for (var i = 0; i < str.length; i += 6) {
        elts = elts.concat(unpack(decode(str.slice(i, i+6))));
    }
    return elts;
}

If you concatenate the results for all 6,000 elements, you'll have 1,500 packed numbers, which at six characters each turn into about 9 KB: 1.5 bytes per 7-bit value. It's by no means the information-theoretic maximum compression, but it's not that bad.
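As a sanity check, here is a self-contained round trip for a single group of four values. The helper names (pack4, encode6, and so on) are mine, but they mirror the functions above:

```javascript
// Pack four 7-bit values into 28 bits, render as exactly six base-32 digits,
// then reverse both steps.
function pack4(a1, a2, a3, a4) {
    return ((a1 << 7 | a2) << 7 | a3) << 7 | a4;
}
function encode6(n) {
    return ("000000" + n.toString(32)).slice(-6); // left-pad, keep last six
}
function decode6(s) {
    return parseInt(s, 32);
}
function unpack4(x) {
    return [(x >> 21) & 0x7f, (x >> 14) & 0x7f, (x >> 7) & 0x7f, x & 0x7f];
}

var sample = [0, 1, 64, 127];
var str = encode6(pack4(sample[0], sample[1], sample[2], sample[3]));
var roundTrip = unpack4(decode6(str));
// str is six characters; roundTrip recovers [0, 1, 64, 127]
```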

Unicode

First we'll pack two 7-bit values into one integer. Note that there is no trailing shift here: the result must fit in a 16-bit character code, and String.fromCharCode truncates anything wider.

function pack(a1, a2){
    return a1 << 8 | a2;
}

We'll do this for all 6,000 inputs, then use our friend String.fromCharCode to turn all 3,000 values into a 3,000-character Unicode string:

function compress(elts){
    var packeds = [];
    for (var i = 0; i < elts.length; i += 2) {
        packeds.push(pack(elts[i], elts[i+1]));
    }
    return String.fromCharCode.apply(null, packeds);
}

Coming back the other way, it's quite easy:

function uncompress(str) {
    var elts = [], code;
    for (var i = 0; i < str.length; i++) {
        code=str.charCodeAt(i);
        elts.push(code>>8, code & 0xff);
    }
    return elts;
}

This takes up two bytes per pair of 7-bit values, i.e. one byte per value, making it about 33% more efficient than the base-32 approach's 1.5 bytes per value.
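A self-contained round trip of the char-code scheme (helper names here are mine). The pack step is just a1 << 8 | a2, with no further shift, so each pair stays within a 16-bit code unit:

```javascript
// Pack pairs of 7-bit values into 16-bit char codes, then recover them.
function compressPairs(elts) {
    var packed = [];
    for (var i = 0; i < elts.length; i += 2) {
        packed.push(elts[i] << 8 | elts[i + 1]); // at most 15 bits
    }
    return String.fromCharCode.apply(null, packed);
}
function uncompressPairs(str) {
    var elts = [];
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        elts.push(code >> 8, code & 0xff);
    }
    return elts;
}

var pairSample = [0, 127, 1, 64];
var pairString = compressPairs(pairSample); // two characters for four values
```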

If the above string is going to be written out into a script tag as a JavaScript assignment such as var data="HUGE UNICODE STRING";, then quotation marks in the string will need to be escaped:

javascript_assignment = 'var data = "' + compress(elts).replace(/"/g,'\\"') + '";';
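Quotation marks are not the only hazard: packed char codes such as 0x0A (newline) or 0x5C (backslash) would also break the literal. A safer sketch (my suggestion, not part of the original answer) is to let JSON.stringify do all of the escaping:

```javascript
// JSON.stringify escapes quotes, backslashes, and control characters,
// and a JSON string literal is also a valid JavaScript string literal.
function toAssignment(packedString) {
    return 'var data = ' + JSON.stringify(packedString) + ';';
}

var tricky = 'a"b\\c\n';
var assignment = toAssignment(tricky);
// assignment is: var data = "a\"b\\c\n";
```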

The above code is not meant to be production quality; in particular, it does not handle the edge case where the number of inputs is not a multiple of four (or two).
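One way to close that gap, offered as a sketch rather than part of the original answer, is to zero-pad the input up to a multiple of the group size and remember the true length so the padding can be dropped after decoding:

```javascript
// Zero-pad `elts` to a multiple of `groupSize`, remembering the original
// length; after uncompressing, slice the result back to that length.
function padToMultiple(elts, groupSize) {
    var padded = elts.slice();
    while (padded.length % groupSize !== 0) {
        padded.push(0);
    }
    return { padded: padded, originalLength: elts.length };
}

var p = padToMultiple([5, 6, 7], 4);
// p.padded is [5, 6, 7, 0]; p.originalLength is 3
```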

OTHER TIPS

As dandavis said, it is OK to encode unprintable ASCII characters into a JSON string. But for random data that gave me 13 KB (because many characters must be escaped). You can instead encode the string as base64 and then as a JSON string; that gave me 7.9 KB for random data.

var randint = function (from, to) {
    return Math.floor(Math.random() * (to - from + 1)) + from;
}

var data = '';
for (var i = 0; i < 6000; ++i) {
    data += String.fromCharCode(randint(0, 127));
}
// encoding `data` as JSON-string at this point gave me 13KB

var b64data = btoa(data);
// encoding `b64data` as JSON-string gave me 7.9KB

To decode it:

var data = atob(b64data);
var adata = [];
for (var i = 0; i < data.length; ++i) {
    adata.push(data.charCodeAt(i));
}

There should certainly be a more efficient method of encoding your data, but I believe this one is a reasonable compromise between complexity and efficiency. PS: in some browsers you might need to write atob and btoa yourself.
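The size figures above can be checked mechanically. The sketch below assumes either a browser-style btoa global or Node's Buffer as a fallback; base64 output never needs JSON escaping, so for 6,000 input bytes the JSON string is a predictable 8,002 characters, matching the roughly 7.9 KB reported:

```javascript
// Build 6,000 random 7-bit chars, base64 them, and measure the JSON size.
function randomAsciiString(n) {
    var s = '';
    for (var i = 0; i < n; i++) {
        s += String.fromCharCode(Math.floor(Math.random() * 128));
    }
    return s;
}

var raw = randomAsciiString(6000);
var b64 = (typeof btoa === 'function')
    ? btoa(raw)                                      // browsers, recent Node
    : Buffer.from(raw, 'binary').toString('base64'); // older Node
var jsonSize = JSON.stringify(b64).length; // 8000 base64 chars + 2 quotes
```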

Actually, strings work fine if you use JSON to encode any potentially troublesome characters into JS escape codes:

var codes=",Ñkqëgdß\u001f", // (JSON-encoded chars covering all character ranges)
mySet=codes[5].charCodeAt(0).toString(2).split("").map(Number).map(Boolean).reverse();

alert(mySet); // shows: [true,true,true,false,false,true,true]


/*  broken down into bite-sized steps: (pseudo code)
char == "g" (codes[5])
"g".charCodeAt() == 103
(103).toString(2) == "1100111"
.split("").map(Number) == [1,1,0,0,1,1,1]
.map(Boolean).reverse() == [true,true,true,false,false,true,true]  */

and to fill the array, reverse the process:

var toStore= [true, false, true, false, true, false, true];
var char= String.fromCharCode(parseInt(toStore.map(Number).reverse().join(""),2));
codes+=char;

//verify (should === true):
codes[9].charCodeAt(0).toString(2).split("")
   .map(Number).map(Boolean).reverse().toString() === toStore.toString();

To export the results to an ASCII file, use JSON.stringify(codes); or, if saving to localStorage, you can just save the raw string variable, since browsers use two bytes per character of localStorage.
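The two directions can be combined into a round-trip check (helper names are mine, not dandavis's). One caveat worth padding for: charCodeAt().toString(2) drops leading zeros, so char codes below 64 decode to fewer than seven booleans unless the array is padded back out:

```javascript
// Seven booleans (least-significant bit first) to one character and back.
function boolsToChar(bools) {
    return String.fromCharCode(parseInt(bools.map(Number).reverse().join(''), 2));
}
function charToBools(ch) {
    var bits = ch.charCodeAt(0).toString(2).split('').map(Number).map(Boolean).reverse();
    while (bits.length < 7) bits.push(false); // restore dropped leading zeros
    return bits;
}

var flags = [true, false, true, false, true, false, false]; // low bit first
var back = charToBools(boolsToChar(flags));
// back equals flags, even though the char code (21) is only five bits wide
```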

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow