Question

I have a program that reads an array of bytes. The bytes are supposed to be ISO-8859-2 character codes in decimal. My test array has two elements: 103, which is the letter g, and 179, which is the letter ł (l with stroke). I then create a Blob object from the array and check its contents using two methods:

  1. FileReader
  2. objectURL

The first method gives the expected result, but the second method produces an extra byte in the saved blob file.

Here is the code:

var bytes = [103, 179];
var chr1 = String.fromCharCode(bytes[0]);
var chr2 = String.fromCharCode(bytes[1]);
var str = '';
str += chr1;
str += chr2;
console.log(str.charCodeAt(0)); //103
console.log(str.charCodeAt(1)); //179
console.log(str.charCodeAt(2)); //NaN

var blob = new Blob([str]);
console.log(blob.size); //3

//Checking Blob contents using first method - FileReader
var reader = new FileReader();
reader.addEventListener("loadend", function() {
    var str1 = this.result;
    console.log(str1); //g³
    console.log(str1.charCodeAt(0)); //103
    console.log(str1.charCodeAt(1)); //179
    console.log(str1.charCodeAt(2)); //NaN
});
reader.readAsText(blob);

//Checking Blob contents using second method - objectURL
var url = URL.createObjectURL(blob);
$('<a>',{
    text: 'Download the blob',
    title: 'Download',
    href: url
}).appendTo('#my');

To try the second method, I created a fiddle. When you click the "Download the blob" link there, save the file, and open it in a binary editor, it contains the following bytes: 103, 194, 179.

My question is: where does the 194 come from, and how can I create a blob file (using the createObjectURL method) that contains only the bytes given in the original array ([103, 179] in this case)?


Solution

The extra 194 comes from an encoding issue:

179 is the Unicode code point of "SUPERSCRIPT THREE" (U+00B3), so the string str contains "g³". When the blob is created, this string is encoded as UTF-8: 0x67 for g and 0xC2 0xB3 for ³ (194, 179 in decimal), which takes 3 bytes. If you read the blob back with a FileReader, readAsText decodes the UTF-8 again, so you get the same 2 characters, "g³".
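The encoding step can be reproduced without a Blob or a download. The following is a minimal sketch, assuming an environment where TextEncoder is available as a global (modern browsers and recent Node.js); TextEncoder always produces UTF-8, the same encoding used for the string parts of a Blob. The variable name utf8 is only illustrative.

```javascript
// Build the same two-character string as in the question.
var str = String.fromCharCode(103, 179); // "g³" (U+0067, U+00B3)

// Encode it as UTF-8, the encoding a Blob uses for string parts.
var utf8 = Array.from(new TextEncoder().encode(str));

// U+00B3 is above 0x7F, so UTF-8 needs two bytes for it: 0xC2 0xB3.
console.log(utf8); // [103, 194, 179]
```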

To avoid this (if you don't want to store everything as UTF-8), you can construct the blob from a typed array:

var u8 = new Uint8Array(bytes);
var blob = new Blob([u8]);

That way, you will keep exactly the bytes you want.
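A quick way to check the difference is to compare the sizes of the two blobs. This is only a sketch, assuming Blob is available as a global (browsers, or Node.js 18 and later); the names fromString and fromBytes are made up for the comparison.

```javascript
var bytes = [103, 179];

// String part: the characters are encoded as UTF-8, so 179 becomes two bytes.
var str = String.fromCharCode.apply(null, bytes);
var fromString = new Blob([str]);

// Typed array part: the bytes are copied into the blob unchanged.
var fromBytes = new Blob([new Uint8Array(bytes)]);

console.log(fromString.size); // 3
console.log(fromBytes.size);  // 2
```

The blob built from the Uint8Array can be passed to URL.createObjectURL exactly as in the question, and the downloaded file should then contain only 103 and 179.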

Licensed under: CC-BY-SA with attribution