Question

I have this string in Java:

"test.message"

byte[] bytes = plaintext.getBytes("UTF-8");
//result: [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]

If I do the same thing in JavaScript:

    stringToByteArray: function (str) {         
        str = unescape(encodeURIComponent(str));

        var bytes = new Array(str.length);
        for (var i = 0; i < str.length; ++i)
            bytes[i] = str.charCodeAt(i);

        return bytes;
    },

I get:

 [7,163,140,72,178,72,244,241,149,43,67,124]

I was under the impression that the unescape(encodeURIComponent()) would correctly translate the string to UTF-8. Is this not the case?

Reference:

http://ecmanaut.blogspot.be/2006/07/encoding-decoding-utf8-in-javascript.html


Solution 2

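JavaScript has no concept of character encoding for String; everything is UTF-16. Most of the time (specifically, for characters in the ASCII range, code points 0 to 127) the value of a char in UTF-16 matches its single UTF-8 byte, so for such strings you can forget it's any different.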

There are more efficient ways to do this, but for example:

function s(x) {return x.charCodeAt(0);}
"test.message".split('').map(s);
// [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]
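
To illustrate where the match between UTF-16 code units and UTF-8 bytes stops holding, here is a minimal sketch. The helper name codeUnits is hypothetical (it is not part of the original answer), and the accented character is written as an escape so the input is unambiguous.

```javascript
// Hypothetical helper: returns the UTF-16 code unit of each position in str.
function codeUnits(str) {
  return Array.from({ length: str.length }, (_, i) => str.charCodeAt(i));
}

// ASCII only: the code units equal the UTF-8 bytes Java reports.
const ascii = codeUnits("test.message");
console.log(ascii); // [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]

// Outside ASCII the values differ: U+00E9 is one code unit (233),
// while its UTF-8 encoding is the two bytes 195, 169.
const accented = codeUnits("\u00e9");
console.log(accented); // [233]
```

So charCodeAt alone is only a safe stand-in for UTF-8 when the input is known to be pure ASCII.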

So what is unescape(encodeURIComponent(str)) doing? Let's look at each function individually:

  1. encodeURIComponent converts every character in str that is illegal or has a special meaning in URI syntax into a URI-escaped version, so that there is no problem using it as a key or value in the search component of a URI. For example, encodeURIComponent('&='); // "%26%3D". Notice how this is now a 6-character String. Characters outside ASCII are first encoded as UTF-8, and each resulting byte is escaped as %XX.
  2. unescape is actually deprecated, but it does a similar job to decodeURI or decodeURIComponent (the reverse of encodeURIComponent). If we look in the ES5 spec, step 11 of its algorithm reads: "Let c be the character whose code unit value is the integer represented by the four hexadecimal digits at positions k+2, k+3, k+4, and k+5 within Result(1)." That step handles the %uXXXX form; a later step does the same for a plain %XX escape using two hexadecimal digits.
    So each %XX escape becomes one character whose code unit value is a single byte (0 to 255). As I mentioned, all Strings are UTF-16, so the result is really a UTF-16 string in which each code unit holds one UTF-8 byte.
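
The two steps above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function name utf8Bytes is hypothetical, and it assumes an environment where the deprecated global unescape is still available (browsers and Node.js at the time of writing).

```javascript
// Hypothetical helper mirroring unescape(encodeURIComponent(str)) step by step.
function utf8Bytes(str) {
  // Step 1: non-ASCII characters become percent-escaped UTF-8 bytes,
  // e.g. "\u00e9" -> "%C3%A9"; unreserved ASCII such as "." is left alone.
  const escaped = encodeURIComponent(str);

  // Step 2: each %XX turns into one character whose code unit is that byte,
  // e.g. "%C3%A9" -> "\u00c3\u00a9".
  const binary = unescape(escaped);

  // Read the byte values back out of the string.
  const bytes = [];
  for (let i = 0; i < binary.length; i++) {
    bytes.push(binary.charCodeAt(i));
  }
  return bytes;
}

console.log(utf8Bytes("test.message")); // [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]
console.log(utf8Bytes("\u00e9"));       // [195, 169]
```

Under these assumptions the sketch yields the same values as the Java output for "test.message".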

Other suggestions

You can use TextEncoder which is part of the Encoding Living Standard. According to the Encoding API entry from the Chromium Dashboard, it shipped in Firefox and will ship in Chrome 38. There is also a text-encoding polyfill available.

The JavaScript code sample below returns a Uint8Array filled with the values you expect.

var s = "test.message";
var encoder = new TextEncoder();
encoder.encode(s);
// [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]
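
As a further sketch, assuming a runtime where TextEncoder and TextDecoder are available as globals (modern browsers and recent Node.js versions), the Uint8Array can be converted to a plain array for comparison with the Java output, and a non-ASCII character shows the multi-byte case. The variable names are illustrative only.

```javascript
const utf8Encoder = new TextEncoder(); // TextEncoder always produces UTF-8

// Array.from turns the Uint8Array into a plain array of numbers.
const messageBytes = Array.from(utf8Encoder.encode("test.message"));
console.log(messageBytes); // [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]

// U+20AC (euro sign) takes three bytes in UTF-8.
const euroBytes = Array.from(utf8Encoder.encode("\u20ac"));
console.log(euroBytes); // [226, 130, 172]

// Decoding the bytes gives back the original string.
const roundTrip = new TextDecoder("utf-8").decode(utf8Encoder.encode("test.message"));
console.log(roundTrip); // "test.message"
```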
Licensed under: CC-BY-SA with attribution