Question

I am porting a simple C++ function to JavaScript, but it seems I'm running into problems with the way JavaScript handles bitwise operators.

In C++:

AnsiString MyClass::Obfuscate(AnsiString source)
{
    int sourcelength=source.Length();
    for(int i=1;i<=sourcelength;i++)
    {
        source[i] = source[i] ^ 0xFFF;
    }
    return source;
}

Obfuscate("test") yields these temporary int values:

-117, -102, -116, -117

Obfuscate("test") yields this string value:

‹šŒ‹

In Javascript:

function obfuscate(str) 
{
    var obfuscated = "";
    for (var i = 0; i < str.length; i++) {
        var a = str.charCodeAt(i);
        var b = a ^ 0xFFF;
        obfuscated = obfuscated + String.fromCharCode(b);
    }
    return obfuscated;
}

obfuscate("test") yields these temporary int values:

3979, 3994, 3980, 3979

obfuscate("test") yields this string value:

ྋྚྌྋ

Now, I realize that there are a ton of threads pointing out that JavaScript treats all numbers as floating point, and that bitwise operations involve a temporary cast to a 32-bit int.

It really wouldn't be a problem except that I'm obfuscating in JavaScript and reversing in C++, and the different results don't match.

How do I transform the JavaScript result into the C++ result? Is there some simple shift available?

Solution

Judging from the result that XORing 116 with 0xFFF gives -117, we have to emulate 8-bit two's-complement integers in JavaScript:

function obfuscate(str) 
{
    var bytes = [];
    for (var i = 0; i < str.length; i++) {
        // XOR with the mask, truncate to 8 bits, then sign-extend bit 7
        bytes.push((((str.charCodeAt(i) ^ 0xFFF) & 0xFF) ^ 0x80) - 0x80);
    }
    return bytes;
}
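The sign-extension trick can be checked in isolation: `& 0xFF` truncates to the low eight bits, and `^ 0x80` followed by `- 0x80` reinterprets bit 7 as a sign bit (the helper name `toInt8` is just for this sketch):

```javascript
// sketch: emulate a C++ (signed char) cast in JavaScript
function toInt8(x) {
    return ((x & 0xFF) ^ 0x80) - 0x80;
}

toInt8(116 ^ 0xFFF); // -117, matching the C++ result for 't'
```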

These bytes are then interpreted in Windows CP-1252; if a byte is negative, it presumably just wraps around into [0..255] (add 256):

var ascii = [
    0x0000,0x0001,0x0002,0x0003,0x0004,0x0005,0x0006,0x0007,0x0008,0x0009,0x000A,0x000B,0x000C,0x000D,0x000E,0x000F
    ,0x0010,0x0011,0x0012,0x0013,0x0014,0x0015,0x0016,0x0017,0x0018,0x0019,0x001A,0x001B,0x001C,0x001D,0x001E,0x001F
    ,0x0020,0x0021,0x0022,0x0023,0x0024,0x0025,0x0026,0x0027,0x0028,0x0029,0x002A,0x002B,0x002C,0x002D,0x002E,0x002F
    ,0x0030,0x0031,0x0032,0x0033,0x0034,0x0035,0x0036,0x0037,0x0038,0x0039,0x003A,0x003B,0x003C,0x003D,0x003E,0x003F
    ,0x0040,0x0041,0x0042,0x0043,0x0044,0x0045,0x0046,0x0047,0x0048,0x0049,0x004A,0x004B,0x004C,0x004D,0x004E,0x004F
    ,0x0050,0x0051,0x0052,0x0053,0x0054,0x0055,0x0056,0x0057,0x0058,0x0059,0x005A,0x005B,0x005C,0x005D,0x005E,0x005F
    ,0x0060,0x0061,0x0062,0x0063,0x0064,0x0065,0x0066,0x0067,0x0068,0x0069,0x006A,0x006B,0x006C,0x006D,0x006E,0x006F
    ,0x0070,0x0071,0x0072,0x0073,0x0074,0x0075,0x0076,0x0077,0x0078,0x0079,0x007A,0x007B,0x007C,0x007D,0x007E,0x007F
];

var cp1252 = ascii.concat([
    0x20AC,0xFFFD,0x201A,0x0192,0x201E,0x2026,0x2020,0x2021,0x02C6,0x2030,0x0160,0x2039,0x0152,0xFFFD,0x017D,0xFFFD
    ,0xFFFD,0x2018,0x2019,0x201C,0x201D,0x2022,0x2013,0x2014,0x02DC,0x2122,0x0161,0x203A,0x0153,0xFFFD,0x017E,0x0178
    ,0x00A0,0x00A1,0x00A2,0x00A3,0x00A4,0x00A5,0x00A6,0x00A7,0x00A8,0x00A9,0x00AA,0x00AB,0x00AC,0x00AD,0x00AE,0x00AF
    ,0x00B0,0x00B1,0x00B2,0x00B3,0x00B4,0x00B5,0x00B6,0x00B7,0x00B8,0x00B9,0x00BA,0x00BB,0x00BC,0x00BD,0x00BE,0x00BF
    ,0x00C0,0x00C1,0x00C2,0x00C3,0x00C4,0x00C5,0x00C6,0x00C7,0x00C8,0x00C9,0x00CA,0x00CB,0x00CC,0x00CD,0x00CE,0x00CF
    ,0x00D0,0x00D1,0x00D2,0x00D3,0x00D4,0x00D5,0x00D6,0x00D7,0x00D8,0x00D9,0x00DA,0x00DB,0x00DC,0x00DD,0x00DE,0x00DF
    ,0x00E0,0x00E1,0x00E2,0x00E3,0x00E4,0x00E5,0x00E6,0x00E7,0x00E8,0x00E9,0x00EA,0x00EB,0x00EC,0x00ED,0x00EE,0x00EF
    ,0x00F0,0x00F1,0x00F2,0x00F3,0x00F4,0x00F5,0x00F6,0x00F7,0x00F8,0x00F9,0x00FA,0x00FB,0x00FC,0x00FD,0x00FE,0x00FF
]);

function toStringCp1252(bytes){
    var byte, codePoint, codePoints = [];
    for( var i = 0; i < bytes.length; ++i ) {
        byte = bytes[i];
        if( byte < 0 ) {
            byte = 256 + byte; // map negative bytes back into [0..255]
        }
        codePoint = cp1252[byte];
        codePoints.push( codePoint );
    }
    return String.fromCharCode.apply( String, codePoints );
}

Result

toStringCp1252(obfuscate("test"))
//"‹šŒ‹"

OTHER TIPS

I'm guessing that AnsiString contains 8-bit characters (since the ANSI character set is 8 bits). When you assign the result of the XOR back to the string, it is truncated to 8 bits, and so the resulting value is in the range [-128...127].

(On some platforms, it could be [0..255], and on others the range could be wider, since it's not specified whether char is signed or unsigned, or whether it's 8 bits or larger).

JavaScript strings contain Unicode characters, which can hold a much wider range of values, so the result is not truncated to 8 bits. The result of the XOR has a range of at least 12 bits, [0...4095], hence the large numbers you see there.

Assuming the original string contains only 8-bit characters, then changing the operation to a ^ 0xff should give the same results in both languages.
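A minimal sketch of that suggestion (the function name `obfuscateFF` is mine): XOR with 0xFF keeps every value in [0..255], and because XOR with a fixed mask is its own inverse, applying it twice restores the original string:

```javascript
function obfuscateFF(str) {
    var out = "";
    for (var i = 0; i < str.length; i++) {
        // stays within 8 bits, so C++ char arithmetic agrees
        out += String.fromCharCode(str.charCodeAt(i) ^ 0xFF);
    }
    return out;
}

obfuscateFF(obfuscateFF("test")); // "test" -- round-trips cleanly
```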

I assume that AnsiString is, in some form, an array of chars. And this is the problem: in C++, a char can typically hold only 8 bits. So when you XOR with 0xFFF and store the result in a char, it is the same as XORing with 0xFF.

This is not the case with JavaScript, which uses Unicode. This is demonstrated by looking at the integer values:

-117 is 0x8B as an eight-bit two's-complement value, and 3979 == 0xF8B

I would recommend XORing with 0xFF, as this will work in both languages. Or you can switch your code to use Unicode.

First, convert your AnsiString to wchar_t*. Only then obfuscate its individual characters:

AnsiString MyClass::Obfuscate(AnsiString source)
{
   /// allocate string
   int num_wchars = source.WideCharBufSize();
   wchar_t* UnicodeString = new wchar_t[num_wchars];
   source.WideChar(UnicodeString, source.WideCharBufSize());

   /// obfuscate individual characters
   for(int i = 0 ; i < num_wchars ; i++)
   {
       UnicodeString[i] = UnicodeString[i] ^ 0xFFF;
   }

   /// create obfuscated AnsiString
   AnsiString result = AnsiString(UnicodeString);

   /// delete tmp string
   delete [] UnicodeString;

   return result;
}

Sorry, I'm not an expert on C++ Builder, but my point is simple: in JavaScript you have UCS-2 (or UTF-16) characters, so you have to convert the AnsiString to wide chars first.

Try using WideString instead of AnsiString.

I don't know AnsiString at all, but my guess is this relates to the width of its characters. Specifically, I suspect they're less than 32 bits wide, and of course in bitwise operations, the width of what you're operating on matters, particularly when dealing with 2's complement numbers.

In JavaScript, your "t" in "test" is character code 116, which is b00000000000000000000000001110100. 0xFFF (4095) is b00000000000000000000111111111111, and the result you're getting (3979) is b00000000000000000000111110001011. We can readily see that you're getting the right result for the XOR:

116  = 00000000000000000000000001110100
4095 = 00000000000000000000111111111111
3979 = 00000000000000000000111110001011

So I'm thinking you're getting some truncation or similar in your C++ code, not least because -117 is b10001011 in eight-bit 2's complement...which is exactly what we see as the last eight bits of 3979 above.
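That truncation can be reproduced directly in JavaScript: keep the low eight bits of the XOR result, then sign-extend them (the `<< 24 >> 24` shift pair reinterprets bit 7 as the sign bit of a 32-bit int):

```javascript
var xored = 116 ^ 0xFFF;        // 3979, the full JavaScript result
var low8  = xored & 0xFF;       // 139 (0x8B), the last eight bits
var int8  = (low8 << 24) >> 24; // -117, eight-bit two's complement
```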

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow