Your string:
"Unicode surrogate here-> \u1F4F1<--here"
does not do what you think it does.
A char is basically a UTF-16 code unit, so it is 16 bits wide, and a \u escape takes exactly four hex digits. So what you actually have here is the single char \u1F4F followed by the literal character 1, and that explains your output.
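A quick way to see this for yourself is the minimal sketch below. The class name EscapeParsingDemo and the constant SAMPLE are made up for illustration; only the string literal comes from your question.

```java
public class EscapeParsingDemo {
    // Only the first four hex digits belong to the escape; the trailing '1' is an ordinary character.
    public static final String SAMPLE = "\u1F4F1";

    public static void main(String[] args) {
        System.out.println(SAMPLE.length());                       // 2
        System.out.println(Integer.toHexString(SAMPLE.charAt(0))); // 1f4f
        System.out.println(SAMPLE.charAt(1));                      // 1
    }
}
```

So the literal contains two chars, and neither of them is a surrogate.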
I don't know what you mean by "escape" here, but if it means replacing surrogate pairs by "\u\u", then have a look at Character.toChars(). It returns the char sequence necessary to represent one Unicode code point, whether it is in the BMP (one char) or not (two chars).
For code point U+1f4f1, it will return a two-element char array with characters 0xd83d and 0xdcf1 in that order. And this is what you want.
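Here is a rough sketch of how that could be used, assuming "escape" means writing each code point outside the BMP as its two UTF-16 code units in hex. The class SurrogateEscaper and the method escapeSupplementary are hypothetical names, not part of the JDK; Character.toChars(), Character.isSupplementaryCodePoint(), String.codePointAt() and Character.charCount() are the real calls.

```java
public class SurrogateEscaper {
    // Hypothetical helper: writes each supplementary code point as two
    // backslash-u escapes (one per surrogate) and copies everything else as is.
    public static String escapeSupplementary(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                // toChars() yields the high surrogate first, then the low one.
                for (char c : Character.toChars(cp)) {
                    out.append(String.format("\\u%04X", (int) c));
                }
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        char[] units = Character.toChars(0x1F4F1);
        System.out.println(units.length);                   // 2
        System.out.println(Integer.toHexString(units[0]));  // d83d
        System.out.println(Integer.toHexString(units[1]));  // dcf1

        String text = "here-> " + new String(units) + " <--here";
        System.out.println(escapeSupplementary(text));
    }
}
```

Building the test string from Character.toChars(0x1F4F1) instead of a literal avoids the four-digit escape problem altogether. If you want it as a literal, you have to write the pair yourself: "\uD83D\uDCF1".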