Different Unicode values for the same string in Java and MySQL

https://stackoverflow.com/questions/19378327

30-06-2022

Question

As far as I know, Unicode means that every letter has a unique code.

My database is set to utf8.

Here, I am saving a string (ఉత్తరప్రదేశ్) directly into the database from Java. It is then saved as

&#3081;&#3108;&#3149;&#3108;&#3120;&#3114;&#3149;&#3120;&#3110;&#3143;&#3126;&#3149;

But when I save the same string to the database using

escapeUnicode(StringEscapeUtils.unescapeHtml("here string"));


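import java.util.Formatter;

public String escapeUnicode(String input) {
    StringBuilder b = new StringBuilder(input.length());
    Formatter f = new Formatter(b); // formatted output is appended directly to b
    for (char c : input.toCharArray()) {
        if (c < 128) {
            b.append(c); // ASCII characters pass through unchanged
        } else {
            // non-ASCII: backslash-u escape with four zero-padded hex digits
            f.format("\\u%04x", (int) c);
        }
    }
    return b.toString();
}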

it generates the Unicode as

\u0c09\u0c24\u0c4d\u0c24\u0c30\u0c2a\u0c4d\u0c30\u0c26\u0c47\u0c36\u0c4d

Both display correctly in the browser. Why do they produce different Unicode values? Thanks in advance.

Solution

Those are not different numbers…

3081 = 0c09 = ఉ = TELUGU LETTER U
3108 = 0c24 = త = TELUGU LETTER TA
3149 = 0c4d = ్ = TELUGU SIGN VIRAMA

… and so on.

These are two different ways to represent the same Unicode code point.

The first form (&#3081;) writes the code point as a decimal number (base 10). The second form (\u0c09) writes it as a hexadecimal number (base 16).
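To check the equivalence yourself, here is a minimal sketch that parses the number out of each notation and compares the results. The class and method names (CodePointRadix, fromHtmlReference, fromUnicodeEscape) are made up for this illustration, and the parsing assumes well-formed input such as "&#3081;" or a backslash-u escape with four hex digits.

```java
class CodePointRadix {
    // Reads the decimal number inside an HTML numeric character reference like "&#3081;".
    static int fromHtmlReference(String ref) {
        return Integer.parseInt(ref.substring(2, ref.length() - 1), 10);
    }

    // Reads the hexadecimal number inside an escape like "\\u0c09" (the text after the two-character prefix).
    static int fromUnicodeEscape(String escape) {
        return Integer.parseInt(escape.substring(2), 16);
    }

    public static void main(String[] args) {
        int fromDecimal = fromHtmlReference("&#3081;");
        int fromHex = fromUnicodeEscape("\\u0c09");

        // Same code point, just written in base 10 and base 16.
        System.out.println(fromDecimal == fromHex); // prints true

        // Turn the number back into the character it stands for.
        System.out.println(new String(Character.toChars(fromDecimal)));
    }
}
```

The second println should show TELUGU LETTER U, provided the console can render it.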

When using a class such as Formatter, sometimes it helps to read the documentation. Then you might understand why you pasted f.format("\\u%04x", (int) c) into your code: the %04x conversion prints the number in hexadecimal, zero-padded to four digits.
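As a rough illustration of what the format string controls, the sketch below runs the same characters through Formatter twice, once with %d and once with %04x. The class and method names (EscapeForms, toDecimalReferences, toHexEscapes) are hypothetical, and the decimal version is only my guess at how such references could be produced; the question does not show which component actually wrote them to the database.

```java
import java.util.Formatter;

class EscapeForms {
    // Non-ASCII code points become decimal references such as "&#3081;" (%d = base 10).
    static String toDecimalReferences(String input) {
        StringBuilder out = new StringBuilder();
        Formatter f = new Formatter(out);
        input.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.append((char) cp);
            } else {
                f.format("&#%d;", cp);
            }
        });
        return out.toString();
    }

    // Non-ASCII chars become hex escapes, as in the question's code (%04x = base 16, four digits).
    static String toHexEscapes(String input) {
        StringBuilder out = new StringBuilder();
        Formatter f = new Formatter(out);
        for (char c : input.toCharArray()) {
            if (c < 128) {
                out.append(c);
            } else {
                f.format("\\u%04x", (int) c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // First three characters of the string from the question, written as escapes
        // so that this source file stays pure ASCII.
        String text = "\u0c09\u0c24\u0c4d";
        System.out.println(toDecimalReferences(text));
        System.out.println(toHexEscapes(text));
    }
}
```

Run as written, the first line of output is &#3081;&#3108;&#3149; and the second is \u0c09\u0c24\u0c4d: the same three code points, differing only in the radix chosen by the format string.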

Tip: If you have a Mac, download the UnicodeChecker app to see both decimal and hex numbers for each character defined in Unicode.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow