Question

As far as I know, Unicode assigns every character a unique code.

My database is set to UTF-8.

Here, I am saving a string (ఉత్తరప్రదేశ్) directly into the database from Java. It is then saved as

&#3081;&#3108;&#3149;&#3108;&#3120;&#3114;&#3149;&#3120;&#3110;&#3143;&#3126;&#3149;

But I saved the same string in the database using

escapeUnicode(StringEscapeUtils.unescapeHtml("here string"));


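import java.util.Formatter;

// Escapes every non-ASCII char as a backslash-u sequence with 4 hex digits.
public String escapeUnicode(String input) {
    StringBuilder b = new StringBuilder(input.length());
    Formatter f = new Formatter(b);
    for (char c : input.toCharArray()) {
        if (c < 128) {
            b.append(c);                   // ASCII passes through unchanged
        } else {
            f.format("\\u%04x", (int) c);  // zero-padded, lowercase hexadecimal
        }
    }
    return b.toString();
}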

It generates the Unicode escapes as

\u0c09\u0c24\u0c4d\u0c24\u0c30\u0c2a\u0c4d\u0c30\u0c26\u0c47\u0c36\u0c4d

Both display correctly in the browser. Why do they generate different Unicode values? Thanks in advance.


Solution

Those are not different numbers…

  • 3081 = 0c09 = ఉ = TELUGU LETTER U
  • 3108 = 0c24 = త = TELUGU LETTER TA
  • 3149 = 0c4d = ్ = TELUGU SIGN VIRAMA

… and so on.

They are two different ways to write the same Unicode code points.

The first output uses decimal numbers (base 10). The second uses hexadecimal numbers (base 16).
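As a quick check of that claim, the following Java sketch parses one value from each output with its own radix and compares the results. The class name SameCodePoint and its helper methods are made up for illustration; only standard JDK calls are used.

```java
public class SameCodePoint {
    // Parses a decimal code point such as "3081" (the base-10 form).
    static int fromDecimal(String digits) {
        return Integer.parseInt(digits, 10);
    }

    // Parses a hexadecimal code point such as "0c09" (the base-16 form).
    static int fromHex(String digits) {
        return Integer.parseInt(digits, 16);
    }

    // Turns a code point back into the character(s) it stands for.
    static String toText(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String[][] pairs = { {"3081", "0c09"}, {"3108", "0c24"}, {"3149", "0c4d"} };
        for (String[] p : pairs) {
            int dec = fromDecimal(p[0]);
            int hex = fromHex(p[1]);
            // Same number, so both forms name the same character.
            System.out.println(p[0] + " == 0x" + p[1] + " ? " + (dec == hex)
                    + " -> " + Character.getName(dec));
        }
    }
}
```

Each pair should compare as equal, since only the radix used to write the number differs.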

When using a class such as Formatter, sometimes it helps to read the documentation. Then you might understand why you pasted f.format("\\u%04x" into your code: the x conversion prints the number in hexadecimal, and 04 pads it with zeros to four digits.
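To see the effect of the conversion character, here is a minimal sketch that formats the same char once with %d and once with %04x. The class FormatSpecDemo and the method render are hypothetical names, and the "&#%d;" pattern is only an assumption about how the decimal form could be produced.

```java
import java.util.Formatter;
import java.util.Locale;

public class FormatSpecDemo {
    // Formats one char's numeric value with the given format string.
    static String render(String spec, char c) {
        StringBuilder b = new StringBuilder();
        // Locale.ROOT keeps the digits plain ASCII regardless of the default locale.
        try (Formatter f = new Formatter(b, Locale.ROOT)) {
            f.format(spec, (int) c);
        }
        return b.toString();
    }

    public static void main(String[] args) {
        char u = '\u0c09'; // TELUGU LETTER U
        System.out.println(render("&#%d;", u));    // decimal form: &#3081;
        System.out.println(render("\\u%04x", u));  // hex form, zero-padded to 4 digits
    }
}
```

Swapping %04x for %d in escapeUnicode would therefore print the same code points in decimal instead of hex.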

Tip: If you have a Mac, download the UnicodeChecker app to see both decimal and hex numbers for each character defined in Unicode.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow