Question

Is there any way to convert English (Latin) Unicode text to Gujarati Unicode? For example, the code point for a is \u0061, and it should be translated to અ, which is \u0A85.

Solution

The Unicode CLDR provides files that describe how to transliterate from Latin to Gujarati. The transformation rules are stored in XML files written in the Locale Data Markup Language (LDML).

The Latin-Gujarati transform involves:

  1. Filtering the string to characters or character ranges described by

    ['.0-9A-Za-z~À-ÅÇ-ÏÑ-ÖÙ-Ýà-åç-ïñ-öù-ýÿ-ďĒ-ĥĨ-İĴ-ķĹ-ľŃ-ňŌ-őŔ-ťŨ-žƠ-ơƯ-ưǍ-ǜǞ-ǣǦ-ǭǰǴ-ǵǸ-țȞ-ȟȦ-ȳʔ́̃-̄̆-̇̐̔-̣̥̱́̈́̕΅-ΆΈ-ΊΌΎ-ΐά-ΰό-ώϓЃЌЎЙйѓќўӁ-ӂӐ-ӑӖ-ӗӢ-ӣӮ-ӯḀ-ẙẠ-ỹἁἃ-ἅἇἉἋ-ἍἏἑἓ-ἕἙἛ-Ἕἡἣ-ἥἧἩἫ-ἭἯἱἳ-ἵἷἹἻ-ἽἿὁὃ-ὅὉὋ-Ὅὑὓ-ὕὗὙὛὝὟὡὣ-ὥὧὩὫ-ὭὯάέήίόύώᾁᾃ-ᾅᾇᾉᾋ-ᾍᾏᾑᾓ-ᾕᾗᾙᾛ-ᾝᾟᾡᾣ-ᾥᾧᾩᾫ-ᾭᾯ-ᾱᾴᾸ-ᾹΆῄΈΉ῎ῐ-ῑΐῘ-ῙΊ῞ῠ-ῡΰῥῨ-ῩΎ-Ῥ΅ῴΌΏK-Å\uE04D\uE064]

  2. Putting the result of the previous step in Normal Form D

  3. Lowercasing the result of the previous step

  4. Performing the Latin-InterIndic transform on the result of the previous step. As you can see from the file, this is already pretty complicated, so I am not going into the details of this step.

  5. Performing the InterIndic-Gujarati transform on the result of the previous step. The same note applies as for the previous step.

  6. Putting the result of the previous step in Normal Form C
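As a rough illustration of how these steps chain together, here is a minimal Java sketch that uses only the standard library. The class name `PipelineSketch` and the method `run` are hypothetical. The filter from step 1 is omitted, and steps 4 and 5 are passed in as placeholder functions because the real CLDR rule sets are far larger than anything shown here.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the step order only; not the CLDR implementation.
class PipelineSketch {

    static String run(String input,
                      UnaryOperator<String> latinToInterIndic,
                      UnaryOperator<String> interIndicToGujarati) {
        // Step 1 (the character filter) is intentionally left out of this sketch.
        String s = Normalizer.normalize(input, Normalizer.Form.NFD); // step 2: Normal Form D
        s = s.toLowerCase(Locale.ROOT);                               // step 3: lowercase
        s = latinToInterIndic.apply(s);                               // step 4: placeholder
        s = interIndicToGujarati.apply(s);                            // step 5: placeholder
        return Normalizer.normalize(s, Normalizer.Form.NFC);          // step 6: Normal Form C
    }

    public static void main(String[] args) {
        // With identity placeholders, only the normalization and lowercasing act:
        // U+00C5 is decomposed, lowercased, then recomposed.
        String out = run("\u00C5", UnaryOperator.identity(), UnaryOperator.identity());
        System.out.printf("U+%04X%n", out.codePointAt(0)); // prints U+00E5
    }
}
```

Passing steps 4 and 5 in as functions keeps the sketch honest about what it does not implement: only the normalization and lowercasing are real here.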

If we do this for the letter "a" and skip straight to step 4, the relevant rules are:

$wa=\uE005
a→$wa

We now have "\uE005". Step 5 then applies:

\uE005→અ

So we end up with અ, and it is unchanged by step 6.
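The worked example for "a" can be reproduced with two tiny lookup tables. This is only a sketch under strong assumptions: each table holds just the single rule quoted above, the names `RuleSketch`, `apply`, and `transliterate` are made up for illustration, and real CLDR rules also involve context and variables that a plain map cannot express.

```java
import java.util.Map;

// Hypothetical two-rule illustration of steps 4 and 5 for the letter "a".
class RuleSketch {

    // Step 4 rule: a -> $wa, where $wa is the private-use code point U+E005.
    static final Map<String, String> LATIN_TO_INTERINDIC = Map.of("a", "\uE005");

    // Step 5 rule: U+E005 -> Gujarati letter A (U+0A85).
    static final Map<String, String> INTERINDIC_TO_GUJARATI = Map.of("\uE005", "\u0A85");

    // Replace each code point that has a rule; leave all others unchanged.
    static String apply(Map<String, String> rules, String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            out.append(rules.getOrDefault(ch, ch));
        });
        return out.toString();
    }

    static String transliterate(String s) {
        return apply(INTERINDIC_TO_GUJARATI, apply(LATIN_TO_INTERINDIC, s));
    }

    public static void main(String[] args) {
        String out = transliterate("a");
        System.out.printf("U+%04X%n", out.codePointAt(0)); // prints U+0A85
    }
}
```

Characters without a rule pass through unchanged in this sketch, which is a simplification chosen for the example rather than a statement about how CLDR treats unmatched input.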


You probably want to look at CLDR Eclipse Setup, but I'm not sure whether those are just development tools for the CLDR maintainers, and I have no idea whether anyone has implemented a library for this in Java.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow