Question

Be prepared, this is one of those hard questions.

In Farsi (Persian), the letter ی sounds like y or i and is written in 4 different shapes depending on its position in the word. For simplicity, I'll call ی "YA" from now on.

Take a look at this image: [image: three Persian words with every YA shown in red]

All YA characters are shown in red. In the first word, YA is attached to the character before it (the one on its right, since Farsi is written from right to left) and is free at its end, whereas the last YA (in the 3rd word, the left-most red character) is free on both the left and the right.

With that background: I want to find out whether a part of a string ends with long YA (YA without dots) or short YA (YA with two dots beneath it).

For example, تحصیلداری (the 3rd word) ends with long YA, whereas تحصیـ, which is part of the 3rd word, ends with short YA.

Question: How can I tell which Unicode character تحصیلداری ends with? I just have a plain string, "تحصیلداری"; how can I get the Unicode code of each of its characters?

I tried printing the code of each character:

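using System.Text;
using System.Windows.Forms;

// List every character of the string with its UTF-16 code (decimal)
var unicodes = new StringBuilder();
foreach (char c in "تحصیلداری")
{
    unicodes.AppendLine($"{c} {(int)c}");
}
MessageBox.Show(unicodes.ToString());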

Result: [image: each character and its code, as shown by the code above]

But unfortunately, all YAs have the same code.


Bad news: YA is just one (real) example. There are about a dozen other characters that, like YA, have different appearances.


Additional info:
Using this useful link about Unicode, I found the codes of the different YAs:

[image: table of the codes for the different YA shapes]
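Unicode does define separate code points for the shapes of the Farsi YEH in its Arabic Presentation Forms block: U+FBFC (isolated), U+FBFD (final), U+FBFE (initial) and U+FBFF (medial); these are presumably what such a table lists. Ordinary text stores the logical letter U+06CC instead, so these codes only help when the data really contains presentation forms (for example, some legacy or copied-from-PDF text). A minimal C# sketch under that assumption; the class name `FarsiYehForms` is hypothetical:

```csharp
using System.Text;

// Hypothetical helper: classifies the Farsi YEH presentation-form code
// points U+FBFC..U+FBFF. Ordinary text stores the logical U+06CC instead,
// so this only helps when the data really contains presentation forms.
public static class FarsiYehForms
{
    public const char Logical = '\u06CC';  // ARABIC LETTER FARSI YEH
    public const char Isolated = '\uFBFC'; // isolated form (no dots)
    public const char Final = '\uFBFD';    // final form (no dots)
    public const char Initial = '\uFBFE';  // initial form (two dots)
    public const char Medial = '\uFBFF';   // medial form (two dots)

    // "Long YA" in the question's terms: the dotless isolated/final forms.
    public static bool IsLongYa(char c) => c == Isolated || c == Final;

    // "Short YA": the dotted initial/medial forms.
    public static bool IsShortYa(char c) => c == Initial || c == Medial;

    // NFKC maps each presentation form back to the logical U+06CC,
    // which is useful before comparing or searching.
    public static string ToLogical(string s) => s.Normalize(NormalizationForm.FormKC);
}
```

For example, `FarsiYehForms.IsLongYa('\uFBFD')` is true, while `FarsiYehForms.IsLongYa('\u06CC')` is false, because the logical letter carries no shape information.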


Solution

We solved a similar problem as follows:

We had a core banking application whose customer subsystem needed a full-text search on customers' first name, last name, father's name, etc.
Different encodings, legacy migrated data, keyboard layouts and Farsi fonts made the search inaccurate.

We overcame the problem by replacing problematic characters with standard ones and saving the normalized string for searching.
After several iterations, we ended up with the replacement below, which may come in handy:

Formula="UPPER(REPLACE(REPLACE(REPLACE
(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
(REPLACE(REPLACE(REPLACE(REPLACE
(REPLACE(FirsName || LastName || FatherName,
 chr(32),''),
 chr(13),''),
 chr(9),''),
 chr(10),''),
 '-',''),
 '-',''),
 'آ','ا'),
 'أ', 'ا'),
 'ئ', 'ي'),
 'ي', 'ي'),
 'ك', 'ک'),
 'آإئؤةي','اايوهي'),
 'ء',''),
 'شأل','شاال'),
 'ا.','اله'),
 '.',''),
 'الله','اله'),
 'ؤ','و'),
 'إ','ا'),
 'ة','ه'),
 ' ا لله','اله'),
 'ا لله','اله'),
 ' ا لله','اله'))"
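The same normalize-then-search idea can also be applied in application code before the search string is saved. This is a minimal C# sketch, not the team's actual code: the class name `SearchKey` is hypothetical, the target letters (Persian ی U+06CC and ک U+06A9) are an assumption, and only the single-character rules of the formula are covered (multi-character rules such as 'الله' to 'اله' are left out).

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch of the formula's single-character rules.
// Target letters (U+06CC, U+06A9) are an assumed canonical choice.
public static class SearchKey
{
    // Variant -> canonical character.
    private static readonly Dictionary<char, char> Fold = new Dictionary<char, char>
    {
        ['\u064A'] = '\u06CC', // ARABIC YEH -> FARSI YEH
        ['\u0649'] = '\u06CC', // ALEF MAKSURA -> FARSI YEH
        ['\u0626'] = '\u06CC', // YEH WITH HAMZA ABOVE -> FARSI YEH
        ['\u0643'] = '\u06A9', // ARABIC KAF -> KEHEH
        ['\u0622'] = '\u0627', // ALEF WITH MADDA -> ALEF
        ['\u0623'] = '\u0627', // ALEF WITH HAMZA ABOVE -> ALEF
        ['\u0625'] = '\u0627', // ALEF WITH HAMZA BELOW -> ALEF
        ['\u0624'] = '\u0648', // WAW WITH HAMZA ABOVE -> WAW
        ['\u0629'] = '\u0647', // TEH MARBUTA -> HEH
    };

    // Characters removed entirely: HAMZA, '-' and '.'.
    private static readonly HashSet<char> Drop = new HashSet<char> { '\u0621', '-', '.' };

    public static string Normalize(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            // char.IsWhiteSpace covers chr(32), chr(13), chr(9), chr(10) and more.
            if (char.IsWhiteSpace(c) || Drop.Contains(c)) continue;
            sb.Append(Fold.TryGetValue(c, out char target) ? target : c);
        }
        // Upper-case Latin letters, as UPPER() does in the formula.
        return sb.ToString().ToUpperInvariant();
    }
}
```

Storing `SearchKey.Normalize(firstName + lastName + fatherName)` and normalizing the user's query the same way lets both sides compare equal regardless of which YEH or KAF was typed.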

Other suggestions

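Although Unicode has several different YEH characters, note that in ordinary text all presentation forms of the Persian YEH are the same Unicode character, U+06CC (0x06CC); the renderer chooses the shape at display time. Unicode does define separate presentation-form code points (U+FBFC to U+FBFF), but they exist for compatibility and normal text does not use them. So you cannot determine the presentation form from the character's code.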

But you can reach your goal by checking which characters come before and after the YEH.
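Following that suggestion, here is a minimal C# sketch that infers the rendered form of a letter from its neighbours. The names `Shaping`, `FormAt`, `EndsWithLongYa` and `JoinForm` are hypothetical, and the joining classes are a hand-picked simplification of Unicode's joining types (ArabicShaping.txt) covering common Persian letters; a real text shaper uses the full table. The key point for the question: whether a part of a string ends with long or short YA depends on the character that follows that part in the full string, so the check takes the whole text plus the end index of the part.

```csharp
using System.Globalization;

public enum JoinForm { Isolated, Final, Initial, Medial }

public static class Shaping
{
    // Assumed subset of right-joining letters: they connect to the previous
    // letter but never to the next one (alef variants, teh marbuta, dal,
    // thal, reh, zain, waw variants, jeh).
    private const string RightJoining =
        "\u0622\u0623\u0624\u0625\u0627\u0629\u062F\u0630\u0631\u0632\u0648\u0698";

    // Persian letters outside U+0620..U+064A that are treated as letters here.
    private const string ExtraLetters = "\u067E\u0686\u0698\u06A9\u06AF\u06CC";

    private static bool IsArabicLetter(char c) =>
        (c >= '\u0620' && c <= '\u064A') || ExtraLetters.IndexOf(c) >= 0;

    // Dual-joining: connects on both sides. HAMZA (U+0621) never joins;
    // TATWEEL (U+0640, inside the range above) joins on both sides.
    private static bool IsDualJoining(char c) =>
        IsArabicLetter(c) && c != '\u0621' && RightJoining.IndexOf(c) < 0;

    private static bool JoinsToNext(char c) => IsDualJoining(c);

    private static bool JoinsToPrevious(char c) =>
        IsDualJoining(c) || RightJoining.IndexOf(c) >= 0;

    // Harakat and other non-spacing marks neither break nor cause joining.
    private static bool IsTransparent(char c) =>
        CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark;

    // Contextual form of text[index], judged from its nearest non-mark neighbours.
    // "Previous" is the character on its right in right-to-left display.
    public static JoinForm FormAt(string text, int index)
    {
        int p = index - 1;
        while (p >= 0 && IsTransparent(text[p])) p--;
        int n = index + 1;
        while (n < text.Length && IsTransparent(text[n])) n++;

        char c = text[index];
        bool joinsPrevious = p >= 0 && JoinsToNext(text[p]) && JoinsToPrevious(c);
        bool joinsNext = n < text.Length && JoinsToNext(c) && JoinsToPrevious(text[n]);

        if (joinsPrevious && joinsNext) return JoinForm.Medial;
        if (joinsPrevious) return JoinForm.Final;
        if (joinsNext) return JoinForm.Initial;
        return JoinForm.Isolated;
    }

    // For the part text[0..end): true = long YA (dotless isolated/final form),
    // false = short YA (dotted initial/medial form),
    // null = the part does not end with FARSI YEH (U+06CC).
    public static bool? EndsWithLongYa(string text, int end)
    {
        if (end <= 0 || end > text.Length || text[end - 1] != '\u06CC') return null;
        JoinForm form = FormAt(text, end - 1);
        return form == JoinForm.Isolated || form == JoinForm.Final;
    }
}
```

With text = "تحصیلداری" typed with U+06CC, `EndsWithLongYa(text, text.Length)` returns true (the last ی follows ر, which never joins forward, so it is isolated), while `EndsWithLongYa(text, 4)` returns false (the ی of تحصی is followed by ل, so it is medial and dotted).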

You can also use Fardis to see the Unicode codes of the characters in a string.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow