Detecting CJK characters in a string (C#)

Question 1

using iTextSharp.text.pdf;

iTextSharp.text.pdf.FontSelector selector = new iTextSharp.text.pdf.FontSelector();

// Add two fonts to the FontSelector; they are tried in the order they are added
selector.AddFont(openSansfont);
selector.AddFont(chinesefont);


iTextSharp.text.Phrase phrase = selector.Process(yourTxt);

FontSelector will use the correct font for you!

The detailed description in the source file FontSelector.cs reads:

Selects the appropriate fonts that contain the glyphs needed to render text correctly. The fonts are checked in order until the character is found.

The fonts are tried in the order they were added, from the first AddFont call to the last.

http://itextpdf.com/examples/iia.php?id=214
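
To make the "checked in order until the character is found" behaviour concrete, here is a minimal sketch of that idea in plain C#. It does not use iTextSharp and is not its actual implementation: FakeFont, FontFallbackSketch and the character ranges are all hypothetical stand-ins, and falling back to the first font when nothing matches is an assumption made only for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for a font: a name plus a test for which characters it can render.
public sealed class FakeFont
{
    public string Name { get; }
    private readonly Func<char, bool> _hasGlyph;

    public FakeFont(string name, Func<char, bool> hasGlyph)
    {
        Name = name;
        _hasGlyph = hasGlyph;
    }

    public bool HasGlyph(char c) => _hasGlyph(c);
}

public static class FontFallbackSketch
{
    // Splits text into (font name, run) pairs. For each character the fonts are
    // tried in list order and the first one that reports a glyph wins. If none
    // does, the first font is used (an assumption of this sketch).
    public static List<KeyValuePair<string, string>> Split(string text, IList<FakeFont> fonts)
    {
        var runs = new List<KeyValuePair<string, string>>();
        var current = new StringBuilder();
        string currentFont = null;

        foreach (char c in text)
        {
            string chosen = fonts[0].Name;
            foreach (FakeFont font in fonts)
            {
                if (font.HasGlyph(c)) { chosen = font.Name; break; }
            }

            // Close the current run whenever the chosen font changes.
            if (chosen != currentFont && current.Length > 0)
            {
                runs.Add(new KeyValuePair<string, string>(currentFont, current.ToString()));
                current.Clear();
            }
            currentFont = chosen;
            current.Append(c);
        }

        if (current.Length > 0)
            runs.Add(new KeyValuePair<string, string>(currentFont, current.ToString()));
        return runs;
    }

    public static void Main()
    {
        var fonts = new List<FakeFont>
        {
            new FakeFont("latin", c => c < 0x0250),               // pretend Latin-only font
            new FakeFont("cjk", c => c >= 0x4E00 && c <= 0x9FFF)  // pretend CJK font
        };

        foreach (var run in Split("Hello 世界!", fonts))
            Console.WriteLine(run.Key + ": \"" + run.Value + "\"");
    }
}
```

Because the first matching font wins, the order of the list matters: a character that both fonts can render ends up in whichever font was added first.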

Question 2

Just in case anyone stumbles across this question, I've found another solution using the Unicode named blocks listed here (http://msdn.microsoft.com/en-us/library/20bw873z.aspx#SupportedNamedBlocks) in a regex.

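using System.Text.RegularExpressions;

var name = "Joe Bloggs";
// Matches any character in the CJK Unified Ideographs block (U+4E00 to U+9FFF)
var cjkRegex = new Regex(@"\p{IsCJKUnifiedIdeographs}");

if (cjkRegex.IsMatch(name))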
{
    //switch to CJK font
}
else
{
    //keep calm and carry on
}
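
As a quick check of what this single block covers, here is a small sketch that runs the same pattern over three samples. The class and method names (SingleBlockCheck, HasIdeograph) and the sample strings are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SingleBlockCheck
{
    // Only the CJK Unified Ideographs block (U+4E00 to U+9FFF).
    private static readonly Regex Ideographs = new Regex(@"\p{IsCJKUnifiedIdeographs}");

    // True if the text contains at least one character from that block.
    public static bool HasIdeograph(string text) => Ideographs.IsMatch(text);

    public static void Main()
    {
        foreach (string sample in new[] { "Joe Bloggs", "中文", "도형이" })
            Console.WriteLine(sample + " -> " + HasIdeograph(sample));
    }
}
```

The Chinese sample matches, but the Korean one does not: Hangul syllables live in their own block (U+AC00 to U+D7AF), so text written purely in Hangul is not caught by this pattern.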

EDIT:

You'll probably need to match more than just the Unified Ideographs; try using this as the regex:

string r = 
@"\p{IsHangulJamo}|"+
@"\p{IsCJKRadicalsSupplement}|"+
@"\p{IsCJKSymbolsandPunctuation}|"+
@"\p{IsEnclosedCJKLettersandMonths}|"+
@"\p{IsCJKCompatibility}|"+
@"\p{IsCJKUnifiedIdeographsExtensionA}|"+
@"\p{IsCJKUnifiedIdeographs}|"+
@"\p{IsHangulSyllables}|"+
@"\p{IsCJKCompatibilityForms}";

That works for all the Korean text I tried it on.

Question 3

I edited Dave's answer to make it work, but apparently only I can see the edit until it's peer reviewed, so I'll post the solution as my own answer. Basically, Dave just needs to extend his regex a bit, to this:

string regex = 
@"\p{IsHangulJamo}|"+
@"\p{IsCJKRadicalsSupplement}|"+
@"\p{IsCJKSymbolsandPunctuation}|"+
@"\p{IsEnclosedCJKLettersandMonths}|"+
@"\p{IsCJKCompatibility}|"+
@"\p{IsCJKUnifiedIdeographsExtensionA}|"+
@"\p{IsCJKUnifiedIdeographs}|"+
@"\p{IsHangulSyllables}|"+
@"\p{IsCJKCompatibilityForms}";

which will detect Korean characters when used like this:

string subject = "도형이";

Match match = Regex.Match(subject, regex);

if(match.Success)
{
    //change to Korean font
}
else
{
    //keep calm and carry on
}
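
If only part of a string is Korean or Chinese, a yes/no match on the whole string may not be enough. The sketch below is one possible way to go further, not something from the answers above: it puts the same nine blocks into a single character class and splits the text into CJK and non-CJK runs so each run can get its own font. CjkRuns and Split are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CjkRuns
{
    // The same nine named blocks as in the regex above, as the body of a character class.
    private const string Blocks =
        @"\p{IsHangulJamo}\p{IsCJKRadicalsSupplement}\p{IsCJKSymbolsandPunctuation}" +
        @"\p{IsEnclosedCJKLettersandMonths}\p{IsCJKCompatibility}" +
        @"\p{IsCJKUnifiedIdeographsExtensionA}\p{IsCJKUnifiedIdeographs}" +
        @"\p{IsHangulSyllables}\p{IsCJKCompatibilityForms}";

    // Matches either a run of characters from those blocks (group "cjk")
    // or a run of characters from outside them.
    private static readonly Regex RunPattern =
        new Regex("(?<cjk>[" + Blocks + "]+)|[^" + Blocks + "]+");

    // Returns the text as consecutive runs; Key is true for a CJK run.
    public static List<KeyValuePair<bool, string>> Split(string text)
    {
        var runs = new List<KeyValuePair<bool, string>>();
        foreach (Match m in RunPattern.Matches(text))
            runs.Add(new KeyValuePair<bool, string>(m.Groups["cjk"].Success, m.Value));
        return runs;
    }

    public static void Main()
    {
        foreach (var run in Split("Name: 도형이 (Kim)"))
            Console.WriteLine((run.Key ? "CJK font:     " : "default font: ") + run.Value);
    }
}
```

Note that the list contains no kana blocks, so Japanese text written only in hiragana or katakana would land in the default run; .NET also supports \p{IsHiragana} and \p{IsKatakana} if that matters for your data.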