Question

I am using iTextSharp to generate a series of PDFs, using Open Sans as the default font. On occasion, names are inserted into the content of the PDFs. However, some of the names I need to insert contain CJK characters (stored in nvarchar columns in SQL Server), and as far as I know Open Sans does not support CJK characters at present. I need to keep using Open Sans as my default font, so ideally I would like to detect CJK characters in the strings retrieved from the database and switch to a CJK font when printing those characters.

Would a regex be the best bet for this? I haven't been able to find any regex patterns that would help with this unfortunately.

Thanks in advance for any help!


Solution 2

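using iTextSharp.text;
using iTextSharp.text.pdf;

FontSelector selector = new FontSelector();

// Add both fonts to the FontSelector; the first one added is tried first.
// openSansfont and chinesefont are iTextSharp.text.Font instances created elsewhere.
selector.AddFont(openSansfont);
selector.AddFont(chinesefont);

// Process picks, per character, the first font that contains the glyph.
Phrase phrase = selector.Process(yourTxt);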

FontSelector will use the correct font for you!

Description from the source file FontSelector.cs:

Selects the appropriate fonts that contain the glyphs needed to render text correctly. The fonts are checked in order until the character is found.

Edit: the fonts are tried in the order they were added, from the first AddFont call to the last, so add Open Sans first to keep it as the default.

http://itextpdf.com/examples/iia.php?id=214

OTHER TIPS

Just in case anyone stumbles across this question, I've found another solution that uses the Unicode named blocks listed here (http://msdn.microsoft.com/en-us/library/20bw873z.aspx#SupportedNamedBlocks) in a regex.

var name = "Joe Bloggs";
var cjkRegex = new Regex(@"\p{IsCJKUnifiedIdeographs}");

if (cjkRegex.IsMatch(name))
{
    // switch to CJK font
}
else
{
    // keep calm and carry on
}

EDIT:

You'll probably need to match more than just the Unified Ideographs. Try using this as the regex:

string r = 
@"\p{IsHangulJamo}|"+
@"\p{IsCJKRadicalsSupplement}|"+
@"\p{IsCJKSymbolsandPunctuation}|"+
@"\p{IsEnclosedCJKLettersandMonths}|"+
@"\p{IsCJKCompatibility}|"+
@"\p{IsCJKUnifiedIdeographsExtensionA}|"+
@"\p{IsCJKUnifiedIdeographs}|"+
@"\p{IsHangulSyllables}|"+
@"\p{IsCJKCompatibilityForms}"; 

That works for all the Korean text I tried it on.
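
If only the CJK characters should switch font rather than the whole name, the same named blocks can be used to split the string into runs. This is just a minimal sketch using plain .NET regex, not anything built into iTextSharp; the class name CjkRunSplitter and its Split method are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of iTextSharp): splits a string into
// consecutive runs and flags each run as CJK or non-CJK.
public static class CjkRunSplitter
{
    // Same named blocks as the pattern above, combined into one character
    // class so that adjacent CJK characters are matched as a single run.
    private static readonly Regex CjkRun = new Regex(
        @"[\p{IsHangulJamo}\p{IsCJKRadicalsSupplement}\p{IsCJKSymbolsandPunctuation}" +
        @"\p{IsEnclosedCJKLettersandMonths}\p{IsCJKCompatibility}" +
        @"\p{IsCJKUnifiedIdeographsExtensionA}\p{IsCJKUnifiedIdeographs}" +
        @"\p{IsHangulSyllables}\p{IsCJKCompatibilityForms}]+");

    // Key is the text of the run, Value is true when the run is CJK.
    public static List<KeyValuePair<string, bool>> Split(string text)
    {
        var runs = new List<KeyValuePair<string, bool>>();
        int position = 0;

        foreach (Match m in CjkRun.Matches(text))
        {
            // Any text between the previous match and this one is non-CJK.
            if (m.Index > position)
            {
                runs.Add(new KeyValuePair<string, bool>(
                    text.Substring(position, m.Index - position), false));
            }
            runs.Add(new KeyValuePair<string, bool>(m.Value, true));
            position = m.Index + m.Length;
        }

        // Trailing non-CJK text after the last match.
        if (position < text.Length)
        {
            runs.Add(new KeyValuePair<string, bool>(text.Substring(position), false));
        }
        return runs;
    }
}
```

Each run could then be added to the document as its own Chunk, using the CJK font when Value is true and Open Sans otherwise.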

I edited Dave's answer to make it work, but apparently only I can see the edit until it is peer reviewed, so I am posting the solution as my own answer. Basically, Dave just needs to extend his regex to this:

string regex = 
@"\p{IsHangulJamo}|"+
@"\p{IsCJKRadicalsSupplement}|"+
@"\p{IsCJKSymbolsandPunctuation}|"+
@"\p{IsEnclosedCJKLettersandMonths}|"+
@"\p{IsCJKCompatibility}|"+
@"\p{IsCJKUnifiedIdeographsExtensionA}|"+
@"\p{IsCJKUnifiedIdeographs}|"+
@"\p{IsHangulSyllables}|"+
@"\p{IsCJKCompatibilityForms}"; 

which will detect Korean characters when used like this:

string subject = "도형이";

Match match = Regex.Match(subject, regex);

if(match.Success)
{
    //change to Korean font
}
else
{
    //keep calm and carry on
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow