I was able to use a Regex that matches every character outside Basic Latin and converts each one to an RTF Unicode escape sequence.
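    // Requires: using System; using System.Globalization; using System.Text.RegularExpressions;
    // Matches any single UTF-16 code unit outside the Basic Latin block (U+0000 to U+007F).
    const string RTFSpecialsInUTF = @"(\P{IsBasicLatin})";
    private static readonly Regex UTFSpecialRegex = new Regex(RTFSpecialsInUTF, RegexOptions.Compiled);

    private static string ReplaceDirect(Match match) {
        // The pattern matches exactly one char, so this is a UTF-16 code unit, not a full code point.
        int codepoint = (int)Convert.ToChar(match.Groups[1].Value);
        // RTF's \uN takes a signed 16-bit decimal, so values of 32768 and above become negative.
        if (codepoint >= 32768) {
            codepoint = codepoint - 65536;
        }
        // Invariant culture keeps the minus sign ASCII; the trailing "?" is the fallback for readers without \u support.
        return string.Format(CultureInfo.InvariantCulture, "\\u{0}?", codepoint);
    }

    /* Usage, from inside the class that declares ReplaceDirect (PDFDocumentRTF here) */
    value = UTFSpecialRegex.Replace(value, new MatchEvaluator(PDFDocumentRTF.ReplaceDirect));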
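Keeping my fingers crossed that this also works for other languages that don't fit into Basic Latin and RTF very well, such as Arabic. In principle it should, since the pattern escapes every character outside Basic Latin the same way regardless of script.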
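For anyone who wants to try this on Arabic or other non-Latin text, here is a minimal self-contained sketch of the same idea. The class name `RtfEscapeSketch` and the method name `Escape` are made up for illustration and are not part of my actual code. It casts each char to `short` instead of subtracting 65536, which should give the same signed value.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original code above.
public static class RtfEscapeSketch
{
    // Same pattern as above: any UTF-16 code unit outside Basic Latin.
    private static readonly Regex NonBasicLatin =
        new Regex(@"\P{IsBasicLatin}", RegexOptions.Compiled);

    // Replaces each matched code unit with \uN? where N is a signed 16-bit decimal.
    public static string Escape(string value)
    {
        return NonBasicLatin.Replace(value, m =>
        {
            // Casting char to short wraps 32768..65535 into -32768..-1,
            // the same result as subtracting 65536.
            short signed = unchecked((short)m.Value[0]);
            return "\\u" + signed.ToString(CultureInfo.InvariantCulture) + "?";
        });
    }

    public static void Main()
    {
        // Latin-1 letter: U+00E9 is 233.
        Console.WriteLine(Escape("caf\u00E9"));                // caf\u233?
        // Arabic word (U+0633 U+0644 U+0627 U+0645).
        Console.WriteLine(Escape("\u0633\u0644\u0627\u0645")); // \u1587?\u1604?\u1575?\u1605?
        // U+1F600 is stored as the surrogate pair D83D DE00; each half is escaped on its own.
        Console.WriteLine(Escape("\uD83D\uDE00"));             // \u-10179?\u-8704?
    }
}
```

The last case shows what happens above U+FFFF: the regex sees two surrogate code units, so two negative `\u` escapes come out, one per half of the pair.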