Replacing all non-ASCII characters, except right angle character in C#
10-10-2019
Question
I'm writing a file utility to strip all non-ASCII characters out of files. I have this Regex:
Regex rgx = new Regex(@"[^\u0000-\u007F]");
Which works fine. Unfortunately, I've discovered that some silly people use right angles (¬) as delimiters in their files, so these get stripped out as well, and I need to keep them!
I'm pretty new to Regex, and I do understand the basics, but any help would be awesome!
Thanks in advance!
Answer
You just need to include the code point for the right angle character in the negated set. Try this:
Regex rgx = new Regex(@"[^\uxxxx\u0000-\u007F]");
Or this:
Regex rgx = new Regex(@"[^\uxxxx-\uxxxx\u0000-\u007F]");
(Where xxxx is the Unicode code point for the character you want to preserve.)
I'm giving two options because I know you can specify multiple ranges within one negated character group, but I wasn't sure whether individual characters can be mixed with ranges. In .NET they can, so the first form is sufficient.
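As a minimal sketch of the first form, here is the pattern wrapped in a small helper. It assumes the delimiter is U+00AC (the not sign, ¬), which the question does not confirm, and the names `AsciiFilter` and `Strip` are hypothetical, chosen only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiFilter
{
    // Matches any character that is neither ASCII (U+0000 to U+007F)
    // nor the assumed delimiter U+00AC (¬). Swap in a different code
    // point if the files use another look-alike character.
    private static readonly Regex NonAscii =
        new Regex(@"[^\u00AC\u0000-\u007F]");

    // Removes every matched character and leaves the rest untouched.
    public static string Strip(string input)
    {
        return NonAscii.Replace(input, string.Empty);
    }
}
```

Calling `AsciiFilter.Strip("a¬béc")` should drop the `é` but keep the `¬` delimiter.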
Other tips
Jon's answer is absolutely correct, but you may be using the wrong code point for the character. Try the following for the similar-looking characters:
Regex regex = new Regex(@"([^\u00ac\u0000-\u007F])");
Regex regex = new Regex(@"([^\u02fa\u0000-\u007F])");
Regex regex = new Regex(@"([^\u031a\u0000-\u007F])");
The first one should work, I think: U+00AC is the not sign (¬).
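If none of the candidates matches, one way to find out which character the files really contain is to list the non-ASCII characters in a sample line. The following is only a sketch; `CodePointInspector` and `NonAsciiCodeUnits` are hypothetical names, and it reports UTF-16 code units, so it assumes the text was read with the correct encoding:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CodePointInspector
{
    // Returns each distinct non-ASCII UTF-16 code unit in the text,
    // formatted as "U+XXXX", in order of first appearance.
    public static List<string> NonAsciiCodeUnits(string text)
    {
        return text.Where(c => c > '\u007F')
                   .Distinct()
                   .Select(c => "U+" + ((int)c).ToString("X4"))
                   .ToList();
    }
}
```

For a line such as `"a¬b¬c"`, this returns a single entry, `U+00AC`, and the four hex digits can be placed directly after `\u` in the character class.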