Question

I'm writing a file utility to strip all non-ASCII characters out of files. I have this regex:

Regex rgx = new Regex(@"[^\u0000-\u007F]");

Which works fine. But unfortunately, I've discovered that some silly people use right angles (¬) as delimiters in their files, so those get stripped out as well, and I need them!
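To see the problem concretely, here is a minimal C# sketch that applies the pattern from the question with `Regex.Replace`. The class name, method name and sample line are hypothetical, chosen only for illustration.

```csharp
using System.Text.RegularExpressions;

public static class OriginalStripDemo
{
    // The pattern from the question: matches any character outside U+0000..U+007F.
    // In a verbatim string C# leaves \u alone, so the regex engine interprets it.
    private static readonly Regex NonAscii = new Regex(@"[^\u0000-\u007F]");

    // Hypothetical helper: deletes every non-ASCII character, including the ¬ delimiter.
    public static string StripNonAscii(string input) =>
        NonAscii.Replace(input, string.Empty);
}
```

For example, `OriginalStripDemo.StripNonAscii("abc\u00ACd\u00E9f\u00ACghi")` returns `"abcdfghi"`: the ¬ delimiters disappear along with the é.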

I'm pretty new to Regex, and I do understand the basics, but any help would be awesome!

Thanks in advance!


Solution

You just need to add the code point of the character you want to keep to the negated set:

Try this:

Regex rgx = new Regex(@"[^\uxxxx\u0000-\u007F]");

Or this:

Regex rgx = new Regex(@"[^\uxxxx-\uxxxx\u0000-\u007F]");

(Where xxxx is the four-digit hexadecimal Unicode code point of the character you want to preserve.)

I'm giving two options because I know you can specify multiple ranges within one negated character class, but I'm not sure whether you can mix individual characters with ranges.
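As a concrete sketch, assuming the delimiter really is U+00AC (NOT SIGN, ¬), the two suggestions become the patterns below. .NET character classes accept a mix of single characters and ranges, so both forms should behave the same way; the one-element range in the second pattern is simply redundant. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class KeepNotSign
{
    // Option 1: the single character U+00AC listed alongside the ASCII range.
    private static readonly Regex SingleChar = new Regex(@"[^\u00AC\u0000-\u007F]");

    // Option 2: the same character written as a one-element range U+00AC-U+00AC.
    private static readonly Regex OneCharRange = new Regex(@"[^\u00AC-\u00AC\u0000-\u007F]");

    // Both helpers remove every character that is neither ASCII nor U+00AC.
    public static string StripSingle(string input) =>
        SingleChar.Replace(input, string.Empty);

    public static string StripRange(string input) =>
        OneCharRange.Replace(input, string.Empty);
}
```

With the hypothetical sample line `"abc\u00ACd\u00E9f\u00ACghi"`, both methods return `"abc¬df¬ghi"`: the é is removed and the ¬ delimiters survive.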

Other tips

Jon's answer is absolutely correct. You may be using the wrong code for the character. Try the following patterns for the similar-looking characters:

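// U+00AC NOT SIGN (¬)
Regex keepNotSign = new Regex(@"([^\u00ac\u0000-\u007F])");
// U+02FA MODIFIER LETTER END HIGH TONE (˺)
Regex keepEndHighTone = new Regex(@"([^\u02fa\u0000-\u007F])");
// U+031A COMBINING LEFT ANGLE ABOVE
Regex keepLeftAngleAbove = new Regex(@"([^\u031a\u0000-\u007F])");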

The first one (U+00AC, the NOT sign ¬) should work, I think.
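If none of these works, it can help to check which character the text actually contains. The sketch below lists the distinct non-ASCII characters in a string as `U+XXXX` codes; the helper name is hypothetical. It works on UTF-16 code units, so characters outside the Basic Multilingual Plane show up as two surrogate values. The string you inspect is whatever your code decoded (for example via `File.ReadAllText`), so if the file was read with a different encoding than it was written in, the delimiter may come through as a different character than expected.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CodePointDump
{
    // Hypothetical helper: returns each distinct UTF-16 code unit above U+007F
    // formatted as "U+XXXX", ready to paste into the \uxxxx slot of the pattern.
    public static List<string> NonAsciiCodes(string text) =>
        text.Where(c => c > '\u007F')
            .Distinct()
            .Select(c => $"U+{(int)c:X4}")
            .ToList();
}
```

For instance, `CodePointDump.NonAsciiCodes("a\u00ACb\u02FAc\u00AC")` yields the two codes `"U+00AC"` and `"U+02FA"`.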

License: CC-BY-SA with attribution
Not affiliated with StackOverflow