Question

My string contains what could be perceived as regex.

var x = "a&\b";

I am trying to remove every character that is not a letter or a digit.

var z = Regex.Replace(x, "[^a-zA-Z0-9 -]", "", RegexOptions.IgnoreCase);

Expected result: ab
Actual result: a

I understand that \b is a regular expression word.

I also understand that I can write var x = @"a&\b"; however, I want to escape the variable, not the assignment.

How can I escape my variable x?

I have tried Regex.Escape().


Solution

The initial regular expression would work - if the String contained what was expected.

This is because \ is the escape character in a String Literal (except in a Verbatim String Literal). The question mentions this, but its fundamental premise is wrong: the problem has nothing to do with "\b is a regular expression word", because the string in question is not used as the regular expression pattern.

Literal  ->  actual String data
"a&\b"       {'a', '&', BACKSPACE}
"a&\\b"      {'a', '&', '\', 'b'}
@"a&\b"      {'a', '&', '\', 'b'}

As such, it is the original string that does not contain a 'b'. It contains the BACKSPACE character (U+0008) instead, which is removed because the original regular expression replacement does not accept it; BACKSPACE is, after all, not an alphanumeric character. Even if it were not removed, it would not display as a 'b' character, because it is BACKSPACE.
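To see the difference between the literals directly, a small sketch along these lines may help. The class name LiteralDemo and the helper Strip are made up for illustration; the pattern is the one from the question. In C#, the \b escape in a regular string literal is the backspace character, U+0008.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralDemo
{
    // Same pattern as in the question: keep letters, digits, spaces and dashes.
    public static string Strip(string input) =>
        Regex.Replace(input, "[^a-zA-Z0-9 -]", "", RegexOptions.IgnoreCase);

    public static void Main()
    {
        string regular = "a&\b";    // 3 chars: 'a', '&', U+0008 (backspace)
        string verbatim = @"a&\b";  // 4 chars: 'a', '&', '\', 'b'

        Console.WriteLine(regular.Length);    // 3
        Console.WriteLine((int)regular[2]);   // 8
        Console.WriteLine(verbatim.Length);   // 4
        Console.WriteLine(Strip(regular));    // a
        Console.WriteLine(Strip(verbatim));   // ab
    }
}
```

The regular expression behaves the same in both calls; only the input data differs.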

While there is no generalized way in the .NET standard library [1] to reverse-escape from "\b" to "\\b"/@"\b", you may find a function that performs this transformation useful. With such a function you could write x = EscapeLikeALiteral("a&\b"), after which x == "a&\\b", and obtain the desired "ab" result, even with the original regular expression [2].

[1] The Regex.Escape/Regex.Unescape methods are only suitable for use with regular expression patterns and not for this generalized task of "reverse escaping strings to literals".

[2] Strictly speaking, the original regular expression is not an alphanumeric filter, as it also allows spaces and dashes.
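The function referred to above is not reproduced here, so the following is only a minimal sketch of what an EscapeLikeALiteral could look like. The class name LiteralEscaper and the choice of which characters to handle are assumptions; it covers only the simple single-character escapes and the backslash itself, not \u or \x sequences.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class LiteralEscaper
{
    // Hypothetical helper: rewrites control characters (and the backslash)
    // as the escape sequences that produce them in a regular C# string literal.
    public static string EscapeLikeALiteral(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            switch (c)
            {
                case '\0': sb.Append(@"\0"); break;
                case '\a': sb.Append(@"\a"); break;
                case '\b': sb.Append(@"\b"); break;
                case '\f': sb.Append(@"\f"); break;
                case '\n': sb.Append(@"\n"); break;
                case '\r': sb.Append(@"\r"); break;
                case '\t': sb.Append(@"\t"); break;
                case '\v': sb.Append(@"\v"); break;
                case '\\': sb.Append(@"\\"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string x = EscapeLikeALiteral("a&\b");
        Console.WriteLine(x == @"a&\b");  // True

        // The pattern from the question now yields the desired result.
        string z = Regex.Replace(x, "[^a-zA-Z0-9 -]", "", RegexOptions.IgnoreCase);
        Console.WriteLine(z);             // ab
    }
}
```

Note that such a function can only guess at the original source text: a backspace written as "\u0008" in the literal would also come back as \b.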

OTHER TIPS

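Instead of your code, how about using \W?

\w matches any word character. In .NET this is Unicode-aware by default; with RegexOptions.ECMAScript it is equivalent to [a-zA-Z_0-9].

\W matches any non-word character, i.e. [^a-zA-Z_0-9] under the same option.

So I am suggesting you use the following. The pattern must be a verbatim string (or written as "\\W"), because "\W" in a regular string literal is an unrecognized escape sequence and does not compile. Note also that, unlike your pattern, \W keeps underscores and removes spaces and dashes:

var z = Regex.Replace(x, @"\W", "", RegexOptions.IgnoreCase);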

Alternatively, you could spell out the character class:

var z = Regex.Replace(x, "[^a-zA-Z_0-9]", "", RegexOptions.IgnoreCase);

But I think the first one is nicer.
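A short sketch of how \W behaves in .NET, for comparison with the explicit character class. The class and method names are made up for illustration. By default \w is Unicode-aware, so accented letters count as word characters; RegexOptions.ECMAScript restricts it to [a-zA-Z_0-9].

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonWordDemo
{
    // Default .NET behaviour: \W is the complement of the Unicode-aware \w.
    public static string StripNonWord(string input) =>
        Regex.Replace(input, @"\W", "");

    // ECMAScript mode: \W is equivalent to [^a-zA-Z_0-9].
    public static string StripNonWordAscii(string input) =>
        Regex.Replace(input, @"\W", "", RegexOptions.ECMAScript);

    public static void Main()
    {
        Console.WriteLine(StripNonWord(@"a&\b"));     // ab
        Console.WriteLine(StripNonWord("a_b c-d"));   // a_bcd

        // "\u00e9" is the letter e with an acute accent, followed by the digit 1.
        Console.WriteLine(StripNonWord("\u00e91") == "\u00e91");   // True
        Console.WriteLine(StripNonWordAscii("\u00e91") == "1");    // True
    }
}
```

The second line of output shows the difference from the pattern in the question: the underscore survives, while the space and the dash are removed.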

Licensed under: CC-BY-SA with attribution