Regex Pattern - Allow alpha numeric, a bunch of special chars, but not a certain sequence of chars
06-07-2019
Question
I have the following regex:
(?!^[&#]*$)^([A-Za-z0-9-'.,&@:?!()$#/\\]*)$
So allow A-Z, a-z, 0-9, and these special chars '.,&@:?!()$#/\
I want to NOT match if the following set of chars is encountered anywhere in the string in this order:
&#
When I run this regex with just "&#" as input, it does not match my pattern and I get an error, which is what I want. When I run the regex with:
'.,&@:?!()$#/\ABC123
it does match my pattern, with no errors.
However, when I run it with:
'.,&#@:?!()$#/\ABC123
it does not error either. I'm doing something wrong with the check for the &# sequence.
Can someone tell me what I've done wrong? I'm not great with these things.
Solution
Borrowing a technique for matching quoted strings, remove & from your character class, add an alternative for & not followed by #, and allow the string to optionally end with &:
^((?:[A-Za-z0-9-'.,@:?!()$#/\\]+|&[^#])*&?)$
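To see why the pattern from the question lets the third input through, here is a minimal sketch in Python (an assumption, since the question names no language; the names QUESTION, ANSWER and samples are only illustrative). The lookahead in the question's pattern, (?!^[&#]*$), only rejects strings made up entirely of & and # characters, so an &# buried in a longer string is never checked.

```python
import re

# Pattern from the question: the lookahead only rejects strings that consist
# solely of '&' and '#', so "&#" inside a longer string still passes.
QUESTION = re.compile(r"(?!^[&#]*$)^([A-Za-z0-9-'.,&@:?!()$#/\\]*)$")

# Pattern from the answer above: '&' is removed from the class and handled
# by its own alternative.
ANSWER = re.compile(r"^((?:[A-Za-z0-9-'.,@:?!()$#/\\]+|&[^#])*&?)$")

# The three inputs from the question (raw strings keep the backslash literal).
samples = ["&#", r"'.,&@:?!()$#/\ABC123", r"'.,&#@:?!()$#/\ABC123"]

for s in samples:
    q = bool(QUESTION.match(s))
    a = bool(ANSWER.match(s))
    print(f"{s!r:30} question={q} answer={a}")
```

On these three inputs both patterns agree on the first two, and only the rewritten pattern rejects the third.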
OTHER TIPS
I would actually do it in two parts:
- Check your allowed character set. To do this I would look for characters that are not allowed, and return false if there's a match. That means I have a nice simple expression:
[^A-Za-z0-9\-'.,&@:?!()$#/\\]
- Check your banned substring. And since it is just a substring, I probably wouldn't even use a regex for that part.
You didn't mention your language, but if in C#:
using System.Text.RegularExpressions;

// Valid when the input has no "&#" and no character outside the allowed set.
bool IsValid(string input)
{
    return !( input.Contains("&#")
        || Regex.IsMatch(input, @"[^A-Za-z0-9\-'.,&@:?!()$#/\\]")
    );
}
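The same two-part idea can be sketched in Python, in case that is closer to what you are using. This assumes the allowed set is the one from the question; is_valid and DISALLOWED are hypothetical names, not part of any library.

```python
import re

# Matches any single character outside the allowed set from the question.
DISALLOWED = re.compile(r"[^A-Za-z0-9\-'.,&@:?!()$#/\\]")

def is_valid(text: str) -> bool:
    """Hypothetical helper: reject the banned substring, then any disallowed character."""
    if "&#" in text:  # plain substring test, no regex needed
        return False
    return DISALLOWED.search(text) is None

print(is_valid("ABC&123"))  # allowed characters only, no "&#"
print(is_valid("ABC&#123")) # contains the banned sequence
print(is_valid("ABC*123"))  # '*' is outside the allowed set
```

Keeping the two checks separate also makes it easy to report which rule failed.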
^((?!&#)[A-Za-z0-9-'.,&@:?!()$#/\\])*$
Note that the last \ is escaped (doubled). SO automatically turns \\ into \ if not in backticks.
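A quick way to try this pattern, sketched in Python (an assumption, as the question gives no language; TEMPERED is just an illustrative name). The lookahead is tested before every single character, so the match stops at the first position where &# begins.

```python
import re

# (?!&#) is checked at each position before one allowed character is consumed.
TEMPERED = re.compile(r"^((?!&#)[A-Za-z0-9-'.,&@:?!()$#/\\])*$")

for s in ["A&B#", "AB&", "A&&#B", r"'.,&#@:?!()$#/\ABC123"]:
    print(f"{s!r:30} {bool(TEMPERED.match(s))}")
```

The first two inputs contain & and # but never the two in sequence, so they match; the last two contain &# and do not.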
Assuming Perl-compatible regular expressions:
To not match a string containing '&#':
(?!.*&#)^([A-Za-z0-9-'.,&@:?!()$#/\\]*)$
You don't need the parentheses, though, because you are matching the entire string.
Just FYI, although Ben Blank's regex handles the examples in the question, it's more complicated than it needs to be. I would do it like this:
^(?:[A-Za-z0-9-'.,@:?!()$#/\\]+|&(?!#))+$
Because I used a negative lookahead instead of a negated character class, the regex doesn't need any extra help to match an ampersand at the end of the string.
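The difference between the two styles shows up on a few edge cases. Below is a rough comparison in Python (an assumption, since no language was given; NEGATED_CLASS and LOOKAHEAD are illustrative names for the earlier &[^#] pattern and the &(?!#) pattern). The negated class consumes whatever character follows the ampersand, while the lookahead only peeks at it.

```python
import re

# Alternative "&[^#]": consumes the ampersand AND the character after it.
NEGATED_CLASS = re.compile(r"^((?:[A-Za-z0-9-'.,@:?!()$#/\\]+|&[^#])*&?)$")

# Alternative "&(?!#)": consumes only the ampersand; the next character
# still has to be matched by the class (or be another ampersand).
LOOKAHEAD = re.compile(r"^(?:[A-Za-z0-9-'.,@:?!()$#/\\]+|&(?!#))+$")

for s in ["AB&", "A&B", "&&#", "&*", ""]:
    n = bool(NEGATED_CLASS.match(s))
    l = bool(LOOKAHEAD.match(s))
    print(f"{s!r:8} negated_class={n} lookahead={l}")
```

Both accept "AB&" and "A&B". With Python's re module, the negated-class version also accepts "&&#" and "&*", because the character after & is swallowed without being checked against the allowed set, and it accepts the empty string. The lookahead version rejects all three; swap the final + for * if empty input should pass.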
I'd recommend using two regular expressions in a conditional:
if (string has sequence "&#")
    return false
else
    return (string matches sequence "A-Za-z0-9-'.,&@:?!()$#/\")
I believe your second "main" regex of
^([A-Za-z0-9-'.,&@:?!()$#/\])$"
has several errors:
- It will test only one character in your set
- The \ character in regular expressions is a token indicating that the next character is part of some sort of "class" of characters (e.g. \n is the line feed character). The character sequence \] is actually causing your bracketed list not to be terminated.
You may be better off using
^[A-Za-z0-9-'.,&@:?!()$#/\\]+$
Note that the backslash character is represented by a doubled backslash.
The + character indicates that at least one character being tested has to match the regex; if it is fine to pass a zero-length string, replace the + with a *.
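Both points can be checked with a short Python sketch (Python is an assumption here; other engines report the unterminated class in their own way, and the names compiled, ONE_OR_MORE and ZERO_OR_MORE are illustrative).

```python
import re

# With a single backslash, "\]" is an escaped bracket, so the class never closes
# and Python's re module refuses to compile the pattern.
try:
    re.compile(r"^([A-Za-z0-9-'.,&@:?!()$#/\])$")
    compiled = True
except re.error:
    compiled = False
print("single-backslash pattern compiles:", compiled)

# With the backslash doubled, the only difference left is + versus *.
ONE_OR_MORE = re.compile(r"^[A-Za-z0-9-'.,&@:?!()$#/\\]+$")
ZERO_OR_MORE = re.compile(r"^[A-Za-z0-9-'.,&@:?!()$#/\\]*$")

print("+ accepts empty string:", bool(ONE_OR_MORE.match("")))
print("* accepts empty string:", bool(ZERO_OR_MORE.match("")))
```

Only the * version accepts the empty string; both accept any non-empty string made of allowed characters.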