Regex Pattern - Allow alphanumeric, a bunch of special chars, but not a certain sequence of chars

StackOverflow https://stackoverflow.com/questions/429719

06-07-2019

Question

I have the following regex:

(?!^[&#]*$)^([A-Za-z0-9-'.,&@:?!()$#/\\]*)$

So allow A-Z, a-z, 0-9, and these special chars: -'.,&@:?!()$#/\

I want to NOT match if the following set of chars is encountered anywhere in the string in this order:

&#

When I run this regex with just "&#" as input, it does not match my pattern and I get an error, which is what I want. When I run it with '.,&@:?!()$#/\ABC123, it matches my pattern and there is no error.

However, when I run it with:

'.,&#@:?!()$#/\ABC123

It does not error either. I'm doing something wrong with the check for the &# sequence.

Can someone tell me what I've done wrong? I'm not great with these things.
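The behaviour comes from the lookahead (?!^[&#]*$). It only rejects strings made up entirely of & and # characters, and it never searches for the two-character sequence &# inside a longer string. Below is a minimal sketch that reproduces the three test cases. It assumes Python's re module, since the question doesn't name a language, and the helper name original_accepts is hypothetical:

```python
import re

# The pattern from the question, unchanged. Its lookahead only rejects
# strings consisting solely of '&' and '#' characters (or the empty string).
ORIGINAL = re.compile(r"(?!^[&#]*$)^([A-Za-z0-9-'.,&@:?!()$#/\\]*)$")

def original_accepts(text: str) -> bool:
    """Return True if the question's pattern matches the whole input."""
    return ORIGINAL.match(text) is not None

for sample in ["&#", r"'.,&@:?!()$#/\ABC123", r"'.,&#@:?!()$#/\ABC123"]:
    print(repr(sample), original_accepts(sample))
# '&#' -> False; both longer inputs -> True, even the one containing &#
```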


Solution

Borrowing a technique for matching quoted strings: remove & from your character class, add an alternative for one or more & followed by an allowed character other than & or #, and allow the string to end with any number of &:

^((?:[A-Za-z0-9-'.,@:?!()$#/\\]+|&+[A-Za-z0-9-'.,@:?!()$/\\])*&*)$

OTHER TIPS

I would actually do it in two parts:

  1. Check your allowed character set. To do this I would look for characters that are not allowed, and return false if there's a match. That means I have a nice simple expression:
    [^A-Za-z0-9'.,&@:?!()$#/\\-]
  2. Check your banned substring. And since it is just a substring, I probably wouldn't even use a regex for that part.

You didn't mention your language, but in C#:

using System.Text.RegularExpressions;

// Invalid if the banned sequence appears anywhere,
// or if any character falls outside the allowed set.
bool IsValid(string input)
{
    return !(   input.Contains("&#")
             || Regex.IsMatch(input, @"[^A-Za-z0-9'.,&@:?!()$#/\\-]")
            );
}

^((?!&#)[A-Za-z0-9-'.,&@:?!()$#/\\])*$

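Note that the backslash at the end of the character class is escaped (doubled). Outside of backticks, Stack Overflow's formatting turns \\ into \, so watch for this when copying patterns from posts.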
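The (?!&#) lookahead runs before each character is consumed, so the repetition stops at any position where &# begins and the final $ then fails. Here is a minimal sketch of that pattern, assuming Python's re module (tempered_accepts is a hypothetical helper name):

```python
import re

# Before consuming each allowed character, (?!&#) checks that the text at
# the current position does not start with "&#".
TEMPERED = re.compile(r"^((?!&#)[A-Za-z0-9-'.,&@:?!()$#/\\])*$")

def tempered_accepts(text: str) -> bool:
    return TEMPERED.match(text) is not None

print(tempered_accepts(r"'.,&@:?!()$#/\ABC123"))   # True
print(tempered_accepts(r"'.,&#@:?!()$#/\ABC123"))  # False
print(tempered_accepts("A#&"))                      # True: '#' before '&' is fine
```

Because each iteration consumes exactly one character, the pattern also accepts the empty string. Change the final * to + if that is not wanted.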

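Assuming Perl-compatible regular expressions, to reject any string that contains '&#':

(?!.*&#)^([A-Za-z0-9-'.,&@:?!()$#/\\]*)$

The .* inside the lookahead lets it find &# at any position, not only at the first &. You don't need the parentheses, though, because you are matching the entire string.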
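Here is a minimal sketch of the start-of-string lookahead approach. It assumes Python's re module, which accepts the same syntax for this pattern, and lookahead_accepts is a hypothetical helper name. The lookahead is written as (?!.*&#), so it looks for &# at every position before the character-class check runs:

```python
import re

# (?!.*&#) is tested once, at the start of the string; .* lets it look for
# "&#" at any later position before the main character class is applied.
LOOKAHEAD = re.compile(r"(?!.*&#)^([A-Za-z0-9-'.,&@:?!()$#/\\]*)$")

def lookahead_accepts(text: str) -> bool:
    return LOOKAHEAD.match(text) is not None

for sample in ["&#", "a&b&#", "a&b#c", r"'.,&@:?!()$#/\ABC123"]:
    print(repr(sample), lookahead_accepts(sample))
# False, False, True, True
```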

Just FYI, Ben Blank's regex is more complicated than it needs to be. I would do it like this:

^(?:[A-Za-z0-9-'.,@:?!()$#/\\]+|&(?!#))+$

Because it uses a negative lookahead instead of a negated character class, the regex needs no extra term to accept an ampersand at the end of the string: (?!#) also succeeds at the end of the input, whereas a negated character class must consume a character.
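Here is a minimal sketch of the alternation-plus-lookahead pattern above. It assumes Python's re module, and alternation_accepts is a hypothetical helper name:

```python
import re

# Either a run of allowed characters (no '&'), or a single '&' that is not
# followed by '#'. (?!#) also succeeds at the end of the input, so a
# trailing '&' needs no special case.
ALTERNATION = re.compile(r"^(?:[A-Za-z0-9-'.,@:?!()$#/\\]+|&(?!#))+$")

def alternation_accepts(text: str) -> bool:
    return ALTERNATION.match(text) is not None

print(alternation_accepts("ABC&"))  # True
print(alternation_accepts("A&&B"))  # True
print(alternation_accepts("A&&#"))  # False
print(alternation_accepts(""))      # False: the outer + needs at least one token
```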

I'd recommend two separate checks in a conditional:

    if (string contains "&#")
      return false
    else
      return (string matches ^[A-Za-z0-9-'.,&@:?!()$#/\\]*$)
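The conditional translates directly into Python. Python is an assumption, since no language was given, and is_valid is a hypothetical name. The substring test needs no regex at all, and re.fullmatch anchors the character-set check to the whole string:

```python
import re

# Allowed characters only; fullmatch requires the whole string to match.
ALLOWED = re.compile(r"[A-Za-z0-9-'.,&@:?!()$#/\\]*")

def is_valid(text: str) -> bool:
    """Two separate checks: a plain substring test, then a character-set match."""
    if "&#" in text:  # banned sequence anywhere -> reject
        return False
    return ALLOWED.fullmatch(text) is not None

print(is_valid(r"'.,&@:?!()$#/\ABC123"))   # True
print(is_valid(r"'.,&#@:?!()$#/\ABC123"))  # False
print(is_valid("ABC 123"))                  # False: space is not allowed
```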

I believe your second "main" regex of

^([A-Za-z0-9-'.,&@:?!()$#/\])$

has several errors:

  • It matches only a single character from your set.
  • The \ character in regular expressions is an escape: it either gives the next character a special meaning (e.g. \n is the line feed character) or makes a special character literal. The sequence \] is therefore a literal ], so your bracketed list is never terminated.

You may be better off using

^[A-Za-z0-9-'.,&@:?!()$#/\\]+$

Note that the backslash character is represented by a double backslash.

The + quantifier requires at least one character from the set; if an empty string should be accepted, replace the + with a *.
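The short check below covers the two points above. The doubled backslash in the class matches a single literal backslash, and the choice of + versus * decides whether an empty string is accepted. It assumes Python's re module:

```python
import re

# '+' requires at least one allowed character; '*' also accepts "".
ONE_OR_MORE = re.compile(r"^[A-Za-z0-9-'.,&@:?!()$#/\\]+$")
ZERO_OR_MORE = re.compile(r"^[A-Za-z0-9-'.,&@:?!()$#/\\]*$")

print(ONE_OR_MORE.match("") is None)            # True: empty input rejected
print(ZERO_OR_MORE.match("") is not None)       # True: empty input accepted
print(ONE_OR_MORE.match(r"AB\12") is not None)  # True: \\ in the class matches one backslash
```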

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow