Question

While searching for regular expressions used for email address validation, I came across this page: http://www.regular-expressions.info/email.html. I couldn't understand part of it.

It says that \b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}\b will match john@server.department.company.com but not john@aol...com.

Can you explain in detail how (?:[A-Z0-9-]+\.) works, and why it matches john@server.department.company.com but not john@aol...com?


Solution

That's because the group allows exactly one . after each run of characters, so consecutive dots are not matched. For .. or ... to match, the pattern would have to use \.+ (the + means one or more, and is the same as {1,}).

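The regex says (?:[A-Z0-9-]+\.)+ so it is one or more letters, digits, or hyphens (no underscore), followed by a single dot, and this whole group can repeat one or more times. So c.c.c. will match, but c..c.c. will not, because the second dot is not preceded by at least one allowed character. Note that [A-Z] only matches lowercase text like john@aol.com when case-insensitive matching is turned on.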
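If you want to check this yourself, here is a minimal sketch in Python (my choice of language, the question doesn't name one). It assumes case-insensitive matching via re.IGNORECASE, and the names EMAIL and LABELS are just made up for the illustration.

```python
import re

# The pattern from the question, with case-insensitive matching assumed.
EMAIL = re.compile(
    r"\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}\b",
    re.IGNORECASE,
)

# Just the repeated group on its own: one or more "label." units.
LABELS = re.compile(r"(?:[A-Z0-9-]+\.)+", re.IGNORECASE)

good = EMAIL.search("john@server.department.company.com")
bad = EMAIL.search("john@aol...com")

# Each repetition of the group consumes "server.", "department.", "company.",
# and the trailing [A-Z]{2,4} consumes "com".
print(good.group(0) if good else None)

# After "aol." the next character is another ".", which fits neither
# another "label." unit nor the final [A-Z]{2,4}, so there is no match.
print(bad)

# The group in isolation behaves the same way.
print(LABELS.fullmatch("c.c.c.") is not None)
print(LABELS.fullmatch("c..c.c.") is not None)
```

The key point is that every dot has to be "paid for" by at least one letter, digit, or hyphen in front of it, since the + inside the group requires at least one such character before each \. in the pattern.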

The (?: ) is a non-capturing group, which is usually a little faster. You can use plain ( ) and it works as well, but it is slower and the matched text is stored in a capturing group.
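To see the difference between the two kinds of group, here is a small sketch, again in Python and again assuming re.IGNORECASE. It only looks at the host part of the address, and the variable names are made up for the example. It shows what gets stored, not how the speed compares.

```python
import re

# Same host pattern twice: once with a capturing group, once without.
capturing = re.compile(r"([A-Z0-9-]+\.)+[A-Z]{2,4}", re.IGNORECASE)
non_capturing = re.compile(r"(?:[A-Z0-9-]+\.)+[A-Z]{2,4}", re.IGNORECASE)

host = "server.department.company.com"

m_cap = capturing.fullmatch(host)
m_non = non_capturing.fullmatch(host)

# Both variants match exactly the same text.
print(m_cap.group(0) == m_non.group(0))

# A repeated capturing group keeps only its last repetition.
print(m_cap.groups())

# The non-capturing version stores no groups at all.
print(m_non.groups())
```

So if you don't need the text of the group afterwards, (?: ) gives the same match without the extra bookkeeping.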

Licensed under: CC-BY-SA with attribution