Question

I am trying to validate email addresses (UTF-8) using the following regular expression:

Regex.IsMatch(emailAddress, @"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$", RegexOptions.CultureInvariant);

It returns false for "äpfel@domain.com".

Any suggestions on how to improve it?


Solution

  1. UTF-8 has nothing to do with this; you're validating a string, not a particular encoding of it.

  2. Your Regex actually returns true for "äpfel@domain.com" (with or without the CultureInvariant option). Try Console.Write(Regex.IsMatch("äpfel@domain.com", @"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$", RegexOptions.CultureInvariant)); on its own, and you get true.

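  3. You will fail on IDNs whose top-level domain is not 2 to 4 ASCII letters, because the final label must match [a-zA-Z]{2,4}. (An address like info@ουτοπία.δπθ.gr still passes, since \w matches the Greek letters and gr is ASCII.) If you care about email addresses that are not restricted to ASCII, you may want to accept those too. And if you want to exclude prohibited confusables, things get really complicated.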
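For the IDN point in item 3, one option is to convert the domain part to its ASCII (Punycode) form before running any ASCII-oriented check. The following is only a sketch under that assumption: it uses System.Globalization.IdnMapping from the framework, while the class and method names (IdnSketch, TryToAsciiDomain) are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Globalization;

static class IdnSketch
{
    // Hypothetical helper: rewrites the domain part of an address in its
    // ASCII (Punycode) form. Returns false if there is no usable '@' or
    // the domain is not a well-formed name.
    public static bool TryToAsciiDomain(string address, out string result)
    {
        result = "";
        int at = address.LastIndexOf('@');
        if (at <= 0 || at == address.Length - 1)
            return false;

        try
        {
            // GetAscii throws ArgumentException for invalid domain names.
            string domain = new IdnMapping().GetAscii(address.Substring(at + 1));
            result = address.Substring(0, at + 1) + domain;
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }
}
```

The local part is left untouched here; only the domain is normalised, so any later check still has to decide what it accepts before the '@'.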

Others have described the problems with using regular expressions to validate emails, but they boil down to:

  1. The actual email syntax is more complicated than people think (even before we deal with the non-ASCII extensions). For example, did you know that Abc\@def@example.com is a valid email address? It is; in fact, it's given as an example of a valid address in RFC 3696.

  2. If you go to the effort of building a perfect validator (it is possible), it'll be a waste of effort. Chances are your email software won't handle every valid address (e.g. Abc\@def@example.com above won't work with a lot of software), and then lots of valid email addresses won't actually work in practice.
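Given the two points above, a deliberately loose sanity check is one possible alternative to a strict pattern. This is a sketch, not a validator: the names LooseEmailCheck and LooksLikeEmail are hypothetical, and the only rule assumed is "something, then an '@', then something".

```csharp
static class LooseEmailCheck
{
    // Accepts any non-blank string whose last '@' has at least one
    // character on each side. It makes no claim about RFC validity.
    public static bool LooksLikeEmail(string address)
    {
        if (string.IsNullOrWhiteSpace(address))
            return false;

        int at = address.LastIndexOf('@');
        return at > 0 && at < address.Length - 1;
    }
}
```

A check like this lets through both äpfel@domain.com and Abc\@def@example.com, and leaves the question of whether the address really works to whatever actually sends the mail.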

But anyway, I get true running your code, so the bug is elsewhere.

Other answers

The simple answer is that you don't want to do this: regular expressions are a horrible way of validating email addresses.

The answer to your specific question is that, if you are willing to block valid addresses and permit invalid ones, you want to use [\p{L}\p{M}\p{N}] rather than \w to match Unicode word characters in the username part of the address.
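As a rough illustration of that character class, here is a minimal sketch. The class name UnicodeLocalPart and the overall pattern are assumptions made for the example: the local part allows Unicode letters, marks and digits plus '.', '_' and '-', and the domain part is kept deliberately loose rather than copied from the question.

```csharp
using System.Text.RegularExpressions;

static class UnicodeLocalPart
{
    // Assumed pattern: [\p{L}\p{M}\p{N}._-]+ for the local part,
    // then '@', then anything that is neither '@' nor whitespace.
    private static readonly Regex Pattern = new Regex(
        @"^[\p{L}\p{M}\p{N}._-]+@[^@\s]+$",
        RegexOptions.CultureInvariant);

    public static bool IsMatch(string address)
    {
        return Pattern.IsMatch(address);
    }
}
```

With this pattern äpfel@domain.com matches, including when the ä is written as a plus a combining diaeresis (covered by \p{M}), while the valid address Abc\@def@example.com is rejected, which is the "block valid addresses" trade-off mentioned above.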

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow