The most recent RFC for email on the internet is RFC 5322, and it specifically addresses addresses (section 3.4.1):
addr-spec = local-part "@" domain
local-part = dot-atom / quoted-string / obs-local-part
The dot-atom is a highly restricted set of characters defined in the spec. The quoted-string, however, is where you can run into trouble. It is not often used, but you could well be handed a local-part in quotation marks that itself contains an @ character.
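As a rough illustration of how restricted the dot-atom form is, here is a minimal Python sketch. The character class follows the atext definition in RFC 5322; the function name is_dot_atom is made up for this example, and the sketch deliberately ignores the optional comments and folding whitespace the full grammar allows around a dot-atom.

```python
import re

# atext from RFC 5322: letters, digits, and a fixed set of punctuation.
_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# dot-atom-text: one or more atext runs separated by single dots.
_DOT_ATOM = re.compile(_ATEXT + r"+(?:\." + _ATEXT + r"+)*")


def is_dot_atom(local_part: str) -> bool:
    """Return True if local_part is a plain dot-atom (hypothetical helper)."""
    return _DOT_ATOM.fullmatch(local_part) is not None
```

A quoted local-part such as "a@b" (with the quotation marks) fails this check, so it would need to go down a separate quoted-string path.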
However, if you split the string at the last @, you should safely have located the local-part and the domain. The domain is well defined in the specification in terms of how you can verify it.
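Splitting at the last @ can be sketched in a few lines of Python. The name split_address is made up for this example, and the sketch assumes you have already stripped any display name or angle brackets, so the input is just the addr-spec.

```python
from typing import Tuple


def split_address(addr: str) -> Tuple[str, str]:
    """Split an addr-spec at its LAST '@' into (local_part, domain).

    Hypothetical helper: raises ValueError if there is no '@' or if
    either side of it is empty.
    """
    local_part, sep, domain = addr.rpartition("@")
    if not sep or not local_part or not domain:
        raise ValueError("not an addr-spec: %r" % addr)
    return local_part, domain
```

For example, split_address('"a@b"@example.com') returns ('"a@b"', 'example.com'), so the @ inside the quoted-string stays with the local-part.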
The problem comes with Punycode, whereby almost any Unicode character can be mapped into a valid DNS name. If the system you are front-ending can understand and interpret Punycode, then you have to handle almost any domain that contains valid Unicode characters. If you know you're not going to work with Punycode, then you can restrict each domain label to a much smaller set, generally letters, digits, and the hyphen character.
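The two cases can be sketched in Python as follows. The helper names are made up for this example. The length limits (63 characters per label, 253 overall) are the usual DNS limits rather than anything stated above, and Python's built-in "idna" codec implements the older IDNA 2003 rules, so treat the conversion as an approximation if you need current IDNA behaviour.

```python
import re

# One DNS label: letters, digits, hyphen; no leading or trailing hyphen;
# at most 63 characters (assumed DNS limit).
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_domain(domain: str) -> bool:
    """Restricted check: every dot-separated label is letters/digits/hyphen."""
    if not domain or len(domain) > 253:
        return False
    return all(_LDH_LABEL.fullmatch(label) for label in domain.split("."))


def to_ascii_domain(domain: str) -> str:
    """Map a Unicode domain to its ASCII (xn--) form via the stdlib codec.

    Raises UnicodeError if the codec rejects the input.
    """
    return domain.encode("idna").decode("ascii")
```

If you do accept Unicode domains, one option is to convert with to_ascii_domain first and then apply is_ldh_domain to the result, so that both paths end in the same restricted check.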
To quote the late, great Jon Postel: "TCP implementations should follow a general principle of robustness: be conservative in what you do, be liberal in what you accept from others."
Side note on the local part: keep in mind, of course, that there are probably lots of systems on the internet that don't require strict adherence to the specs and might therefore accept things outside of the spec, thanks to the long-standing liberal-acceptance/conservative-transmission philosophy.