Question

How can I check for duplicate email addresses in PHP, with the possibility of Gmail's automated labeler and punctuation in mind?

For example, I want these addresses to be detected as duplicates:

         username@gmail.com
        user.name@gmail.com
   username+label@gmail.com
  user.name+label@gmail.com

Despite what Daniel A. White claims: in Gmail, dots can be placed anywhere before the '@' (and before the label), as many as you like. user.name@gmail.com and username@gmail.com are in fact the same user.


Solution

$email_parts    = explode('@', $email);

// check if there is a "+" and return the string before
$before_plus    = strstr($email_parts[0], '+', TRUE);
$before_at      = $before_plus ? $before_plus : $email_parts[0];

// remove "."
$before_at      = str_replace('.', '', $before_at);

$email_clean    = $before_at.'@'.$email_parts[1];
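To apply the snippet above to the duplicate check from the question, it can be wrapped in a function and run on each address before comparing. This is only a minimal sketch: the name clean_email() is an assumption for illustration and is not part of the original answer. Note that, like the snippet, it does not lowercase the address and it applies the same rule to every domain, not just Gmail.

```php
<?php
// Hypothetical wrapper around the answer's steps; the name clean_email() is illustrative.
function clean_email($email) {
    $email_parts = explode('@', $email);

    // Keep only the part before the first "+", if there is one
    $before_plus = strstr($email_parts[0], '+', true);
    $before_at   = $before_plus ? $before_plus : $email_parts[0];

    // Remove dots from the local part
    $before_at = str_replace('.', '', $before_at);

    return $before_at . '@' . $email_parts[1];
}

$addresses = [
    'username@gmail.com',
    'user.name@gmail.com',
    'username+label@gmail.com',
    'user.name+label@gmail.com',
];

// All four addresses from the question reduce to the same string
$unique = array_unique(array_map('clean_email', $addresses));
var_dump(count($unique) === 1); // bool(true)
```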

OTHER TIPS

Strip the address to the basic form before comparing. Make a function normalise() that will strip the label, then remove all dots. Then you can compare the addresses via:

normalise($address1) === normalise($address2)

If you have to do this very often, save the addresses in the normalised form too, so you don't have to normalise them again for every comparison.
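As a rough illustration of keeping the normalised form around, the sketch below uses an in-memory array keyed by the normalised address. The body of normalise(), the $seen array, and the register() helper are all assumptions made for this example. In a real application the normalised value would more likely be stored in a database column with a unique index.

```php
<?php
// Illustrative normalise(): strip a "+label", then remove dots from the local part.
// Assumption: the whole address is lowercased and the rule is applied to every domain.
function normalise($address) {
    $address = strtolower(trim($address));
    $at = strrpos($address, '@');
    if ($at === false) {
        return $address; // nothing to split on; leave unchanged
    }
    $local  = substr($address, 0, $at);
    $domain = substr($address, $at + 1);

    $plus = strpos($local, '+');
    if ($plus !== false) {
        $local = substr($local, 0, $plus);
    }
    return str_replace('.', '', $local) . '@' . $domain;
}

// Hypothetical registry: normalised address => address as originally entered
$seen = [];

// Returns true if the address was new, false if it duplicates an earlier one
function register($address, array &$seen) {
    $key = normalise($address);
    if (isset($seen[$key])) {
        return false;
    }
    $seen[$key] = $address;
    return true;
}

var_dump(register('user.name@gmail.com', $seen));      // bool(true)
var_dump(register('username+label@gmail.com', $seen)); // bool(false)
```

Storing the original address as the value keeps what the user actually typed available for sending mail, while the key is used only for comparison.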

This answer is an improvement on @powtac's answer. I needed this function to defeat multiple signups from the same person using Gmail.

if ( ! function_exists('normalize_email'))
{
    /**
     * to normalize emails to a base format, especially for gmail
     * @param $email
     * @return string
     */
    function normalize_email($email) {
        // ensure email is lowercase because of pending in_array check, and more...
        $email = strtolower($email);
        $parts    = explode('@', $email);

        // normalize gmail addresses
        if (in_array($parts[1], ['gmail.com', 'googlemail.com'])) {
            // check if there is a "+" and return the string before then remove "."
            $before_plus    = strstr($parts[0], '+', TRUE);
            $before_at      = str_replace('.', '', $before_plus ? $before_plus : $parts[0]);

            // ensure only @gmail.com addresses are used
            $email    = $before_at.'@gmail.com';
        }

        return $email;
    }
}

Perhaps this would be better titled "How to normalize Gmail addresses in PHP, considering (user.name+label@gmail.com)"

You have two technical solutions above. I'll go a different route and ask why you're trying to do this. It doesn't feel right to me. Are you trying to prevent someone registering multiple times at your site using different e-mail addresses? This will only prevent a specialized case of that.

I have my own domain, example.com, and any e-mail that goes to any address at that domain goes to my single mailbox. Do you, now, want to put a check to normalize anything at my example.com to a single address on your end?

By the official e-mail address format, those addresses you are trying to match as the same are different.

Email address parsing is really, really hard to do correctly without breaking things and annoying users.

First, I would question whether you really need to do this. Why do you have multiple email addresses with different sub-addresses?

If you are sure you need to do this, first read RFC 822, then modify this email address parsing regex to extract all parts of the email, and recombine them excluding the label.

Slightly more practically, the Email address Wikipedia page has a section on this part of the address format: Sub-addressing.

The code powtac posted looks like it should work - as long as you're not using it in an automated manner to delete accounts or anything, it should be fine.

Note that the "automated labeler" isn't a Gmail-specific feature; Gmail simply popularised it. Other mail servers support this feature, some using + as the separator, others using -. If you are going to special-case dots in Gmail addresses, remember to consider the googlemail.com domain as well.
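One way to account for different separators and domain aliases is to keep the rules per domain instead of hard-coding Gmail behaviour. The sketch below is only an illustration: the $rules table, its keys, the function name normalise_with_rules(), and the example.org entry are all assumptions, and the Gmail entries reflect only what is stated in this thread.

```php
<?php
// Hypothetical rule table: which separator starts the label, whether dots in the
// local part are ignored, and a canonical domain to map aliases onto.
$rules = [
    'gmail.com'      => ['separator' => '+', 'ignore_dots' => true,  'canonical' => 'gmail.com'],
    'googlemail.com' => ['separator' => '+', 'ignore_dots' => true,  'canonical' => 'gmail.com'],
    // Made-up entry standing in for a server that uses "-" as the separator
    'example.org'    => ['separator' => '-', 'ignore_dots' => false, 'canonical' => 'example.org'],
];

function normalise_with_rules($address, array $rules) {
    $address = strtolower($address);
    $at = strrpos($address, '@');
    if ($at === false) {
        return $address;
    }
    $local  = substr($address, 0, $at);
    $domain = substr($address, $at + 1);

    // Domains without a rule are returned unchanged (apart from lowercasing)
    if (!isset($rules[$domain])) {
        return $address;
    }
    $rule = $rules[$domain];

    // Cut the local part at the first separator, if present
    $pos = strpos($local, $rule['separator']);
    if ($pos !== false) {
        $local = substr($local, 0, $pos);
    }
    if ($rule['ignore_dots']) {
        $local = str_replace('.', '', $local);
    }
    return $local . '@' . $rule['canonical'];
}

echo normalise_with_rules('User.Name+label@googlemail.com', $rules), "\n"; // username@gmail.com
echo normalise_with_rules('user.name-label@example.org', $rules), "\n";    // user.name@example.org
```

Leaving unknown domains untouched avoids merging addresses that a given mail server may treat as distinct.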

I have extended Zend Validator like this.

<?php
class My_Validate_EmailAddress extends Zend_Validate_EmailAddress
{
    public function isValid($value)
    {
        $valid = parent::isValid($value);
        if ($valid
                && in_array($this->_hostname, array('gmail.com', 'googlemail.com'))
                && substr_count($this->_localPart, '.') > 1) {
            $this->_error(parent::INVALID_HOSTNAME);
            $valid = false;
        }
        return $valid;
    }
}

Gmail addresses with more than one "dot" symbol in the local part are considered invalid. In some cases this is not a logical solution, but it works for me.

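function normalize($input) {
     // Split at the last "@" so that only the local part is modified
     $at = strrpos($input, '@');
     if ($at === false) {
          return $input;
     }
     // Drop a "+label" suffix, then remove dots from the local part only
     $local = preg_replace('/\+.*$/', '', substr($input, 0, $at));
     return str_replace('.', '', $local) . substr($input, $at);
}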
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow