Question

First of all, I should stress that I'm trying to learn here, not be malicious or spam anyone.

I'm trying to learn about regex by extracting email addresses from Google search results with the following code. However, sometimes it finds only some of the addresses, and other times none at all.

If I try it with a Wikipedia URL then I don't have a problem.

$url = "https://www.google.com/search?q=hello@hotmail.com";
// $url = "http://en.wikipedia.org/wiki/Email_address"; this works fine
$string = file_get_contents($url);

$matches = array();
$pattern = '/[a-z\d._%+-]+@[a-z\d.-]+\.[a-z]{2,4}\b/i';
preg_match_all($pattern,$string,$matches);

foreach ($matches as $row)
{
    foreach ($row as $row2)
    {
        echo $row2."<br>";
    }
}

Solution

You're missing uppercase:

'/[A-Za-z\d._%+-]+@[A-Za-z\d.-]+\.[A-Za-z]{2,4}\b/i'

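I put it in everywhere in case you want to match HELLO@GMAIL.COM; you can always lowercase the result afterwards. (Note that the /i modifier already makes the pattern case-insensitive, so the explicit A-Z ranges are redundant.)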

EDIT: I think I was trying to solve this for a different email address, which wasn't being matched.

EDIT 2: Search the HTML source. The addresses that aren't found have emphasis tags inside them, like example<em>@example.com</em>, so the pattern can't match them.
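
A minimal sketch of one way around this, assuming it is acceptable to throw the markup away before matching: pass the page through strip_tags() so each address becomes one contiguous string again, then apply the original pattern. The $html value below is a made-up sample standing in for the fetched page, not real Google output.

```php
<?php
// Hypothetical sample HTML standing in for the fetched results page.
$html = '<p>Contact: hello<em>@hotmail.com</em> or <b>Jane.Doe</b>@Example.org</p>';

// Remove the markup so that tags no longer split an address in two.
$text = strip_tags($html);

// Same pattern as in the question; /i makes it case-insensitive.
$pattern = '/[a-z\d._%+-]+@[a-z\d.-]+\.[a-z]{2,4}\b/i';
preg_match_all($pattern, $text, $matches);

// $matches[0] holds the full-pattern matches.
foreach ($matches[0] as $email) {
    echo $email . "\n";
}
```

Stripping tags is a blunt tool: it also joins text that was only separated by markup, so it may produce false matches on some pages. For anything beyond experimenting, an HTML parser is probably the safer choice.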

Licensed under: CC-BY-SA with attribution