Question

I'm trying to get the email addresses from a file using egrep -o -e and having trouble with addresses at the end of a line.

Here is my regex:

egrep -o -e "[._a-zA-Z0-9]+@[._a-zA-Z0-9]+.[._a-zA-Z0-9]+" ~/myfile.txt

I realize this will not catch every variation of an email address, but if the address is at the end of a line this is what I get:

user@_12345@myemail.com\ul

So I figured I'd try a negative lookahead, but I have no idea how to properly use it. I've read a few things online but I'm confused by how it works.

This is what I've tried:

egrep -o -e "(?!\\[._a-zA-Z0-9]+@[._a-zA-Z0-9]+.[._a-zA-Z0-9]+)" ~/myfile.txt

Bash fails with event not found: \\[._a

Any suggestions?


Solution 2

What does the dot stand for?

"[._a-zA-Z0-9]+@[._a-zA-Z0-9]+.[._a-zA-Z0-9]+"
                              ^
                             here

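Unescaped, it matches any single character, not just a literal dot, so here it matches the at-sign. If you remove it (the character classes on either side already allow a literal dot), your original regex with no lookahead will work.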
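
A minimal sketch of the difference, assuming a made-up sample line (bob@example.com followed by \ul, mimicking the trailing markup in the question) and grep -E standing in for egrep. The variable names are only for illustration:

```shell
# Hypothetical sample line; the trailing \ul mimics the markup in the question.
sample='bob@example.com\ul'

# Original pattern: the bare "." between the last two classes matches ANY character.
loose='[._a-zA-Z0-9]+@[._a-zA-Z0-9]+.[._a-zA-Z0-9]+'

# Bare dot removed: "." is still allowed inside the bracket expressions.
fixed='[._a-zA-Z0-9]+@[._a-zA-Z0-9]+'

loose_out=$(printf '%s\n' "$sample" | grep -E -o -e "$loose")
fixed_out=$(printf '%s\n' "$sample" | grep -E -o -e "$fixed")

printf 'loose: %s\n' "$loose_out"
printf 'fixed: %s\n' "$fixed_out"
```

With this sample the loose pattern should print bob@example.com\ul, because the bare dot consumes the backslash, while the fixed one stops at bob@example.com.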

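Moreover, ! is a special character in bash (history expansion), even inside double quotes. You have to backslash it to use it literally; note that inside double quotes the backslash stops the expansion but stays in the string, so single quotes are usually the cleaner choice.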

OTHER TIPS

The ! is being interpreted as a history expansion by bash, which happens even inside double quotes. You should use single quotes rather than double quotes to prevent this.
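
As a small illustration of the quoting point, here is a sketch showing that a single-quoted string keeps the ! untouched. The pattern is a made-up placeholder, not a working email regex:

```shell
# Single quotes: bash performs no history expansion, so "!" stays literal.
# (?!foo)bar is a placeholder pattern for illustration only.
pattern='(?!foo)bar'

# Print it back to confirm nothing was expanded or stripped.
printf '%s\n' "$pattern"
```

The same single-quoted form can be passed straight to a regex tool as its pattern argument.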

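However, you should note that negative lookahead is likely not supported by your grep either: egrep uses POSIX extended regular expressions, which have no lookahead, so (?!...) will not work there even once the quoting is fixed. In this case you need a more powerful regex tool like perl or ack (or grep -P, if your grep was built with PCRE support).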
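
For what a negative lookahead looks like in perl, here is a rough sketch. It assumes perl is installed, and the input, the blocked.example domain, and the emails_not_blocked helper are all invented for the example. It is not needed for the end-of-line problem in the question:

```shell
# Hypothetical input: one address at a domain we want to skip, one to keep.
input='alice@blocked.example
bob@example.com'

# (?!blocked\.example) is a negative lookahead: it succeeds only when the text
# right after "@" does NOT start with "blocked.example", and consumes nothing.
emails_not_blocked() {
  perl -ne 'print "$&\n" while /[._a-zA-Z0-9]+@(?!blocked\.example)[._a-zA-Z0-9]+/g'
}

kept=$(printf '%s\n' "$input" | emails_not_blocked)
printf '%s\n' "$kept"
```

Note the single quotes around the perl program, which also keep bash from touching the ! in the lookahead.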

Licensed under: CC-BY-SA with attribution