Question
This is similar to some questions that are already out there, but I couldn't find one that answered my question specifically, so thank you for any assistance or insight.
I have a text file, opened in TextWrangler (a popular Mac text editor), containing names and email addresses. Sample records:
Timmy Turner <tturner@example.com>
"jamminjeff@example.com" <jamminjeff@example.com>
Susan Alder <suesblues@example.com>,
sallyartist@example.com
So some email addresses have names preceding them, most are enclosed in <> brackets, some stand by themselves (already correct), and some are followed by a comma. I want a global operation that automates getting to this end result, either via Grep or something similar:
tturner@example.com
jamminjeff@example.com
suesblues@example.com
sallyartist@example.com
Thanks for any insight!
No accepted solution
Other suggestions
sed might work better. You can use a regex to remove the patterns that you don't want:
sed -e "s|.*<||" -e "s|>.*||" your_file.txt > new_file.txt
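For anyone who prefers a script to sed, the same two substitutions can be sketched in Python. This is only an illustration of the command above, not a different method; the function name strip_brackets and the samples list are made up for the example.

```python
import re

def strip_brackets(line):
    # Mirrors the first sed expression: drop everything up to and
    # including the last "<" on the line.
    line = re.sub(r".*<", "", line)
    # Mirrors the second sed expression: drop the first ">" and
    # everything after it.
    return re.sub(r">.*", "", line)

# Sample records copied from the question.
samples = [
    "Timmy Turner <tturner@example.com>",
    '"jamminjeff@example.com" <jamminjeff@example.com>',
    "Susan Alder <suesblues@example.com>,",
    "sallyartist@example.com",
]

cleaned = [strip_brackets(s) for s in samples]
print("\n".join(cleaned))
```

Note that a bare address followed by a comma (not present in the sample records) contains neither bracket, so both substitutions would leave the comma in place.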
TL;DR
Search:
^.*<?\b([a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@((?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])\b>?.*$
Replace:
\1@\2
Explanation:
According to this article, the RFC 5322 specification gives an official definition for a valid email address.
Their string, simplified for use in TextWrangler, would be:
Search:
([a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@((?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
Replace:
\1@\2
By itself, it would match every address in these lines:
Timmy Turner <tturner@example.com>
"jamminjeff@example.com" <jamminjeff@example.com>
Susan Alder <suesblues@example.com>,
sallyartist@example.com
While that DOES match your example email strings, it doesn't give you the exact result you want, since it's also including "jamminjeff@example.com", which should be stripped out.
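A small Python sketch can make the problem concrete. It assumes a shortened version of the search string (the quoted-string and bracketed-IP alternatives are left out for readability) and case-insensitive matching; the names EMAIL, found and replaced are only for illustration.

```python
import re

# Shortened stand-in for the long search string (an assumption for this
# sketch): it keeps only the plain local-part and domain-name branches.
# The pattern lists lowercase letters only, so it is compiled
# case-insensitively here.
EMAIL = re.compile(
    r"([a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*)"
    r"@((?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)",
    re.IGNORECASE,
)

line = '"jamminjeff@example.com" <jamminjeff@example.com>'

# The unanchored pattern finds the address twice on this line.
found = [m.group(0) for m in EMAIL.finditer(line)]

# Replacing each match with \1@\2 puts back exactly what was matched,
# so the quotes, the space and the brackets all survive.
replaced = EMAIL.sub(r"\1@\2", line)

print(found)
print(replaced)
```

Because only the addresses themselves are matched, everything around them stays on the line, which is why the pattern needs something before and after it to consume the rest.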
You can use some filtering before and after it, if you know a few things:
If yes to 1 and 2, and no to 3, prepend that string with ^.*<?\b, and append it with \b>?.*$.
This starts at the beginning of the line, searches for 0 or more characters, an optional opening bracket, and then a word boundary that starts the actual email address.
Then afterward, look for the word boundary on the last character of the email address, an optional closing bracket, and zero or more characters till the end of the line.
Replacing that with \1@\2 will clean up the entire line to only contain the email address.
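The prepend/append idea can be tried outside TextWrangler with a rough Python sketch. It assumes a shortened email pattern (plain local part and domain name only) and case-insensitive matching; EMAIL, LINE_GREEDY, LINE_LAZY and clean are hypothetical names, and the lazy variant is an addition for comparison rather than part of the answer.

```python
import re

# Shortened stand-in for the full search string (an assumption).
EMAIL = (
    r"([a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*)"
    r"@((?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"
)

# The answer's wrapper: prefix ^.*<?\b and suffix \b>?.*$
LINE_GREEDY = re.compile(r"^.*<?\b" + EMAIL + r"\b>?.*$", re.IGNORECASE)

# Variant with a lazy prefix (.*?), shown only for comparison.
LINE_LAZY = re.compile(r"^.*?<?\b" + EMAIL + r"\b>?.*$", re.IGNORECASE)

def clean(line, pattern=LINE_GREEDY):
    # The pattern consumes the whole line, so the replacement leaves
    # only the captured local part and domain.
    return pattern.sub(r"\1@\2", line)

# Sample records copied from the question.
samples = [
    "Timmy Turner <tturner@example.com>",
    '"jamminjeff@example.com" <jamminjeff@example.com>',
    "Susan Alder <suesblues@example.com>,",
    "sallyartist@example.com",
]

cleaned = [clean(s) for s in samples]
print("\n".join(cleaned))
```

One thing worth checking on real data: the leading .* is greedy, so in this sketch a dotted local part such as the made-up john.doe@example.com comes out as doe@example.com, because the word boundary after the dot is the latest place the match can start. Passing LINE_LAZY to clean keeps the full address in that case while giving the same results on the four sample records.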