Question

I have a webhook posting to a form on my web application, and I need to parse the email addresses out of the header.

Here is the source text:

Thread-Topic: test subject
Thread-Index: AcwE4mK6Jj19Hgi0SV6yYKvj2/HJbw==
From: "Lastname, Firstname" <firstname_lastname@domain.com>
To: <testto@domain.com>, testto1@domain.com, testto2@domain.com
Cc: <testcc@domain.com>, test3@domain.com
X-OriginalArrivalTime: 27 Apr 2011 13:52:46.0235 (UTC) FILETIME=[635226B0:01CC04E2]

I am trying to pull out the following:

<testto@domain.com>, testto1@domain.com, testto2@domain.com

I have been struggling with regex all day without any luck.

Solution

Contrary to some of the posts here, I have to agree with mmutz: you cannot parse email addresses with a regex. See this section of the RFC:

http://tools.ietf.org/html/rfc2822#section-3.4.1

3.4.1. Addr-spec specification

An addr-spec is a specific Internet identifier that contains a locally interpreted string followed by the at-sign character ("@", ASCII value 64) followed by an Internet domain.

The idea of "locally interpreted" means that only the receiving server is expected to be able to parse it.

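If I were going to try to solve this, I would find the contents of the "To" line, break them apart at the commas, and attempt to parse each segment with System.Net.Mail.MailAddress. When a segment fails to parse (for example, because the comma sat inside a quoted string), extend it to the next comma and try again.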

    using System;
    using System.Net.Mail;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            string input = @"Thread-Topic: test subject
Thread-Index: AcwE4mK6Jj19Hgi0SV6yYKvj2/HJbw==
From: ""Lastname, Firstname"" <firstname_lastname@domain.com>
To: <testto@domain.com>, ""Yes, this is valid""@[emails are hard to parse!], testto1@domain.com, testto2@domain.com
Cc: <testcc@domain.com>, test3@domain.com
X-OriginalArrivalTime: 27 Apr 2011 13:52:46.0235 (UTC) FILETIME=[635226B0:01CC04E2]";

            // Capture everything after "To:" on its own line (case-insensitive, multiline).
            Regex toline = new Regex(@"(?im-:^To\s*:\s*(?<to>.*)$)");
            string to = toline.Match(input).Groups["to"].Value;

            int from = 0;   // where the next comma search starts
            int pos = 0;    // where the current candidate address starts

            while (from < to.Length)
            {
                // Take the text up to the next comma, or to the end of the line.
                int found = to.IndexOf(',', from);
                if (found < 0) found = to.Length;
                from = found + 1;
                string test = to.Substring(pos, found - pos);

                try
                {
                    MailAddress addy = new MailAddress(test.Trim());
                    Console.WriteLine(addy.Address);
                    pos = found + 1;
                }
                catch (FormatException)
                {
                    // Not a complete address yet (e.g. the comma was inside a
                    // quoted string): keep pos and extend to the next comma.
                }
            }
        }
    }

Output from the above program:

testto@domain.com
"Yes, this is valid"@[emails are hard to parse!]
testto1@domain.com
testto2@domain.com
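
As a possible shortcut, System.Net.Mail.MailAddressCollection has an Add(string) overload that accepts several comma-separated addresses at once, so the manual comma scanning may not be needed. The sketch below assumes the "To:" label has already been stripped; the names ToHeaderDemo and ParseAddressList are made up for this illustration, and how unusual addresses are handled may vary between framework versions.

```csharp
using System;
using System.Net.Mail;

static class ToHeaderDemo
{
    // Hypothetical helper: parses the value of a To header (without the
    // "To:" label) and returns the bare addresses.
    public static string[] ParseAddressList(string headerValue)
    {
        var addresses = new MailAddressCollection();
        // Add(string) accepts multiple addresses separated by commas and
        // throws FormatException if the list cannot be parsed.
        addresses.Add(headerValue);

        var result = new string[addresses.Count];
        for (int i = 0; i < addresses.Count; i++)
        {
            result[i] = addresses[i].Address;
        }
        return result;
    }

    static void Main()
    {
        string to = "<testto@domain.com>, testto1@domain.com, testto2@domain.com";
        foreach (string address in ParseAddressList(to))
        {
            Console.WriteLine(address);
        }
    }
}
```

Unlike the loop above, this fails as a whole if any single address is malformed, so it suits input you expect to be well-formed.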

Other suggestions

The RFC 2822-compliant email regex is:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

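Just run it over your text (case-insensitively, since the pattern only lists lowercase letters) and you'll get every email address in it, including those in From and Cc. To get only the recipients, run it over the To line alone.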

Of course, there's always the option of not using regex where regex isn't the best option. But up to you!

You cannot use regular expressions to parse RFC2822 mails, because their grammar contains a recursive production (off the top of my head, it was for comments (a (nested) comment)) which makes the grammar non-regular. Regular expressions (as the name suggests) can only parse regular grammars.

See also RegEx match open tags except XHTML self-contained tags for more information.
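
To illustrate the nested-comment point, here is a minimal sketch of removing parenthesised comments by tracking nesting depth with a counter, which is the part a classical regular expression has no way to express. It is deliberately simplified: it ignores quoted strings and backslash escapes, so it is not a full RFC 2822 parser. The names CommentStripper and StripComments are invented for this example.

```csharp
using System;
using System.Text;

static class CommentStripper
{
    // Removes parenthesised comments, which may nest to any depth.
    // Assumption: quoted strings and backslash escapes are not handled.
    public static string StripComments(string input)
    {
        var output = new StringBuilder();
        int depth = 0; // how many comments we are currently inside

        foreach (char c in input)
        {
            if (c == '(')
            {
                depth++;
            }
            else if (c == ')' && depth > 0)
            {
                depth--;
            }
            else if (depth == 0)
            {
                // Only text outside every comment is kept.
                output.Append(c);
            }
        }
        return output.ToString();
    }

    static void Main()
    {
        Console.WriteLine(StripComments("user@domain.com (a (nested) comment)"));
    }
}
```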

As Blindy suggests, sometimes you can just parse it out the old-fashioned way.

If you prefer to do that, here is a quick approach assuming the email header text is called 'header':

    // Skip past the "To: " label so only the addresses remain.
    int start = header.IndexOf("To: ") + "To: ".Length;
    int end = header.IndexOf("Cc: ");
    string x = header.Substring(start, end - start).Trim();

I may be off by a byte on the subtraction, but you can very easily test and modify this. Of course, you will also have to be certain that your header always contains a Cc: line after the To: line, or this won't work.
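
If relying on a Cc: line is a concern, one alternative is to walk the header line by line and stop at the next header field. The sketch below takes that approach and also joins continuation lines (lines starting with a space or tab) onto the value. HeaderReader and GetHeaderValue are hypothetical names, and the code assumes the input is the raw header block with one field per line.

```csharp
using System;
using System.Text;

static class HeaderReader
{
    // Returns the value of the first header field with the given name,
    // or null if there is none. Continuation lines are appended.
    public static string GetHeaderValue(string header, string name)
    {
        string prefix = name + ":";
        StringBuilder value = null;

        foreach (string rawLine in header.Split('\n'))
        {
            string line = rawLine.TrimEnd('\r');
            if (value != null)
            {
                bool isContinuation =
                    line.Length > 0 && (line[0] == ' ' || line[0] == '\t');
                if (!isContinuation)
                {
                    break; // the next header field starts here
                }
                value.Append(' ').Append(line.Trim());
            }
            else if (line.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            {
                value = new StringBuilder(line.Substring(prefix.Length).Trim());
            }
        }
        return value == null ? null : value.ToString();
    }

    static void Main()
    {
        string header = "From: a@b.com\nTo: <testto@domain.com>, testto1@domain.com,\n testto2@domain.com\nX-Other: 1";
        Console.WriteLine(GetHeaderValue(header, "To"));
    }
}
```

Because the match is anchored to the start of a line, a field such as Reply-To: is not mistaken for To:.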

There's a breakdown of validating emails with regex here, which references a more practical implementation of RFC 2822 with:

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

It also looks like you only want the email addresses out of the "To" field, and you've got the <> to worry about as well, so something like the following would likely work:

^To: ((?:\<?[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\>?,?(?:\s*))*)

Again, as others have mentioned, you might not want to do this. But if you want a regex that will turn that input into <testto@domain.com>, testto1@domain.com, testto2@domain.com, that'll do it.
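
For reference, a sketch of how the pattern above might be applied in C#. It needs RegexOptions.Multiline so that ^ matches at the start of the To line, and RegexOptions.IgnoreCase because the pattern only lists lowercase letters. The trailing \s* can swallow the line break after the last address, so the capture is trimmed. ToLineRegex and ExtractTo are names invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class ToLineRegex
{
    // The To-line pattern from the answer above, unchanged.
    const string Pattern =
        @"^To: ((?:\<?[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\>?,?(?:\s*))*)";

    // Returns the address list from the To line, or null if no To line matches.
    public static string ExtractTo(string header)
    {
        Match match = Regex.Match(
            header, Pattern, RegexOptions.Multiline | RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value.Trim() : null;
    }

    static void Main()
    {
        string header =
            "From: \"Lastname, Firstname\" <firstname_lastname@domain.com>\n" +
            "To: <testto@domain.com>, testto1@domain.com, testto2@domain.com\n" +
            "Cc: <testcc@domain.com>, test3@domain.com";
        Console.WriteLine(ExtractTo(header));
    }
}
```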

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow