Question

After I retrieve messages from a mailbox, I want to separate the message body from the subject, date and other information, but I can't find the right algorithm. Here is my code:

// create an instance of TcpClient 
TcpClient tcpclient = new TcpClient();

// HOST NAME POP SERVER and gmail uses port number 995 for POP 

tcpclient.Connect("pop.gmail.com", 995);
// This is Secure Stream // opened the connection between client and POP Server
System.Net.Security.SslStream sslstream = new SslStream(tcpclient.GetStream());
// authenticate as client  
sslstream.AuthenticateAsClient("pop.gmail.com");
//bool flag = sslstream.IsAuthenticated;   // check flag
// Assign the writer to the stream
System.IO.StreamWriter sw = new StreamWriter(sslstream);
// Assigned reader to stream
System.IO.StreamReader reader = new StreamReader(sslstream);
// See the POP3 RFC for the commands; there are only a few (around 6-9)
sw.WriteLine("USER my_login");
// sent to server
sw.Flush();
sw.WriteLine("PASS my_pass");
sw.Flush();
// this will retrieve your first email
sw.WriteLine("RETR 1");
sw.Flush();

string str = string.Empty;
string strTemp = string.Empty;
while ((strTemp = reader.ReadLine()) != null)
{
    // find the . character in line
    if (strTemp == ".")
    {
        break;
    }
    if (strTemp.IndexOf("-ERR") != -1)
    {
        break;
    }
    str += strTemp;
}

// close the connection
sw.WriteLine("Quit ");
sw.Flush();

richTextBox2.Text = str;

I have to extract:

  • The subject of message
  • The author
  • The date
  • The message body

Can anyone tell me how to do this?

The string that I receive (str) contains the subject "Test message" and the body "This is the text of test message". It looks like:

+OK Gpop ready for requests from 46.55.3.85 s42mb37199022eev+OK send PASS+OK Welcome.+OK message followsReturn-Path: Received: from TMD-I31S3H51L29 (host-static-46-55-3-85.moldtelecom.md. [46.55.3.85]) by mx.google.com with ESMTPSA id o5sm61119999eeg.8.2014.04.16.13.48.20
for (version=TLSv1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 16 Apr 2014 13:48:21 -0700 (PDT)Message-ID: <534eec95.856b0e0a.55e1.6612@mx.google.com>MIME-Version: 1.0From: mail_address@gmail.comTo: mail_address@gmail.comDate: Wed, 16 Apr 2014 13:48:21 -0700 (PDT)Subject: Test messageContent-Type: text/plain; charset=us-asciiContent-Transfer-Encoding: quoted-printableThis is the text of test message

Thank you very much!

Solution

What you first need to do is read RFC 1939 to get an idea of the POP3 protocol. But immediately after reading that, you'll need to read the following list of RFCs... actually, screw it, I'm not going to paste the long list of them here, I'll just link you to the website of my MimeKit library, which already has a fairly comprehensive list of them.

As your original code correctly did, it needs to keep reading from the socket until the termination sequence (".\r\n") is encountered, thus terminating the message stream.

The way you are doing it is really inefficient, but whatever, it'll (mostly) work except for the fact that you need to undo any/all byte-stuffing that is done by the POP3 server to munge lines beginning with a period ('.'). For more details, read the POP3 specification I linked above.
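
To make the byte-stuffing point concrete, here is a minimal sketch of reading one multi-line POP3 response and undoing the dot-stuffing. The class and method names (Pop3Text, ReadMultiLine) are made up for illustration, and it assumes the "+OK" status line of the RETR response has already been read. It keeps the line-based reader from the question for simplicity, so it is not a byte-exact reader.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of any library.
static class Pop3Text
{
    // Reads the body of one multi-line POP3 response (RFC 1939, section 3).
    // Assumption: `reader` is positioned just after the "+OK" status line.
    public static string ReadMultiLine(TextReader reader)
    {
        var message = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // A line consisting of a single "." terminates the response.
            if (line == ".")
                return message.ToString();

            // Undo byte-stuffing: the server prepends an extra "." to every
            // line that starts with one, so drop that first character.
            if (line.StartsWith(".", StringComparison.Ordinal))
                line = line.Substring(1);

            // Put back the line break that ReadLine() stripped, so that
            // headers and body can still be told apart later.
            message.Append(line).Append("\r\n");
        }
        throw new EndOfStreamException("Connection closed before the terminating \".\" line.");
    }
}
```

Using a StringBuilder and keeping the line breaks also avoids the two side effects of str += strTemp in the question: repeated string copying and a message flattened into a single line.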

To parse the headers, you'll need to read rfc822. Suffice it to say, Olivier's approach will fall flat on its face, most likely the second it tries to 'split' any real-world messages... unless it gets extremely lucky.

As a hint, the message body is separated from the headers by a blank line.
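
As a rough illustration of that hint, the sketch below splits a raw message at the first blank line and joins folded (continuation) header lines back onto the field they belong to. RawMessage and Split are hypothetical names, the input is assumed to still contain its line breaks, and none of the encoding problems are handled, so treat it as a starting point rather than a real parser.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
static class RawMessage
{
    // Splits a raw message into header fields and body text.
    // Assumption: `raw` still contains its line breaks (CRLF or bare LF).
    // A list is used rather than a dictionary because fields such as
    // "Received" can appear more than once.
    public static (List<KeyValuePair<string, string>> Headers, string Body) Split(string raw)
    {
        string[] lines = raw.Replace("\r\n", "\n").Split('\n');
        var headers = new List<KeyValuePair<string, string>>();

        int i = 0;
        for (; i < lines.Length && lines[i].Length > 0; i++)
        {
            string line = lines[i];

            // A line starting with a space or tab continues the previous field.
            // Simplification: the folding whitespace is collapsed to one space.
            if ((line[0] == ' ' || line[0] == '\t') && headers.Count > 0)
            {
                var last = headers[headers.Count - 1];
                headers[headers.Count - 1] =
                    new KeyValuePair<string, string>(last.Key, last.Value + " " + line.Trim());
                continue;
            }

            int colon = line.IndexOf(':');
            if (colon <= 0)
                continue; // not a "Name: value" line; skipped in this sketch

            headers.Add(new KeyValuePair<string, string>(
                line.Substring(0, colon).Trim(),
                line.Substring(colon + 1).Trim()));
        }

        // `i` now points at the blank separator line (or past the end);
        // everything after it is the body.
        string body = i < lines.Length
            ? string.Join("\r\n", lines, i + 1, lines.Length - i - 1)
            : string.Empty;
        return (headers, body);
    }
}
```

Splitting on the first colon only matters for values such as the Date field, which contain colons of their own.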

Here are a few other problems you are likely to eventually run into:

  1. Header values are supposed to be encoded if they contain non-ASCII text (see rfc2047 and rfc2231 for details).
  2. Some header values in the wild are not properly encoded, and sometimes, even though they are not supposed to, include undeclared 8-bit text. Dealing with this is non-trivial. This also means that you cannot really use a StreamReader to read lines as you'll lose the original byte sequences.
  3. If you actually want to do anything with the body of the message, you'll have to write a MIME parser.
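
To give an idea of what item 1 involves, here is a simplified sketch of decoding rfc2047 encoded-words (the =?charset?B?...?= and =?charset?Q?...?= forms) in an already-unfolded header value. EncodedWords and Decode are hypothetical names. It assumes the charset is one the runtime knows about, leaves anything it cannot decode untouched, and ignores rfc2231 and the improperly encoded headers described in item 2.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
static class EncodedWords
{
    // Matches =?charset?B?payload?= and =?charset?Q?payload?=
    static readonly Regex Word =
        new Regex(@"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=");

    public static string Decode(string headerValue)
    {
        // Whitespace between two adjacent encoded-words is not part of the text.
        string joined = Regex.Replace(headerValue, @"(\?=)\s+(=\?)", "$1$2");
        return Word.Replace(joined, DecodeWord);
    }

    static string DecodeWord(Match m)
    {
        try
        {
            Encoding charset = Encoding.GetEncoding(m.Groups[1].Value);
            bool isBase64 = m.Groups[2].Value == "B" || m.Groups[2].Value == "b";
            byte[] bytes = isBase64
                ? Convert.FromBase64String(m.Groups[3].Value)
                : DecodeQ(m.Groups[3].Value);
            return charset.GetString(bytes);
        }
        catch (Exception ex) when (ex is ArgumentException || ex is FormatException)
        {
            // Unknown charset or malformed payload: keep the original text.
            return m.Value;
        }
    }

    // "Q" encoding: "_" means space, "=XX" is a hex-encoded byte.
    static byte[] DecodeQ(string text)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '_')
            {
                bytes.Add(0x20);
            }
            else if (text[i] == '=' && i + 2 < text.Length)
            {
                bytes.Add(Convert.ToByte(text.Substring(i + 1, 2), 16));
                i += 2;
            }
            else
            {
                bytes.Add((byte)text[i]);
            }
        }
        return bytes.ToArray();
    }
}
```

Each encoded-word is decoded on its own here, which breaks if a sender splits a multi-byte character across two words. That is one more of the real-world cases a proper parser has to cope with.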

I'd highly recommend using MimeKit and my other library, MailKit, for POP3 support.
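
For comparison, this is roughly what fetching the first message and pulling out the four pieces of information looks like with MailKit and MimeKit. It is a sketch that assumes both NuGet packages are referenced; MailKitSketch, FetchFirst and Describe are names invented for this example, and the user name and password are placeholders.

```csharp
using System;
using MailKit.Net.Pop3;
using MimeKit;

// Hypothetical wrapper around MailKit/MimeKit calls.
static class MailKitSketch
{
    // Connects to Gmail's POP3 server over SSL and downloads the first message.
    public static MimeMessage FetchFirst(string user, string password)
    {
        using (var client = new Pop3Client())
        {
            client.Connect("pop.gmail.com", 995, true);
            client.Authenticate(user, password);

            if (client.Count == 0)
                throw new InvalidOperationException("The mailbox is empty.");

            // Message indexes are zero-based here, unlike the raw "RETR 1".
            MimeMessage message = client.GetMessage(0);
            client.Disconnect(true);
            return message;
        }
    }

    // Formats the subject, author, date and plain-text body.
    public static string Describe(MimeMessage message)
    {
        return string.Format("Subject: {0}\nFrom: {1}\nDate: {2}\nBody: {3}",
            message.Subject, message.From, message.Date, message.TextBody);
    }
}
```

Calling MailKitSketch.Describe(MailKitSketch.FetchFirst("my_login", "my_pass")) would then return the subject, author, date and body, with the header decoding and MIME parsing handled by the library.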

Trust me, you are in for a world of pain trying to do this the way you are trying to do it.

OTHER TIPS

String.Split is not powerful enough for this task. You will have to use Regex. The pattern that I suggest is:

^(?<name>\w+): (?<value>.*?)$

The meaning is:

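^                  Beginning of line (if you use the multiline option).
(?<name>pattern)   Capturing group where the group name is "name".
\w+                One or more word characters (letters, digits, underscore).
.*?                Any sequence of characters, as few as possible (for the value).
$                  End of line (if you use the multiline option).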

This code ...

MatchCollection matches = 
    Regex.Matches(text, @"^(?<name>\w+): (?<value>.*?)$", RegexOptions.Multiline);
foreach (Match match in matches) {
    Console.WriteLine("{0} = {1}", 
        match.Groups["name"].Value, 
        match.Groups["value"].Value
    );
}

... produces this output:

Received = from TMD-I31S3H51L29 (host-static-46-55-3-85.m ...
From = mail_address@gmail.com
To = mail_address@gmail.com
Date = Wed, 16 Apr 2014 13:48:21 -0700 (PDT)
Subject = Test message

The body seems to start after the "Content-Transfer-Encoding:" line and to go to the end of the string. You can find the body like this:

Match body = 
    Regex.Match(text, @"^Content-Transfer-Encoding: .*?$", RegexOptions.Multiline);
if (body.Success) {
    Console.WriteLine(text.Substring(body.Index + body.Length + 1));
}

Note that with RegexOptions.Multiline, ^ and $ match only around line feeds (\n). If the lines are separated by \r\n, each captured value will keep a trailing \r; in that case, replace $ with \r?$ in the regex expressions.

Licensed under: CC-BY-SA with attribution