Question

I'm reading an email file where the first line in the file (so first line in the header) is:

X-RCPT-TO-LIST: 1,2,3

I'm loading it using CDO and ADODB like this:

        ADODB.Stream stream = new ADODB.Stream();
        stream.Open(Type.Missing, ADODB.ConnectModeEnum.adModeUnknown, ADODB.StreamOpenOptionsEnum.adOpenStreamUnspecified, String.Empty, string.Empty);
        stream.LoadFromFile(filename);
        stream.Flush();
        CDO.Message msg = new CDO.Message();
        msg.DataSource.OpenObject(stream, "_Stream");
        msg.DataSource.Save();

Then I'm trying to get the field like this:

ADODB.Field f = msg.Fields["urn:schemas:httpmail:X-RCPT-TO-LIST"];

This does not work: it returns an empty field (null values).

Looking at the fields in the debugger, I see that the field name is:

urn:schemas:mailheader:ÿþx-rcpt-to-list

I assume my code might work if I look for those weird characters, but I'm worried they might change from one email to the next. Any ideas why those strange characters are added? Is there a better way to access custom header fields (without reading the file myself and parsing it)?

I'm running this test on Windows XP with all of the latest patches (SP3 I think).

Sorry if I tagged this wrong, I had trouble finding tags for this. I'm using C# if not obvious.

Here is the entire email file, I removed some junk (some for privacy reasons) but I did retest with this exact version and getting same results:

X-RCPT-TO-LIST: 1,2,3
Received: by mail-ia0-f172.google.com with SMTP id l29so4135896iag.3
        for <423a777e2af27f463b801fe2eb2242cbdf1d934000000001@users.domain.com>; Fri, 22 Mar 2013 19:52:00 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.50.195.134 with SMTP id ie6mr6320542igc.6.1364007120542;
 Fri, 22 Mar 2013 19:52:00 -0700 (PDT)
Received: by 10.50.169.39 with HTTP; Fri, 22 Mar 2013 19:52:00 -0700 (PDT)
Date: Fri, 22 Mar 2013 19:52:00 -0700
Message-ID: <XXXXXXXX63pPLB9QYu=04W3mU3Ynhkjf2bdYYZqv5oVvQ__u1vg@mail.gmail.com>
Subject: test4
From: <xxxxx2003@gmail.com>
To: 423a777e2af27f463b801fe2eb2242cbdf1d934000000001 <423a777e2af27f463b801fe2eb2242cbdf1d934000000001@users.domain.com>
Content-Type: multipart/alternative; boundary=14dae9340b45e63f6204d88ea7fa

--14dae9340b45e63f6204d88ea7fa
Content-Type: text/plain; charset=UTF-8

test4

-- 
xxxxxx@gmail.com
I don't check *this account* very often

--14dae9340b45e63f6204d88ea7fa
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">test4<br clear=3D"all"><div><br>-- <br><div><a href=3D"mai=
lto:xxxxx@gmail.com" target=3D"_blank">xxxxx@gmail.com</a></div>
<div>I don&#39;t check <b>this account</b> very often</div>
<div>=C2=A0</div>
</div></div>

--14dae9340b45e63f6204d88ea7fa--

The X-RCPT-TO-LIST line is added by code in my email server that translates the RCPT TO:<> lines to internal user IDs. That way the thread that processes these files later knows where to place the mail. I don't want to keep the info in a separate file or anything like that, as I like my current design. I just want to know why CDO/ADODB is translating my message header into some weird name, as if it were mixing up Unicode and ASCII or something goofy.


Solution

A "ÿþ" at the very start of a text stream is, most of the time, a so-called byte order mark (BOM); see, e.g., the Wikipedia entry on byte order marks. "ÿþ" is how the two bytes 0xFF 0xFE, the UTF-16 little-endian BOM, look when they are displayed as Latin-1 text. They appear in the stream because they are in the file being read. If that is the case, the BOM will show up when you open the file in a hex editor and check its first bytes.

Why are these bytes in the file in the first place? That depends on how the file was created. This question may be helpful: "Can I export excel data with UTF-8 without BOM?".
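Instead of a hex editor, the first bytes can also be checked from code. The following is a minimal sketch, not part of CDO or ADODB; the class and method names (BomCheck, DescribeBom, DescribeBomOfFile) are made up for illustration. It only looks for the three most common BOMs.

```csharp
using System;
using System.IO;

static class BomCheck
{
    // Returns the name of the BOM found at the start of the data,
    // or null if none of the three checked BOMs is present.
    // Note: a UTF-32 LE BOM (FF FE 00 00) would be reported as "UTF-16 LE".
    public static string DescribeBom(byte[] head)
    {
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return "UTF-8";
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return "UTF-16 LE";   // shows up as "ÿþ" when viewed as Latin-1
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return "UTF-16 BE";   // shows up as "þÿ" when viewed as Latin-1
        return null;
    }

    // Reads at most the first three bytes of the file and checks them.
    public static string DescribeBomOfFile(string path)
    {
        byte[] head = new byte[3];
        using (FileStream fs = File.OpenRead(path))
        {
            int n = fs.Read(head, 0, head.Length);
            Array.Resize(ref head, n);   // file may be shorter than 3 bytes
        }
        return DescribeBom(head);
    }
}
```

If DescribeBomOfFile returns null for the email file, the file itself has no BOM, and the extra characters must be introduced somewhere while the message is being loaded.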

Other tips

Unless someone has a better answer, like maybe my code for loading the message has a bug in it, then I'm going to accept this as the answer...

It appears to be a bug in CDO or ADODB that does this for the first line of any message. I tested by removing my X-RCPT-TO-LIST line, so that the first line was a standard "Received:" line, and in that case the Received line had the weird characters added to the name. I also tested with several other emails with different items as the first line, and in all cases the first line always had the weird characters added to the name. I can only imagine the bug has either been fixed (I'm using XP which is pretty old), or most people using CDO haven't noticed because they don't do anything with the Received: lines and that is usually the first line in the header.
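If you would rather look the field up despite the stray characters, one option is to compare names after stripping them. This is only a sketch: HeaderNames, Normalize and IsHeader are hypothetical helper names, and it assumes the stray characters are always BOM characters (U+FEFF, or "ÿ"/"þ") sitting directly after the urn:schemas:mailheader: prefix, as in the name seen in the debugger.

```csharp
using System;

static class HeaderNames
{
    const string Prefix = "urn:schemas:mailheader:";

    // Strips BOM-like characters that directly follow the namespace prefix.
    // Real header names are ASCII, so they never start with these characters.
    public static string Normalize(string fieldName)
    {
        if (!fieldName.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase))
            return fieldName;   // not a mail header field, leave untouched
        string rest = fieldName.Substring(Prefix.Length)
                               .TrimStart('\uFEFF', '\u00FF', '\u00FE');
        return Prefix + rest;
    }

    // True if fieldName refers to the given header, ignoring case
    // and any BOM characters after the prefix.
    public static bool IsHeader(string fieldName, string headerName)
    {
        return string.Equals(Normalize(fieldName), Prefix + headerName,
                             StringComparison.OrdinalIgnoreCase);
    }
}
```

The idea would be to walk over the message's fields and pick the one whose Name satisfies IsHeader(name, "X-RCPT-TO-LIST"). I have not tested that part against CDO, so treat it as an assumption.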

For me, to avoid the issue, I will just add an extra line to the top, so I'll have:

X-CDO-FIX: fix
X-RCPT-TO-LIST: 1,2,3
...normal header here...

Tested and works, so I'm happy. Will leave this open for a few days in case someone can provide more info that is worthy of the bounty I have started that might help someone else as well.
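For anyone who wants to apply the same workaround in code, here is a rough sketch of prepending the dummy line to the raw message bytes before they are written to disk. CdoFix and PrependDummyHeader are names invented for this example, and the sketch assumes the message uses CRLF line endings, as SMTP messages normally do.

```csharp
using System;
using System.Text;

static class CdoFix
{
    // The throwaway first header line; it absorbs whatever happens
    // to the name of the first header when the message is loaded.
    static readonly byte[] DummyHeader =
        Encoding.ASCII.GetBytes("X-CDO-FIX: fix\r\n");

    // Returns a new array: the dummy header followed by the original message.
    public static byte[] PrependDummyHeader(byte[] rawMessage)
    {
        byte[] result = new byte[DummyHeader.Length + rawMessage.Length];
        Buffer.BlockCopy(DummyHeader, 0, result, 0, DummyHeader.Length);
        Buffer.BlockCopy(rawMessage, 0, result, DummyHeader.Length, rawMessage.Length);
        return result;
    }
}
```

The result can then be saved with File.WriteAllBytes, which writes the bytes as they are and does not add a BOM of its own.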

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow