Decoding Base64 / Quoted Printable encoded UTF8 string
-
29-04-2021 - |
Question
In my ASP.Net application working process, I need to do some work with string, which equals something like
=?utf-8?B?SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=?=
How can I decode it to normal human language?
Thanks in advance!
Update:
Convert.FromBase64String()
does not work for string, which equals
=?UTF-8?Q?Bestellbest=C3=A4tigung?=
I get The format of s is invalid. s contains a non-base-64 character, more than two padding characters, or a non-white space-character among the padding characters.
exception.
Update:
Update:
What kind of string encoding is that: Nweiß
???
Solution
This seems to be MIME Header Encoding. The Q in your second example indicates that it is Quoted Printable.
This question seems to cover the variants fairly well. In a quick search I didn't find any .NET libraries to decode this automatically, but it shouldn't be hard to do manually if you need to.
OTHER TIPS
It's actually a base-64 string:
string zz = "SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=";
byte[] dd = Convert.FromBase64String(zz);
// Returns Ihre Bestellung - Versandbestätigung - 1105891246
string yy = System.Text.Encoding.UTF8.GetString(dd);
I've written a library that will decode these sorts of strings. You can find it at http://github.com/jstedfast/MimeKit
Specifically, take a look at MimeKit.Utils.Rfc2047.DecodeText()
That's not UTF8. Thats a Base64 encoded string.
the UTF-8 only indicates that the target string is in UTF8 format. After decoding the Base64 string:
SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=
You'll get the following result:
Ihre Bestellung - Versandbestätigung - 1105891246
Looks like a base64 string.
Try Convert.FromBase64String
http://msdn.microsoft.com/en-us/library/system.convert.frombase64string.aspx
This is an encoded word, which is used in email headers when there is non-ASCII content. Encoded words are defined in RFC 2047:
http://tools.ietf.org/html/rfc2047#section-2
The BNF for an encoded word is:
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
So the correct way to interpret this is:
- The data is the stuff between the 3rd and 4th question marks
- It has been Base64 encoded (the 'B' stands for Base64; if it were a 'Q' then it would be quoted-printable).
- Once you decode the data, it will be in the UTF-8 character set.
The result, as @Shai correctly pointed out, is:
Ihre Bestellung - Versandbestätigung - 1105891246
This is German. The umlaut is obviously the reason for the UTF-8 and thus the need for an encoded word. The translation is:
Your order - Delivery confirmation - 1105891246
Apparently it's a tracking number for an order.
All modern email clients (and Outlook) transparently support encoded words.
This is a bit of guesswork, but let's try
- remove
=?
from start and?=
from end - keep the start up to the next
?
as the character set - Remove the
B?
- don't know, what it is - Convert the rest to a
byte[]
viaSystem.Convert.FromBase64String()
- Convert this to the final String via
Encoding.GetSTring()
using the character set remembered in the second step