C#: Class for decoding Quoted-Printable encoding?

https://stackoverflow.com/questions/2226554

19-09-2019

Question

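Is there an existing class in C# that can convert Quoted-Printable encoding to a String? Follow the link for more information on the encoding.

The following is quoted from that link for your convenience.

Any 8-bit byte value may be encoded with 3 characters: an "=" followed by two hexadecimal digits (0-9 or A-F) representing the byte's numeric value. For example, a US-ASCII form feed character (decimal value 12) can be represented by "=0C", and a US-ASCII equal sign (decimal value 61) is represented by "=3D". All characters except printable ASCII characters or end-of-line characters must be encoded in this fashion.

All printable ASCII characters (decimal values between 33 and 126) may be represented by themselves, except "=" (decimal 61).

ASCII tab and space characters, decimal values 9 and 32, may be represented by themselves, except if these characters appear at the end of a line. If one of these characters appears at the end of a line, it must be encoded as "=09" (tab) or "=20" (space).

If the data being encoded contains meaningful line breaks, they must be encoded as an ASCII CR LF sequence, not as their original byte values. Conversely, if byte values 13 and 10 have meanings other than end of line, they must be encoded as =0D and =0A.

Lines of quoted-printable encoded data must not be longer than 76 characters. To satisfy this requirement without altering the encoded text, soft line breaks may be added as desired. A soft line break consists of an "=" at the end of an encoded line, and does not cause a line break in the decoded text.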
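To make the quoted rules concrete, here is a minimal sketch of the encoding side for a single line of bytes. The class and method names (QuotedPrintableSketch, EncodeLine) are made up for illustration, and the sketch deliberately leaves out the 76-character soft line breaks and any CR LF handling.

```csharp
using System;
using System.Text;

static class QuotedPrintableSketch
{
    // Encodes one line of bytes (assumed to contain no line breaks).
    // Hypothetical helper, not part of any framework API.
    public static string EncodeLine(byte[] line)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < line.Length; i++)
        {
            byte b = line[i];
            bool isLast = i == line.Length - 1;
            // Printable ASCII (33..126) stands for itself, except "=" (61)
            bool printable = b >= 33 && b <= 126 && b != (byte)'=';
            // Tab (9) and space (32) stand for themselves unless they end the line
            bool innerWhitespace = (b == 9 || b == 32) && !isLast;

            if (printable || innerWhitespace)
                sb.Append((char)b);
            else
                sb.Append('=').Append(b.ToString("X2")); // "=" plus two hex digits
        }
        return sb.ToString();
    }
}
```

For example, a lone form feed byte (12) comes out as "=0C", and the ASCII text "a=b " (with a trailing space) comes out as "a=3Db=20".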

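Solution

There is functionality in the framework libraries to do this, but it doesn't appear to be cleanly exposed. The implementation is in the internal class System.Net.Mime.QuotedPrintableStream. This class defines a method called DecodeBytes which does what you want. The method appears to be used by only one method, which decodes MIME headers. That method is also internal, but it is called fairly directly in a couple of places, e.g. the Attachment.Name setter. A demonstration: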

using System;
using System.Net.Mail;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Attachment attachment = Attachment.CreateAttachmentFromString("", "=?iso-8859-1?Q?=A1Hola,_se=F1or!?=");
            Console.WriteLine(attachment.Name);
        }
    }
}

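Produces the output:

¡Hola,_señor!

You may need to do some testing to ensure carriage returns etc. are treated correctly, although in a quick test I did they seem to be. However, it may not be wise to rely on this functionality unless your use case is close enough to decoding a MIME header string that you don't think it will be broken by future changes to the library. You might be better off writing your own quoted-printable decoder.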
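As a rough idea of what "writing your own" could look like, here is a minimal sketch that follows the rules quoted in the question. The names QpDecoderSketch and Decode are hypothetical, and the sketch assumes the encoded input itself is plain ASCII (as quoted-printable text should be). It collects all bytes first and converts them to a string only at the end.

```csharp
using System;
using System.IO;
using System.Text;

static class QpDecoderSketch
{
    // Hypothetical helper: decodes quoted-printable text to bytes, then to a string.
    public static string Decode(string input, Encoding encoding)
    {
        using (var output = new MemoryStream())
        {
            int i = 0;
            while (i < input.Length)
            {
                char c = input[i];
                if (c != '=')
                {
                    output.WriteByte((byte)c);   // literal character (assumed ASCII)
                    i++;
                }
                else if (i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n')
                {
                    i += 3;                      // soft line break "=\r\n": emit nothing
                }
                else if (i + 1 < input.Length && input[i + 1] == '\n')
                {
                    i += 2;                      // tolerate a bare "=\n" soft line break
                }
                else if (i + 2 < input.Length
                         && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
                {
                    // "=XX": one encoded byte
                    output.WriteByte(Convert.ToByte(input.Substring(i + 1, 2), 16));
                    i += 3;
                }
                else
                {
                    output.WriteByte((byte)'='); // malformed escape: keep the "=" as is
                    i++;
                }
            }
            // Convert all bytes at once so multi-byte characters stay intact
            return encoding.GetString(output.ToArray());
        }
    }
}
```

Usage would be along the lines of QpDecoderSketch.Decode("se=F1or", Encoding.GetEncoding("iso-8859-1")). Note that this handles body text only; the =?charset?Q?...?= wrapper and the underscore-for-space convention of MIME headers are not covered.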

Other tips

I extended Martin Murphy's solution and I hope it will work in every case.

private static string DecodeQuotedPrintables(string input, string charSet)
{           
    if (string.IsNullOrEmpty(charSet))
    {
        var charSetOccurences = new Regex(@"=\?.*\?Q\?", RegexOptions.IgnoreCase);
        var charSetMatches = charSetOccurences.Matches(input);
        foreach (Match match in charSetMatches)
        {
            charSet = match.Groups[0].Value.Replace("=?", "").Replace("?Q?", "");
            input = input.Replace(match.Groups[0].Value, "").Replace("?=", "");
        }
    }

    Encoding enc = new ASCIIEncoding();
    if (!string.IsNullOrEmpty(charSet))
    {
        try
        {
            enc = Encoding.GetEncoding(charSet);
        }
        catch
        {
            enc = new ASCIIEncoding();
        }
    }

    //decode iso-8859-[0-9]: hex digits are 0-9 and A-F only
    var occurences = new Regex(@"=[0-9A-F]{2}", RegexOptions.Multiline);
    var matches = occurences.Matches(input);
    foreach (Match match in matches)
    {
        try
        {
            byte[] b = new byte[] { byte.Parse(match.Groups[0].Value.Substring(1), System.Globalization.NumberStyles.AllowHexSpecifier) };
            char[] hexChar = enc.GetChars(b);
            input = input.Replace(match.Groups[0].Value, hexChar[0].ToString());
        }
        catch { }
    }

    //decode base64String (utf-8?B?)
    occurences = new Regex(@"\?utf-8\?B\?.*\?", RegexOptions.IgnoreCase);
    matches = occurences.Matches(input);
    foreach (Match match in matches)
    {
        byte[] b = Convert.FromBase64String(match.Groups[0].Value.Replace("?utf-8?B?", "").Replace("?UTF-8?B?", "").Replace("?", ""));
        string temp = Encoding.UTF8.GetString(b);
        input = input.Replace(match.Groups[0].Value, temp);
    }

    input = input.Replace("=\r\n", "");
    return input;
}

I wrote this up real quick.

    public static string DecodeQuotedPrintables(string input)
    {
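        var occurences = new Regex(@"=[0-9A-F]{2}", RegexOptions.Multiline);
        var matches = occurences.Matches(input);
        // MatchCollection is not an IEnumerable<string>, so project the values first (needs System.Linq)
        var uniqueMatches = new HashSet<string>(matches.Cast<Match>().Select(m => m.Value));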
        foreach (string match in uniqueMatches)
        {
            char hexChar= (char) Convert.ToInt32(match.Substring(1), 16);
            input =input.Replace(match, hexChar.ToString());
        }
        return input.Replace("=\r\n", "");
    }

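If you are decoding quoted-printable with UTF-8 encoding, you need to be aware that you cannot decode each quoted-printable sequence one at a time, as the others have shown, when there are runs of quoted-printable characters together.

For example, if you have the sequence =E2=80=99 and decode it using UTF8 one at a time, you get three "weird" characters. If you instead build an array of the three bytes and convert those three bytes with the UTF8 encoding, you get a single apostrophe.

Obviously, if you are using ASCII encoding then one at a time is no problem. However, decoding runs means your code will work regardless of the text encoding used.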
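A small sketch of that difference, using the same three bytes. The class and method names are made up for the illustration.

```csharp
using System;
using System.Text;

static class Utf8RunDemo
{
    // Decodes the whole run in one call: the bytes form one UTF-8 character.
    public static string DecodeTogether(byte[] run)
    {
        return Encoding.UTF8.GetString(run);
    }

    // Decodes each byte separately: none of them is valid UTF-8 on its own,
    // so the decoder substitutes replacement characters.
    public static string DecodeOneAtATime(byte[] run)
    {
        var sb = new StringBuilder();
        foreach (byte b in run)
        {
            sb.Append(Encoding.UTF8.GetString(new[] { b }));
        }
        return sb.ToString();
    }
}
```

With run = { 0xE2, 0x80, 0x99 } (the bytes behind =E2=80=99), DecodeTogether returns the single character U+2019 (right single quotation mark), while DecodeOneAtATime returns a string that does not contain it.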

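Oh, and don't forget that =3D is a special case that means you need to decode whatever you have ONE MORE TIME... That is a crazy gotcha!

Hope that helps

This Quoted-Printable decoder works great!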

public static byte[] FromHex(byte[] hexData)
    {
        if (hexData == null)
        {
            throw new ArgumentNullException("hexData");
        }

        if (hexData.Length < 2 || (hexData.Length / (double)2 != Math.Floor(hexData.Length / (double)2)))
        {
            throw new Exception("Illegal hex data, hex data must be in two bytes pairs, for example: 0F,FF,A3,... .");
        }

        MemoryStream retVal = new MemoryStream(hexData.Length / 2);
        // Loop hex value pairs
        for (int i = 0; i < hexData.Length; i += 2)
        {
            byte[] hexPairInDecimal = new byte[2];
            // We need to convert hex char to decimal number, for example F = 15
            for (int h = 0; h < 2; h++)
            {
                if (((char)hexData[i + h]) == '0')
                {
                    hexPairInDecimal[h] = 0;
                }
                else if (((char)hexData[i + h]) == '1')
                {
                    hexPairInDecimal[h] = 1;
                }
                else if (((char)hexData[i + h]) == '2')
                {
                    hexPairInDecimal[h] = 2;
                }
                else if (((char)hexData[i + h]) == '3')
                {
                    hexPairInDecimal[h] = 3;
                }
                else if (((char)hexData[i + h]) == '4')
                {
                    hexPairInDecimal[h] = 4;
                }
                else if (((char)hexData[i + h]) == '5')
                {
                    hexPairInDecimal[h] = 5;
                }
                else if (((char)hexData[i + h]) == '6')
                {
                    hexPairInDecimal[h] = 6;
                }
                else if (((char)hexData[i + h]) == '7')
                {
                    hexPairInDecimal[h] = 7;
                }
                else if (((char)hexData[i + h]) == '8')
                {
                    hexPairInDecimal[h] = 8;
                }
                else if (((char)hexData[i + h]) == '9')
                {
                    hexPairInDecimal[h] = 9;
                }
                else if (((char)hexData[i + h]) == 'A' || ((char)hexData[i + h]) == 'a')
                {
                    hexPairInDecimal[h] = 10;
                }
                else if (((char)hexData[i + h]) == 'B' || ((char)hexData[i + h]) == 'b')
                {
                    hexPairInDecimal[h] = 11;
                }
                else if (((char)hexData[i + h]) == 'C' || ((char)hexData[i + h]) == 'c')
                {
                    hexPairInDecimal[h] = 12;
                }
                else if (((char)hexData[i + h]) == 'D' || ((char)hexData[i + h]) == 'd')
                {
                    hexPairInDecimal[h] = 13;
                }
                else if (((char)hexData[i + h]) == 'E' || ((char)hexData[i + h]) == 'e')
                {
                    hexPairInDecimal[h] = 14;
                }
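                else if (((char)hexData[i + h]) == 'F' || ((char)hexData[i + h]) == 'f')
                {
                    hexPairInDecimal[h] = 15;
                }
                else
                {
                    // Not a hex digit: reject it so the caller can leave the text as is
                    throw new Exception("Illegal hex char '" + (char)hexData[i + h] + "'.");
                }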
            }

            // Join the high 4 bits (left hex char) and the low 4 bits (right hex char) into one 8-bit byte
            retVal.WriteByte((byte)((hexPairInDecimal[0] << 4) | hexPairInDecimal[1]));
        }

        return retVal.ToArray();
    }
    public static byte[] QuotedPrintableDecode(byte[] data)
    {
        if (data == null)
        {
            throw new ArgumentNullException("data");
        }

        MemoryStream msRetVal = new MemoryStream();
        MemoryStream msSourceStream = new MemoryStream(data);

        int b = msSourceStream.ReadByte();
        while (b > -1)
        {
            // Encoded 8-bit byte(=XX) or soft line break(=CRLF)
            if (b == '=')
            {
                byte[] buffer = new byte[2];
                int nCount = msSourceStream.Read(buffer, 0, 2);
                if (nCount == 2)
                {
                    // Soft line break, line splitted, just skip CRLF
                    if (buffer[0] == '\r' && buffer[1] == '\n')
                    {
                    }
                    // This must be encoded 8-bit byte
                    else
                    {
                        try
                        {
                            msRetVal.Write(FromHex(buffer), 0, 1);
                        }
                        catch
                        {
                            // Illegal value after =, just leave it as is
                            msRetVal.WriteByte((byte)'=');
                            msRetVal.Write(buffer, 0, 2);
                        }
                    }
                }
                // Illegal =, just leave as it is
                else
                {
                    // Write the "=" back too, otherwise it is silently dropped
                    msRetVal.WriteByte((byte)'=');
                    msRetVal.Write(buffer, 0, nCount);
                }
            }
            // Just write back all other bytes
            else
            {
                msRetVal.WriteByte((byte)b);
            }

            // Read next byte
            b = msSourceStream.ReadByte();
        }

        return msRetVal.ToArray();
    }

    private string quotedprintable(string data, string encoding)
    {
        data = data.Replace("=\r\n", "");
        for (int position = -1; (position = data.IndexOf("=", position + 1)) != -1;)
        {
            string leftpart = data.Substring(0, position);
            System.Collections.ArrayList hex = new System.Collections.ArrayList();
            hex.Add(data.Substring(1 + position, 2));
            while (position + 3 < data.Length && data.Substring(position + 3, 1) == "=")
            {
                position = position + 3;
                hex.Add(data.Substring(1 + position, 2));
            }
            byte[] bytes = new byte[hex.Count];
            for (int i = 0; i < hex.Count; i++)
            {
                bytes[i] = System.Convert.ToByte(new string(((string)hex[i]).ToCharArray()), 16);
            }
            string equivalent = System.Text.Encoding.GetEncoding(encoding).GetString(bytes);
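            string rightpart = data.Substring(position + 3);
            data = leftpart + equivalent + rightpart;
            // Resume scanning right after the decoded text; the string just got shorter,
            // and a decoded "=" must not be decoded a second time
            position = leftpart.Length + equivalent.Length - 1;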
        }
        return data;
    }

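I was looking for a dynamic solution and spent 2 days trying different solutions. This solution will support Japanese characters and other standard character sets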

private static string Decode(string input, string bodycharset) {
        var i = 0;
        var output = new List<byte>();
        while (i < input.Length) {
            if (input[i] == '=' && i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n') {
                //Skip
                i += 3;
            } else if (input[i] == '=') {
                string sHex = input;
                sHex = sHex.Substring(i + 1, 2);
                int hex = Convert.ToInt32(sHex, 16);
                byte b = Convert.ToByte(hex);
                output.Add(b);
                i += 3;
            } else {
                output.Add((byte)input[i]);
                i++;
            }
        }


        if (String.IsNullOrEmpty(bodycharset))
            return Encoding.UTF8.GetString(output.ToArray());
        else {
            if (String.Compare(bodycharset, "ISO-2022-JP", true) == 0)
                return Encoding.GetEncoding("Shift_JIS").GetString(output.ToArray());
            else
                return Encoding.GetEncoding(bodycharset).GetString(output.ToArray());
        }

    }

Then you can call the function with

Decode("=E3=82=AB=E3=82=B9=E3", "utf-8")

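This was originally found here.

This is the only one that worked for me.

http://sourceforge.net/apps/trac/syncmldotnet/wiki/Quoted%20Printable

If you only need to decode QP, pull these three functions from the link above into your code: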

    HexDecoderEvaluator(Match m)
    HexDecoder(string line)
    Decode(string encodedText)

And then just:

var humanReadable = Decode(myQPString);

Enjoy

A better solution

    private static string DecodeQuotedPrintables(string input, string charSet)
    {
        Encoding enc;
        try
        {
            enc = Encoding.GetEncoding(charSet);
        }
        catch
        {
            // Unknown or missing charset: fall back to UTF-8
            enc = new UTF8Encoding();
        }

        // Match whole runs of =XX escapes so multi-byte characters are decoded together
        var occurences = new Regex(@"(=[0-9A-F]{2}){1,}", RegexOptions.Multiline);
        var matches = occurences.Matches(input);

        foreach (Match match in matches)
        {
            try
            {
                byte[] b = new byte[match.Groups[0].Value.Length / 3];
                for (int i = 0; i < match.Groups[0].Value.Length / 3; i++)
                {
                    b[i] = byte.Parse(match.Groups[0].Value.Substring(i * 3 + 1, 2), System.Globalization.NumberStyles.AllowHexSpecifier);
                }
                char[] hexChar = enc.GetChars(b);
                // Use every decoded char of the run, not just the first one
                input = input.Replace(match.Groups[0].Value, new string(hexChar));
            }
            catch
            { ;}
        }
        input = input.Replace("=\r\n", "").Replace("=\n", "").Replace("?=", "");

        return input;
    }

public static string DecodeQuotedPrintables(string input, Encoding encoding)
    {
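        // Note: only the =XX escapes are collected; literal characters between them are not copied to the output
        var regex = new Regex(@"\=(?<Symbol>[0-9A-F]{2})", RegexOptions.Multiline);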
        var matches = regex.Matches(input);
        var bytes = new byte[matches.Count];

        for (var i = 0; i < matches.Count; i++)
        {
            bytes[i] = Convert.ToByte(matches[i].Groups["Symbol"].Value, 16);
        }

        return encoding.GetString(bytes);
    }

Sometimes a string in an EML file is composed of several encoded parts. This is a function that uses Dave's method for those cases:

public string DecodeQP(string codedstring)
{
    Regex codified;

    codified=new Regex(@"=\?((?!\?=).)*\?=", RegexOptions.IgnoreCase);
    MatchCollection setMatches = codified.Matches(codedstring);
    if(setMatches.Count > 0)
    {
        Attachment attdecode;
        codedstring= "";
        foreach (Match match in setMatches)
        {
            attdecode = Attachment.CreateAttachmentFromString("", match.Value);
            codedstring+= attdecode.Name;

        }                
    }
    return codedstring;
}

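Please note: solutions using "input.Replace" are all over the Internet, and they are still not correct.

See, if you have ONE decoded symbol and then use "Replace", ALL occurrences of that symbol in "input" will be replaced, and then all following decoding will be broken.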
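To illustrate the problem, here is a small sketch comparing a Replace-based loop with a single-pass substitution via Regex.Replace and a match evaluator. Both helpers are hypothetical and, for simplicity, treat each decoded byte as one char, so they are only meant for single-byte text.

```csharp
using System;
using System.Text.RegularExpressions;

static class ReplacePitfall
{
    // Naive approach: decode each match, then Replace it everywhere in the input.
    public static string DecodeWithReplace(string input)
    {
        foreach (Match m in Regex.Matches(input, "=[0-9A-F]{2}"))
        {
            char decoded = (char)Convert.ToInt32(m.Value.Substring(1), 16);
            input = input.Replace(m.Value, decoded.ToString());
        }
        return input;
    }

    // Single pass: each escape is substituted exactly once, at its own position,
    // and decoded output is never scanned again.
    public static string DecodeSinglePass(string input)
    {
        return Regex.Replace(input, "=[0-9A-F]{2}",
            m => ((char)Convert.ToInt32(m.Value.Substring(1), 16)).ToString());
    }
}
```

For the input "=3D41 =41" the intended result is "=41 A". DecodeSinglePass returns exactly that, whereas DecodeWithReplace first turns "=3D" into "=", which creates a new "=41" that the next Replace also decodes, giving "A A".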

A more correct solution:

public static string DecodeQuotedPrintable(string input, string charSet)
{
    Encoding enc;
    try
    {
        enc = Encoding.GetEncoding(charSet);
    }
    catch
    {
        enc = new UTF8Encoding();
    }

    input = input.Replace("=\r\n=", "=");
    input = input.Replace("=\r\n ", "\r\n ");
    input = input.Replace("= \r\n", " \r\n");
    var occurences = new Regex(@"(=[0-9A-F]{2})", RegexOptions.Multiline); //{1,}
    var matches = occurences.Matches(input);
    foreach (Match match in matches)
    {
        try
        {
            byte[] b = new byte[match.Groups[0].Value.Length / 3];
            for (int i = 0; i < match.Groups[0].Value.Length / 3; i++)
            {
                b[i] = byte.Parse(match.Groups[0].Value.Substring(i * 3 + 1, 2), System.Globalization.NumberStyles.AllowHexSpecifier);
            }
            char[] hexChar = enc.GetChars(b);
            input = input.Replace(match.Groups[0].Value, new String(hexChar));
        }
        catch
        {
            Console.WriteLine("QP dec err");
        }
    }
    input = input.Replace("?=", ""); //.Replace("\r\n", "");
    return input;
}

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow