How to read string from HttpRequest form data in correct encoding

https://stackoverflow.com/questions/18615519

27-06-2022

Question

Today I built a service that receives emails from SendGrid, and during testing I finally sent it an email in a non-English language for the first time, with the text "At long last" (наконец-то in Russian). Unfortunately, the encoding has become a problem that I cannot fix.

In a ServiceStack service I have a string property (in the request DTO that SendGrid posts to the service) whose content was sent in an encoding other than UTF-8 (KOI8-R in my case).

public class SengGridEmail : IReturn<SengGridEmailResponse>
{
    // Bound from the "text" field of SendGrid's multipart POST
    public string Text { get; set; }
}

When I try to convert this string to UTF-8 I get a run of question marks ("????"), probably because by the time I access the Text property it has already been decoded into a .NET string (UTF-16 internally), presumably with the wrong charset. This question and answer illustrate the issue.

My question: how can I get the original KOI8-R bytes inside a ServiceStack service or an ASP.NET MVC controller, so that I can convert them to UTF-8 text?

Update:

Accessing base.Request.FormData["text"] directly doesn't help either:

var originalEncoding = Encoding.GetEncoding("KOI8-R");
var originalBytes = originalEncoding.GetBytes(base.Request.FormData["text"]);

But if I take the base64 string from the original sent mail, convert it to a byte[], and then convert those bytes to a UTF-8 string, it works. So either base.Request.FormData["text"] has already been decoded into a .NET (Unicode) string, or (less likely) the problem is on SendGrid's side.

Update 2: Here is a unit test that shows what is happening:

using System;
using System.Text;
using NUnit.Framework;

[TestFixture]
public class EncodingTests
{
    [Test]
    public void EncodingTest()
    {
        // On .NET Core / .NET 5+, KOI8-R only becomes available after:
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        const string originalString = "наконец-то\r\n";
        const string base64Koi = "zsHLz87Fwy3Uzw0K";
        const string charset = "KOI8-R";

        var originalBytes = Convert.FromBase64String(base64Koi); // KOI8-R bytes
        var originalEncoding = Encoding.GetEncoding(charset); // KOI8-R encoding
        var originalText = originalEncoding.GetString(originalBytes); // the initial string, correctly decoded into a .NET string

        Assert.AreEqual(originalString, originalText);

        var unicodeEncoding = Encoding.UTF8;

        var originalWrongString = unicodeEncoding.GetString(originalBytes); // KOI8-R bytes decoded as if they were UTF-8; equals base.Request.FormData["text"]
        var originalWrongBytes = originalEncoding.GetBytes(originalWrongString);

        var unicodeBytes = Encoding.Convert(originalEncoding, unicodeEncoding, originalBytes);
        var result = unicodeEncoding.GetString(unicodeBytes);

        var unicodeWrongBytes = Encoding.Convert(originalEncoding, unicodeEncoding, originalWrongBytes);
        var wrongResult = unicodeEncoding.GetString(unicodeWrongBytes); // this is what I see in the DB

        Assert.AreEqual(originalString, result);
        Assert.AreEqual(originalString, wrongResult); // I want this to pass!
    }
}
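Why the last assertion cannot pass: once the KOI8-R bytes have been decoded as UTF-8, the original bytes are gone. The sketch below is an illustration rather than code from the question; it assumes .NET Core 3.0+ / .NET 5+ (where KOI8-R must be enabled through `CodePagesEncodingProvider`), and the class name `MojibakeDemo` is hypothetical. It repeats the wrong path from the test: byte sequences that are not valid UTF-8 are replaced with U+FFFD while decoding, and U+FFFD has no KOI8-R mapping, so re-encoding produces `?`, which is where the question marks come from.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Hypothetical helper reproducing the failing path from the unit test:
    // KOI8-R bytes decoded as UTF-8, then re-encoded as KOI8-R.
    public static string RoundTrip(byte[] koiBytes)
    {
        // KOI8-R is a legacy code page; .NET Core / .NET 5+ expose it only via this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Explicit replacement fallbacks: characters with no KOI8-R mapping become '?'.
        Encoding koi8r = Encoding.GetEncoding(
            "KOI8-R", EncoderFallback.ReplacementFallback, DecoderFallback.ReplacementFallback);

        // For the Cyrillic text in the question, none of the KOI8-R bytes form valid UTF-8
        // sequences, so the decoder emits U+FFFD in their place: the bytes are lost here.
        string wronglyDecoded = Encoding.UTF8.GetString(koiBytes);

        // U+FFFD has no KOI8-R equivalent, so each one is re-encoded as '?'.
        byte[] reEncoded = koi8r.GetBytes(wronglyDecoded);
        return koi8r.GetString(reEncoded);
    }
}
```

For the base64 sample from the test, the result consists only of `?`, the ASCII `-` and the trailing `\r\n`; no Cyrillic letter survives, so no later `Encoding.Convert` call can recover наконец-то. The bytes have to be read before anything decodes them.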

Solution

I found two underlying causes of the problem.

The first is on SendGrid's side: they post multipart data without specifying a Content-Type (and therefore a charset) for non-Unicode parts.

The second is on ServiceStack's side: at the time of writing it did not support any encoding other than UTF-8 for multipart data.
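To illustrate what honouring a per-part charset means (a hypothetical sketch, not ServiceStack's actual implementation): each part of a multipart/form-data body may carry its own `Content-Type` header, and its `charset` parameter decides how the part's bytes become a string. When the header is missing, as in SendGrid's posts here, a parser has to fall back to a default; UTF-8 in this sketch, which is exactly what mangles KOI8-R text. `PartDecoder` and `DecodePart` are made-up names; the code uses `System.Net.Http.Headers.MediaTypeHeaderValue` to parse the header and assumes .NET Core 3.0+ / .NET 5+ for `CodePagesEncodingProvider`.

```csharp
using System;
using System.Net.Http.Headers;
using System.Text;

public static class PartDecoder
{
    static PartDecoder()
    {
        // Makes legacy code pages such as KOI8-R available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical helper: turns the raw bytes of one multipart part into a string,
    // using the charset from that part's own Content-Type header when present.
    public static string DecodePart(byte[] partBody, string partContentType)
    {
        Encoding encoding = Encoding.UTF8; // default when the part declares no charset

        if (!string.IsNullOrEmpty(partContentType))
        {
            MediaTypeHeaderValue mediaType = MediaTypeHeaderValue.Parse(partContentType);
            string charset = mediaType.CharSet;
            if (!string.IsNullOrEmpty(charset))
            {
                // The value may arrive quoted (charset="KOI8-R"); strip the quotes.
                encoding = Encoding.GetEncoding(charset.Trim('"'));
            }
        }

        return encoding.GetString(partBody);
    }
}
```

With `text/plain; charset=KOI8-R` the test bytes decode to наконец-то; with no header they fall through to UTF-8 and come out as U+FFFD replacement characters, which matches the behaviour described in the question.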

Update:

The SendGrid helpdesk promised to look into the issue, and ServiceStack now fully supports custom charsets in multipart data.

As for the original question itself, the buffered request stream can be accessed in ServiceStack as described here: Can ServiceStack Runner Get Request Body?
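Once the raw request body is buffered (the linked answer covers how to reach it in ServiceStack), the `text` field's bytes can be cut out of the multipart body and decoded with KOI8-R directly. The sketch below is a deliberately minimal parser under stated assumptions: it takes the body as a `byte[]` plus the boundary string from the request's `Content-Type` header, handles only simple form fields, and the names `MultipartFieldReader` / `GetFieldBytes` are hypothetical. It decodes the body as ISO-8859-1 purely as a byte-transparent view (every byte maps to exactly one char and back), so the slice it returns holds the field's original bytes.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class MultipartFieldReader
{
    // ISO-8859-1 maps every byte 0x00-0xFF to the char with the same value and back,
    // so string offsets equal byte offsets and no data is lost.
    private static readonly Encoding ByteTransparent = Encoding.GetEncoding("iso-8859-1");

    // Hypothetical helper: returns the raw bytes of form field `fieldName` from a buffered
    // multipart/form-data body, or null if the field is not present.
    public static byte[] GetFieldBytes(byte[] body, string boundary, string fieldName)
    {
        string raw = ByteTransparent.GetString(body);
        string[] parts = raw.Split(new[] { "--" + boundary }, StringSplitOptions.None);

        // \bname= avoids matching the tail of filename="..."
        var namePattern = new Regex("\\bname=\"" + Regex.Escape(fieldName) + "\"");

        foreach (string part in parts)
        {
            int headerEnd = part.IndexOf("\r\n\r\n", StringComparison.Ordinal);
            if (headerEnd < 0)
                continue; // preamble or the closing "--" marker

            string headers = part.Substring(0, headerEnd);
            if (!namePattern.IsMatch(headers))
                continue;

            string content = part.Substring(headerEnd + 4);
            // The CRLF before the next boundary belongs to the delimiter, not to the field.
            if (content.EndsWith("\r\n", StringComparison.Ordinal))
                content = content.Substring(0, content.Length - 2);

            return ByteTransparent.GetBytes(content);
        }

        return null;
    }
}
```

With the bytes in hand, `Encoding.GetEncoding("KOI8-R").GetString(bytes)` yields the correct text (after registering `CodePagesEncodingProvider` on .NET Core / .NET 5+). A real parser would also need to cope with header variations this sketch ignores.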

Licensed under: CC-BY-SA with attribution
