How to GetBytes() in C# with UTF8 encoding with BOM?

https://stackoverflow.com/questions/4414088

08-10-2019
|

Question

I'm having a problem with UTF8 encoding in my asp.net mvc 2 application in C#. I'm trying let user download a simple text file from a string. I am trying to get bytes array with the following line:

var x = Encoding.UTF8.GetBytes(csvString);

but when I return it for download using:

return File(x, ..., ...);

I get a file which is without BOM so I don't get Croatian characters shown up correctly. This is because my bytes array does not include BOM after encoding. I triend inserting those bytes manually and then it shows up correctly, but that's not the best way to do it.

I also tried creating UTF8Encoding class instance and passing a boolean value (true) to its constructor to include BOM, but it doesn't work either.

Anyone has a solution? Thanks!

Solution

Try like this:

public ActionResult Download()
{
    var data = Encoding.UTF8.GetBytes("some data");
    var result = Encoding.UTF8.GetPreamble().Concat(data).ToArray();
    return File(result, "application/csv", "foo.csv");
}

The reason is that the UTF8Encoding constructor that takes a boolean parameter doesn't do what you would expect:

byte[] bytes = new UTF8Encoding(true).GetBytes("a");

The resulting array would contain a single byte with the value of 97. There's no BOM because UTF8 doesn't require a BOM.

OTHER TIPS

I created a simple extension to convert any string in any encoding to its representation of byte array when it is written to a file or stream:

public static class StreamExtensions
{
    public static byte[] ToBytes(this string value, Encoding encoding)
    {
        using (var stream = new MemoryStream())
        using (var sw = new StreamWriter(stream, encoding))
        {
            sw.Write(value);
            sw.Flush();
            return stream.ToArray();
        }
    }
}

Usage:

stringValue.ToBytes(Encoding.UTF8)

This will work also for other encodings like UTF-16 which requires the BOM.

UTF-8 does not require a BOM, because it is a sequence of 1-byte words. UTF-8 = UTF-8BE = UTF-8LE.

In contrast, UTF-16 requires a BOM at the beginning of the stream to identify whether the remainder of the stream is UTF-16BE or UTF-16LE, because UTF-16 is a sequence of 2-byte words and the BOM identifies whether the bytes in the words are BE or LE.

The problem does not lie with the Encoding.UTF8 class. The problem lies with whatever program you are using to view the files.

Remember that .NET strings are all unicode while there stay in memory, so if you can see your csvString correctly with the debugger the problem is writing the file.

In my opinion you should return a FileResult with the same encoding that the files. Try setting the returning File encoding,

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow