Question

How can I add a byte order mark to a StringBuilder? (I have to pass a string to another method which will save it as a file, but I can't modify that method).

I tried this:

var sb = new StringBuilder();
sb.Append('\xEF');
sb.Append('\xBB');
sb.Append('\xBF');

But when I view it with hex editor, it adds the following sequence: C3 AF C2 BB C2 BF

The string is huge, so it would be good to do it without back and forth converting to byte array.

Edit: Clarification after questions in comments. I have to pass the string to another method which takes a string and creates a file of it on Azure Blob Storage. I can't modify the other method.

Was it helpful?

Solution

Two options:

  1. Don't include the byte order mark in your text at all... instead use an encoding which will automatically include it
  2. Include it as a character in your StringBuilder:

    sb.Append('\uFEFF'); // U+FEFF is the byte-order mark character
    

Personally I'd go for the first approach normally, but the "I can't modify that method" suggests it may not be an option in your case.

OTHER TIPS

Byte-order marks are to inform readers of a file that the file is of a particular encoding. As such, you should only need the byte-order marks (BOM) in the actual file. If you want to include BOM in a text file you're writing, simply use StreamWriter to write to the file. For example:

using(var writer = new StreamWriter(stream, System.Text.Encoding.UTF8))
{
    writer.Write(sb.ToString);
}

If you don't want BOM with UTF-8:

using(var writer = new StreamWriter(stream))
{
    writer.Write(sb.ToString());
}

Or if you want different BOM:

using(var writer = new StreamWriter(stream, System.Text.Encoding.UTF16))
{
    writer.Write(sb.ToString);
}

Update:

If you wanted to be coupled from the implementation detail of a BOM or a BOM of a particular encoding (i.e. could change at runtime or after deployment) but still wanted to pass a BOM-marked string, you could do something like this (assumes .NET 4.5):

var stream = new MemoryStream();
var encoding = Encoding.UTF8; // TODO: configurize this, if necessary
using(var writer = new StreamWriter(stream, encoding, 1024, true))
{
    writer.Write(sb.ToString());
}
CantModifyButMustUseThis(encoding.GetString(stream.ToArray());

IIRC (and not certain that I do), BOM gets added when you convert to byte using one of the relevant Unicode Encoders. I believe some of those's constructors take a bool that control if to add BOM.

I used this code in ASP.NET core, and well!! it works

 [HttpGet("GetCsv")]
    public async Task<IActionResult> GetCsv() {
        
        var cc = new CsvConfiguration(new System.Globalization.CultureInfo("en-US"));
        var entity = await _service.AdminPanelList();
        using (var ms = new MemoryStream()) {
            using (var sw = new StreamWriter(stream: ms, encoding: new UTF8Encoding(true))) {
                using (var cw = new CsvWriter(sw, cc)) {

                    var bom = '\uFEFF'.ToString();
                    byte[] bomArray = Encoding.UTF8.GetBytes(bom);
                    
                    ms.Write(bomArray);
                    cw.WriteRecords(entity);
                }

                var finalArray = ms.ToArray();
                



                var result = File(finalArray, "text/csv", $"PersonExport.csv");
                    

                return result;
            }
        }
    }
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top