Skipping the byte order mark (BOM) when using Stream.CopyTo
09-06-2021
Question
I am trying to merge several files into one output file. Each input file starts with a BOM; how can I skip it while still using the Stream.CopyTo method?
My code looks like this:
using (var output = File.Open(outputFile, FileMode.Append, FileAccess.Write))
{
    foreach (var inputFile in inputFiles)
    {
        using (var input = File.OpenRead(inputFile))
        {
            input.CopyTo(output);
        }
    }
}
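Solution
Move the input stream's position past the BOM before copying. This assumes every input file really does start with the BOM of the encoding you pick; a file without one would lose its first bytes.
using (var output = File.Open(outputFile, FileMode.Append, FileAccess.Write))
{
    foreach (var inputFile in inputFiles)
    {
        using (var input = File.OpenRead(inputFile))
        {
            // Encoding.Unicode is UTF-16 LE (2-byte BOM). Use the encoding your
            // files are actually in, e.g. Encoding.UTF8 (3-byte BOM).
            input.Position = Encoding.Unicode.GetPreamble().Length;
            input.CopyTo(output);
        }
    }
}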
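If it is not certain that every input starts with a BOM, or which encoding each one uses, the skip can be made conditional. The following is only a sketch: BomSkipper and SkipBom are hypothetical names, not part of .NET, and the stream must be seekable. It inspects the first bytes and skips them only when they match a well-known BOM. It cannot tell a real BOM from data that merely starts with the same bytes, and a UTF-16 LE file whose first character is U+0000 would be mistaken for UTF-32 LE.

```csharp
using System;
using System.IO;

public static class BomSkipper
{
    // Well-known BOMs, longest first so that UTF-32 LE (FF FE 00 00)
    // is tried before UTF-16 LE (FF FE).
    private static readonly byte[][] KnownBoms =
    {
        new byte[] { 0xFF, 0xFE, 0x00, 0x00 }, // UTF-32 LE
        new byte[] { 0x00, 0x00, 0xFE, 0xFF }, // UTF-32 BE
        new byte[] { 0xEF, 0xBB, 0xBF },       // UTF-8
        new byte[] { 0xFF, 0xFE },             // UTF-16 LE
        new byte[] { 0xFE, 0xFF },             // UTF-16 BE
    };

    // Positions a seekable stream just past a leading BOM, or at 0 if
    // none is found. Returns the number of bytes skipped.
    public static int SkipBom(Stream input)
    {
        var head = new byte[4];
        input.Position = 0;

        // Read up to 4 bytes; Read may return fewer than requested.
        int read = 0;
        while (read < head.Length)
        {
            int n = input.Read(head, read, head.Length - read);
            if (n == 0) break; // end of stream
            read += n;
        }

        int skip = 0;
        foreach (var bom in KnownBoms)
        {
            if (read >= bom.Length && StartsWith(head, bom))
            {
                skip = bom.Length;
                break;
            }
        }

        input.Position = skip;
        return skip;
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

In the loop above, the line that sets input.Position would then become BomSkipper.SkipBom(input); followed by the same input.CopyTo(output); call.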
Other tips
Don't use streams for copying text. Use StreamReader/StreamWriter instead:
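// File.AppendText writes UTF-8 and does not add a BOM when appending.
using (var output = File.AppendText(outputFile))
{
    foreach (var inputFile in inputFiles)
    {
        // File.OpenText detects a leading BOM and consumes it.
        using (var input = File.OpenText(inputFile))
        {
            while (!input.EndOfStream)
            {
                // Note: WriteLine ends every line, including the last, with Environment.NewLine.
                output.WriteLine(input.ReadLine());
            }
        }
    }
}
A Stream instance works at the byte level; it knows nothing about character encodings or BOMs.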
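To make the byte-level versus text-level distinction concrete, here is a small sketch. BomDemo and Run are hypothetical names, and the file path is whatever the caller passes in. It writes "ABC" as UTF-8 with a BOM, then looks at the file once as raw bytes and once through a StreamReader.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Writes "ABC" as UTF-8 with a BOM, then reads it back both ways.
    public static (int rawLength, string text) Run(string path)
    {
        File.WriteAllText(path, "ABC", new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));

        // Byte level: the BOM bytes are ordinary data, so they are counted.
        int rawLength = File.ReadAllBytes(path).Length;

        // Text level: the reader detects the BOM and does not return it.
        string text;
        using (var reader = File.OpenText(path))
        {
            text = reader.ReadToEnd();
        }

        return (rawLength, text);
    }
}
```

Here rawLength is 6 (three BOM bytes plus three bytes for "ABC"), while text is just "ABC". A raw Stream.CopyTo carries the BOM along, whereas a reader/writer pair drops it.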
You can't do that for arbitrary files. A BOM is just a few bytes, and those same bytes can be perfectly valid data in a file that is not text at all. In the example below, the code happens to write a file containing the same bytes as the text "ABC" encoded in UTF-16 little-endian with a BOM, but that doesn't mean the first two bytes represent a BOM (or even that the file represents "ABC").
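using System;
using System.IO;

public class StackOverflow_10594371
{
    public static void Test()
    {
        using (FileStream fs = File.Create("a.bin"))
        {
            // These two bytes look like a UTF-16 LE BOM...
            fs.WriteByte(0xFF);
            fs.WriteByte(0xFE);
            // ...and these look like "ABC" in UTF-16 LE.
            fs.WriteByte(0x41);
            fs.WriteByte(0x00);
            fs.WriteByte(0x42);
            fs.WriteByte(0x00);
            fs.WriteByte(0x43);
            fs.WriteByte(0x00);
        }
        // ReadAllText treats the leading FF FE as a BOM and decodes the rest as UTF-16 LE.
        Console.WriteLine(File.ReadAllText("a.bin"));
    }
}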