This happens if you use an encoding where every character is two bytes, such as UTF-16.
CRLF would then be encoded as \0\r\0\n.
Git thinks it's a single-byte encoding, so it turns that into \0\r\0\r\n.
This makes the next line one byte off, causing every other line to be full of Chinese characters (because the \0 becomes the low-order byte rather than the high-order byte).
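As a rough illustration of the shift described above, this C# sketch encodes a short string as big-endian UTF-16, inserts a \r byte before each lone \n byte the way a single-byte line-ending conversion would, and decodes the result again. The names ShiftDemo, InsertCrBeforeLf and Corrupt are hypothetical, and the byte loop only approximates the conversion; it is not Git's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ShiftDemo
{
    // Hypothetical stand-in for a byte-oriented LF -> CRLF conversion:
    // adds a 0x0D byte before every 0x0A byte not already preceded by 0x0D.
    public static byte[] InsertCrBeforeLf(byte[] input)
    {
        var output = new List<byte>(input.Length + 16);
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == 0x0A && (i == 0 || input[i - 1] != 0x0D))
                output.Add(0x0D);
            output.Add(input[i]);
        }
        return output.ToArray();
    }

    // Encodes text as big-endian UTF-16, applies the byte-wise conversion,
    // and decodes the shifted bytes with the same encoding.
    public static string Corrupt(string text)
    {
        byte[] original = Encoding.BigEndianUnicode.GetBytes(text);
        byte[] shifted = InsertCrBeforeLf(original);
        return Encoding.BigEndianUnicode.GetString(shifted);
    }
}
```

For ShiftDemo.Corrupt("A\r\nBC"), the bytes 00 41 00 0D 00 0A 00 42 00 43 become 00 41 00 0D 00 0D 0A 00 42 00 43. The B that was stored as 00 42 is now read as 42 00, which is U+4200, a CJK ideograph.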
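You can convert files to UTF-8 using this LINQPad script:

    const string path = @"C:\...";
    // Re-save every .html and .js file under path as UTF-8 with a BOM and CRLF line endings.
    foreach (var file in Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories))
    {
        if (!new[] { ".html", ".js" }.Contains(Path.GetExtension(file)))
            continue;
        // ReadAllLines detects the source encoding from the file's byte-order mark.
        File.WriteAllText(file, String.Join("\r\n", File.ReadAllLines(file)), new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
        file.Dump();    // LINQPad: print the name of each converted file
    }
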
This will not fix broken files; you can fix the files by replacing \r\n with \n in a hex editor. I don't have a LINQPad script for that (since there's no simple Replace() method for byte[]s).
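For anyone who would rather not use a hex editor, the byte-level replacement can be sketched in a few lines of C#. This is a minimal sketch, not a tested repair tool: CrLfRepair, ReplaceCrLfWithLf and RepairFile are hypothetical names, and it assumes that every 0D 0A byte pair in the file was introduced by the line-ending conversion. A legitimate 0D 0A pair (for example the character U+0D0A in big-endian UTF-16) would be altered too, so work on a copy of the file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CrLfRepair
{
    // Returns a copy of input with every 0x0D 0x0A pair reduced to 0x0A.
    // Assumption: all such pairs were inserted by the broken conversion.
    public static byte[] ReplaceCrLfWithLf(byte[] input)
    {
        var output = new List<byte>(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            // Skip a 0x0D that is immediately followed by 0x0A.
            if (input[i] == 0x0D && i + 1 < input.Length && input[i + 1] == 0x0A)
                continue;
            output.Add(input[i]);
        }
        return output.ToArray();
    }

    // Rewrites the file in place; keep a backup before calling this.
    public static void RepairFile(string file)
    {
        File.WriteAllBytes(file, ReplaceCrLfWithLf(File.ReadAllBytes(file)));
    }
}
```

Applied to the broken sequence 00 0D 00 0D 0A, ReplaceCrLfWithLf yields 00 0D 00 0A, which is the original two-byte CRLF.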