Pergunta

My content contains multiple BOM (EF BB BF) characters and I want to remove them. The characters are in the middle of strings I want to simply remove them all.

The data comes from a JavaScript source, which I get from a CKEditor instance. Then I POST the variable and read it as string on my backend and the BOMS are there. For now, they are persisted as is, but this results in errors in post-processing when the characters are interpreted and start showing up mid-content. I suspect they come from something that was copypasted into my CKEditor.

I can step through the string char by char, but I don't know how to compare against the BOM. Would it somehow be possible to compare the hex values of the string bytes and compare three byte sequences?

Foi útil?

Solução

The utf-8 BOM bytes get translated to \ufeff. Unicode character "Zero width no-break space", can't see them, can't hear them. Filter them out with:

   var good = bad.Replace("\ufeff", "");

Outras dicas

Try the following:

CleanString = DirtyString.Replace("\u00EF\u00BB\u00BF", null);
Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top