Removing BOM characters from AJAX-posted string

https://stackoverflow.com/questions/13024978

13-07-2021

Question

My content contains multiple BOM (EF BB BF) characters, and I want to remove them. They appear in the middle of strings, and I simply want to strip them all out.

The data comes from JavaScript: I read it from a CKEditor instance, POST the variable, and read it as a string on my backend, where the BOMs are present. For now they are persisted as is, but this causes errors in post-processing, when the characters are interpreted and start showing up mid-content. I suspect they came from something that was copy-pasted into CKEditor.

I can step through the string character by character, but I don't know how to compare against the BOM. Would it be possible to look at the hex values of the string's bytes and match three-byte sequences?

Solution

When the UTF-8 BOM bytes (EF BB BF) are decoded into a string, they become the single character \ufeff (U+FEFF, "zero width no-break space"), which is invisible when rendered. There is no need to compare bytes; filter the character out with:

   var good = bad.Replace("\ufeff", "");
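To make the byte-to-character relationship concrete, here is a minimal sketch. It assumes the posted body was decoded as UTF-8; the class and method names (`BomCleaner`, `StripBom`, `DecodeSample`) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text;

public static class BomCleaner
{
    // U+FEFF is the character that the UTF-8 byte sequence EF BB BF decodes to.
    private const string Bom = "\uFEFF";

    // Removes every U+FEFF in the input, not just a leading one.
    // String.Replace(string, string) performs an ordinal search.
    public static string StripBom(string input)
    {
        return input.Replace(Bom, "");
    }

    // Builds a sample string from raw bytes: 'a', EF BB BF, 'b'.
    // Encoding.UTF8.GetString keeps the embedded BOM as a single U+FEFF char.
    public static string DecodeSample()
    {
        byte[] raw = { 0x61, 0xEF, 0xBB, 0xBF, 0x62 };
        return Encoding.UTF8.GetString(raw);
    }
}
```

Calling `BomCleaner.StripBom(BomCleaner.DecodeSample())` should turn the three-character decoded sample into the two-character string "ab".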

Other tips

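If the text was instead decoded with a single-byte encoding such as ISO-8859-1, each BOM shows up as the three characters U+00EF U+00BB U+00BF ("ï»¿") rather than \ufeff. In that case, try the following (passing null as the replacement removes every occurrence):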

CleanString = DirtyString.Replace("\u00EF\u00BB\u00BF", null);
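If it is unclear which decoding the backend applied, both forms can be stripped in one pass. The following is only a sketch under that assumption; `MixedBomCleaner` and `StripAllBomForms` are hypothetical names. Note that it would also remove a literal "ï»¿" typed on purpose, which is unlikely in practice but possible.

```csharp
using System;
using System.Text;

public static class MixedBomCleaner
{
    // EF BB BF decoded correctly as UTF-8.
    private const string Bom = "\uFEFF";

    // EF BB BF decoded one byte per character (ISO-8859-1).
    private const string MisdecodedBom = "\u00EF\u00BB\u00BF";

    // Removes both representations wherever they occur in the string.
    public static string StripAllBomForms(string input)
    {
        return input.Replace(Bom, "").Replace(MisdecodedBom, "");
    }

    // Shows how the three-character form arises from the same bytes.
    public static string DecodeAsLatin1(byte[] raw)
    {
        return Encoding.GetEncoding("ISO-8859-1").GetString(raw);
    }
}
```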

Licensed under: CC-BY-SA with attribution