Question

My content contains multiple BOM (EF BB BF) characters, and I want to remove them. They occur in the middle of strings; I simply want to remove them all.

The data comes from a JavaScript source: I read it from a CKEditor instance, POST the variable, and read it as a string on my backend, and the BOMs are there. For now they are persisted as is, but this causes errors in post-processing, when the characters are interpreted and start showing up mid-content. I suspect they come from something that was copy-pasted into my CKEditor.

I can step through the string char by char, but I don't know how to compare against the BOM. Would it be possible to compare the hex values of the string's bytes against the three-byte sequence?

Solution

When the string is decoded as UTF-8, the BOM bytes (EF BB BF) become the single character \ufeff, the Unicode "zero width no-break space". You can't see it, you can't hear it. Filter it out with:

   var good = bad.Replace("\ufeff", "");
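
To see why a single-character replace is enough, here is a minimal sketch. The byte array, the class name BomCleaner, and the helper Strip are made up for illustration; the only assumption is that the backend decodes the POSTed body as UTF-8.

```csharp
using System;
using System.Text;

static class BomCleaner
{
    // Hypothetical helper: removes every U+FEFF from the string.
    // String.Replace(string, string) compares ordinally, so the
    // invisible character is matched exactly.
    public static string Strip(string s) => s.Replace("\uFEFF", "");

    static void Main()
    {
        // Made-up payload: "ab", a stray UTF-8 BOM (EF BB BF), then "cd".
        byte[] raw = { 0x61, 0x62, 0xEF, 0xBB, 0xBF, 0x63, 0x64 };

        // UTF-8 decoding turns the three BOM bytes into one char, U+FEFF.
        string bad = Encoding.UTF8.GetString(raw);
        Console.WriteLine(bad.Length);             // 5
        Console.WriteLine(bad.IndexOf('\uFEFF'));  // 2

        string good = Strip(bad);
        Console.WriteLine(good);                   // abcd
    }
}
```

So there is no need to compare three-byte sequences by hand: once the data is a string, each BOM is one char, and comparing a char against '\uFEFF' while stepping through the string would work as well.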

Other tips

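Try the following. It matches the case where the BOM bytes were decoded as Latin-1 (ISO-8859-1) rather than UTF-8, so each BOM shows up as the three characters ï»¿: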

CleanString = DirtyString.Replace("\u00EF\u00BB\u00BF", null);
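
If it is unclear which decoding the backend applied, both forms can be removed in one pass. This is only a sketch: the class name MojibakeBomDemo, the helper StripBothForms, and the sample bytes are hypothetical, and it assumes the text never legitimately contains the sequence ï»¿.

```csharp
using System;
using System.Text;

static class MojibakeBomDemo
{
    // Hypothetical helper: removes the correctly decoded BOM (U+FEFF)
    // and its Latin-1 misdecoding (the three chars U+00EF U+00BB U+00BF).
    public static string StripBothForms(string s) =>
        s.Replace("\uFEFF", "").Replace("\u00EF\u00BB\u00BF", "");

    static void Main()
    {
        // Made-up payload: "a", a stray UTF-8 BOM, then "b".
        byte[] raw = { 0x61, 0xEF, 0xBB, 0xBF, 0x62 };

        // Latin-1 maps each byte to one char, so the BOM becomes three chars.
        string misdecoded = Encoding.GetEncoding("ISO-8859-1").GetString(raw);
        Console.WriteLine(misdecoded.Length);            // 5
        Console.WriteLine(StripBothForms(misdecoded));   // ab
    }
}
```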
Licensed under: CC-BY-SA with attribution