Removing BOM characters from AJAX-posted string
13-07-2021
Question
My content contains multiple BOM (EF BB BF) characters and I want to remove them. The characters appear in the middle of strings, and I simply want to remove them all.
The data comes from a JavaScript source, which I get from a CKEditor instance. I then POST the variable and read it as a string on my backend, and the BOMs are there. For now they are persisted as is, but this causes errors in post-processing, when the characters are interpreted and start showing up mid-content. I suspect they come from something that was copy-pasted into my CKEditor.
I can step through the string char by char, but I don't know how to compare against the BOM. Would it somehow be possible to look at the hex values of the string's bytes and compare them against the three-byte sequence?
Solution
The UTF-8 BOM bytes (EF BB BF) get decoded to the single character \ufeff (U+FEFF), the Unicode character "zero width no-break space". You can't see them, you can't hear them. Filter them out with:
var good = bad.Replace("\ufeff", "");
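The question also asks how to do the comparison while stepping through the string char by char. The following is a minimal sketch of that, assuming the POST body has already been decoded as UTF-8; `BomCleaner` and `StripBom` are hypothetical names used only for illustration, not part of any library.

```csharp
using System;
using System.Text;

public static class BomCleaner
{
    // Hypothetical helper (not from the original answer): walks the decoded
    // string one char at a time and drops every U+FEFF it finds.
    public static string StripBom(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;

        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // A decoded UTF-8 BOM is a single char, so one comparison is enough.
            if (c != '\uFEFF') sb.Append(c);
        }
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        // "ab", then a UTF-8 BOM (EF BB BF), then "cd".
        byte[] raw = { 0x61, 0x62, 0xEF, 0xBB, 0xBF, 0x63, 0x64 };
        string decoded = Encoding.UTF8.GetString(raw);

        Console.WriteLine(decoded.Length);               // 5: the three BOM bytes became one char
        Console.WriteLine(BomCleaner.StripBom(decoded)); // abcd
    }
}
```

The loop gives the same result as the one-line `Replace` call; it is only spelled out to show that the check is against one char, not three bytes.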
OTHER TIPS
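If the request body was decoded as Latin-1 (ISO-8859-1) rather than UTF-8, each BOM shows up as three separate characters (ï»¿) instead of one \ufeff. In that case, try the following: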
CleanString = DirtyString.Replace("\u00EF\u00BB\u00BF", null);
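The three-character pattern above is the same EF BB BF sequence seen through a single-byte decoding. If the raw request bytes can be reached before they are decoded, the sequence can also be dropped at the byte level, which is the three-byte comparison the question asks about. This is only a sketch under that assumption; `BomBytes` and `StripUtf8Bom` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class BomBytes
{
    // Hypothetical helper: copies the input, skipping every EF BB BF run.
    // In well-formed UTF-8 this byte sequence can only encode U+FEFF,
    // so no other character is affected.
    public static byte[] StripUtf8Bom(byte[] data)
    {
        var result = new List<byte>(data.Length);
        int i = 0;
        while (i < data.Length)
        {
            bool isBom = i + 2 < data.Length
                         && data[i] == 0xEF
                         && data[i + 1] == 0xBB
                         && data[i + 2] == 0xBF;
            if (isBom)
            {
                i += 3; // skip the whole three-byte sequence
                continue;
            }
            result.Add(data[i]);
            i++;
        }
        return result.ToArray();
    }
}
```

The cleaned array can then be decoded as usual, for example with `Encoding.UTF8.GetString(BomBytes.StripUtf8Bom(raw))`. When the data is already a string, removing `\ufeff` after decoding is simpler.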