Mysterious garbage character - IE 8 only
12-06-2021
Question
I am building a table, with content pulled from other elements in the page (page scraping).
I am using innerText or textContent to pull the text, then a regular expression to trim it:
string.replace(/^\s+|\s+$/g,"");
This works fine in IE 9 and Chrome, but in IE 8 I am getting a garbage character that I cannot identify. I was able to reproduce the behavior with alerts in jsfiddle:
What is this extra character, and how can I get rid of it?
Update: thanks for the helpful replies! It seems that the character in question is U+200E (left-to-right mark). So the second part of my question remains: how can I get rid of such characters with regular expressions and keep just the regular text?
Solution
Both the "At Risk" and "Complete" <th> tags in your jsFiddle snippet have a U+200E (Left-to-Right Mark, aka LRM) code point at the end of their content. That is not a whitespace character, so it cannot be matched by \s.
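To see this outside the fiddle, here is a minimal sketch. The sample string is made up for illustration (it is not copied from the jsFiddle) and simply mimics header text that ends in U+200E:

```javascript
// Hypothetical sample: header text followed by U+200E, padded with spaces.
var sample = "  Complete\u200E ";

// The trim from the question removes the spaces but not the mark.
var trimmed = sample.replace(/^\s+|\s+$/g, "");

// U+200E is a format character, so \s does not match it.
var lrmIsWhitespace = /\s/.test("\u200E");

// Hex value of the last code unit left after trimming.
var lastCodeUnit = trimmed.charCodeAt(trimmed.length - 1).toString(16);
```

After this runs, trimmed still has one more character than the visible word, and lastCodeUnit identifies it as 200e.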
One way to get rid of this character is to use the XRegExp library, so that you can replace all matches of \p{C} with the empty string (i.e., delete them). \p{C} matches any code point in Unicode's "Other" category, which includes control, format, private use, surrogate, and unassigned code points. U+200E, specifically, is within the \p{Cf} "Other, Format" subcategory.
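If pulling in a library is not an option, a narrower alternative (not the \p{C} approach described above) is a plain character class listing the invisible format characters you expect. The sketch below assumes that the directional marks, zero-width characters, and the byte order mark are the only offenders. The names INVISIBLE and cleanText are made up for this example, and the list is far from the whole \p{C} category:

```javascript
// Assumption: only these invisible characters need removing.
// U+200B-U+200F: zero-width space/joiners and the LRM/RLM marks
// U+202A-U+202E: directional embedding and override controls
// U+2060: word joiner, U+FEFF: byte order mark
var INVISIBLE = /[\u200B-\u200F\u202A-\u202E\u2060\uFEFF]/g;

// Hypothetical helper: strip the invisible characters first, then trim,
// so whitespace that sat next to a mark is also removed.
function cleanText(text) {
  return text.replace(INVISIBLE, "").replace(/^\s+|\s+$/g, "");
}
```

For example, cleanText("  At Risk \u200E") gives back just "At Risk". The pattern uses only \uXXXX escapes and a character class, so it should not depend on newer regular expression features.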
Other tips
Try printing to the page the result of
escape(string.replace(/^\s+|\s+$/g,""));
Your garbage character should show up as an escape code.
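A small sketch of what that looks like, again with a made-up sample string standing in for the scraped text:

```javascript
// Hypothetical sample standing in for the text pulled from the page.
var cleaned = "At Risk\u200E".replace(/^\s+|\s+$/g, "");

// escape() leaves ASCII letters alone, encodes the space as %20 and
// encodes code units above 0xFF as %uXXXX, which exposes the hidden mark.
var escaped = escape(cleaned); // "At%20Risk%u200E"
```

The %u200E at the end is the escape code for the left-to-right mark.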