You could try &#x?[^;]{2,4}; which means: &# followed by zero or one x, followed by 2 to 4 characters that are not ; and then a closing ;.
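To make the limits of that pattern concrete, here is a minimal C# sketch. The class name EntityPatternDemo and the method Strip are hypothetical names chosen for illustration; only the pattern itself comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; only the pattern comes from the answer.
public static class EntityPatternDemo
{
    // "&#", an optional "x", 2 to 4 characters that are not ';', then ';'.
    public const string Pattern = @"&#x?[^;]{2,4};";

    // Returns the input with every match of Pattern removed.
    public static string Strip(string input)
    {
        return Regex.Replace(input, Pattern, string.Empty);
    }
}
```

With this sketch, Strip("CoolItem&#x2122; Watch") and Strip("CoolItem&#8482; Watch") both give "CoolItem Watch". A one-digit reference such as &#9; or a five-digit hex reference such as &#x1F600; falls outside the {2,4} range and is left in place, so the range may need widening for other data.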
Remove unicode hex values using regex
Question
I need to remove several different hex values from a string of product descriptions.
Example: "Sale on CoolItem&#x2122; Watch" OR "Deal buster on RMKHoody&#x2122; signed"
&#x2122; and &#8482; are just a few of the hex strings in this large database.
I need a reg exp to replace each with an empty string.
Result: "Sale on CoolItem Watch" OR "Deal buster on RMKHoody signed"
What would be the reg exp to find everything from the & through the semicolon and replace the entire selection?
UPDATE/SOLUTION - WORKING CODE
using System.Text.RegularExpressions;
string s = "Sale on CoolItem&#x2122; Watch";
var cleanProductName = Regex.Replace(s, @"&#x?[^;]{2,4};", string.Empty);
// cleanProductName == "Sale on CoolItem Watch"
s = "Deal buster on RMKHoody&#x2122; signed";
cleanProductName = Regex.Replace(s, @"&#x?[^;]{2,4};", string.Empty);
// cleanProductName == "Deal buster on RMKHoody signed"
You can also use
cleanProductName = Regex.Replace(s, @"&[^;]{1,6};", string.Empty);
to cover more special characters, such as &reg;, &trade; and &deg;.
Solution
Other tips
\\&\\#x?\\d+\\; could be a starting point.
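One thing to watch with that starting point: \d+ accepts only decimal digits, so a hex reference that contains letters, such as &#x00AE;, would be skipped. A possible refinement, sketched in C# under the assumption that only numeric character references should be removed, is below. The class name NumericEntityStripper and its method are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not taken from either answer.
public static class NumericEntityStripper
{
    // "&#", then either x/X plus hex digits or plain decimal digits, then ';'.
    private static readonly Regex NumericEntity =
        new Regex(@"&#(?:[xX][0-9A-Fa-f]+|[0-9]+);");

    // Removes numeric character references; named entities such as &amp; stay.
    public static string Strip(string input)
    {
        return NumericEntity.Replace(input, string.Empty);
    }
}
```

Because this version checks the digits explicitly, it has no length limit and does not touch text such as "Tom &amp; Jerry". Whether named entities should also be removed depends on the data.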