Question

I need to remove several different hex values from a string of product descriptions.

Example: "Sale on CoolItem&#8482; Watch" OR "Deal buster on RMKHoody&#x2122; signed"

&#x2122;  &#8482;

are just a few hex strings in this large database.

I need a reg exp to replace each with an empty string.

Result : "Sale on CoolItem Watch" OR "Deal buster on RMKHoody signed"

What would be the reg exp to find the & and select forward to the semicolon, so the entire selection can be replaced?

UPDATE/SOLUTION-WORKING CODE

// requires: using System.Text.RegularExpressions;
string s = "Sale on CoolItem&#8482; Watch";
var cleanProductName = Regex.Replace(s, @"&#x?[^;]{2,4};", string.Empty);
// cleanProductName == "Sale on CoolItem Watch"


s = "Deal buster on RMKHoody&#x2122; signed";
cleanProductName = Regex.Replace(s, @"&#x?[^;]{2,4};", string.Empty);
// cleanProductName == "Deal buster on RMKHoody signed"

You can also use

var cleanProductName = Regex.Replace(s, @"&[^;]{1,6};", string.Empty);

for more special characters such as &reg;, &trade; and &deg;.
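
One caveat with the broader pattern: [^;]{1,6} accepts any characters, including spaces, so it can also eat ordinary text that happens to sit between an & and a ;. The sketch below contrasts it with a narrower pattern that only accepts letters and digits after the &. The class name, method names, and the narrower pattern are illustrative assumptions, not part of the original post, and note that either pattern deletes entities such as &amp; rather than decoding them.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedEntityDemo
{
    // Broad pattern from the post: "&", 1-6 characters that are not ';', then ";".
    public static string StripBroad(string input) =>
        Regex.Replace(input, @"&[^;]{1,6};", string.Empty);

    // Narrower, hypothetical alternative: "&", a letter, more letters or digits, then ";".
    public static string StripNamed(string input) =>
        Regex.Replace(input, @"&[A-Za-z][A-Za-z0-9]+;", string.Empty);

    public static void Main()
    {
        // Both remove a genuine named entity.
        Console.WriteLine(StripBroad("Sale on CoolItem&trade; Watch"));
        Console.WriteLine(StripNamed("Sale on CoolItem&trade; Watch"));

        // Only the broad pattern removes "& chips;" from plain prose.
        Console.WriteLine(StripBroad("Fish & chips; tea"));
        Console.WriteLine(StripNamed("Fish & chips; tea"));
    }
}
```

If the descriptions can contain a literal & in running text, the narrower form is probably the safer of the two.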


Solution

You could try &#x?[^;]{2,4};, meaning: &# followed by zero or one x followed by 2 to 4 characters that are not ;, followed by ;.
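
As a rough check of that pattern, here is a minimal sketch that applies it to both entity forms from the question. The class and method names are made up for illustration, and the sample strings assume the entities are written as &#8482; (decimal) and &#x2122; (hex). The last call shows a limit of the {2,4} quantifier: an entity with more than four digits is left alone.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EntityDemo
{
    // "&#", an optional "x", 2 to 4 characters that are not ';', then ";".
    private static readonly Regex NumericEntity = new Regex(@"&#x?[^;]{2,4};");

    public static string Strip(string input)
    {
        return NumericEntity.Replace(input, string.Empty);
    }

    public static void Main()
    {
        Console.WriteLine(Strip("Sale on CoolItem&#8482; Watch"));
        Console.WriteLine(Strip("Deal buster on RMKHoody&#x2122; signed"));

        // Six digits after "&#" exceed {2,4}, so this string is not changed.
        Console.WriteLine(Strip("Smile&#128512; today"));
    }
}
```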

OTHER TIPS

\\&\\#x?\\d+\\; could be a starting point (in a C# verbatim string this is @"&#x?\d+;", since &, # and ; need no escaping by default).
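
One thing to watch with \d+ is that it only matches digits, so a hex entity containing letters, such as &#xAE;, would slip through. A possible refinement, sketched here under that assumption, spells out the two forms separately: x followed by hex digits, or plain decimal digits. The class name, method name, and pattern are illustrative, not taken from the answers above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StrictEntityDemo
{
    // "&#", then either "x"/"X" plus hex digits or decimal digits only, then ";".
    private static readonly Regex NumericEntity =
        new Regex(@"&#(?:[xX][0-9A-Fa-f]+|[0-9]+);");

    public static string Strip(string input) =>
        NumericEntity.Replace(input, string.Empty);

    public static void Main()
    {
        Console.WriteLine(Strip("Sale on CoolItem&#8482; Watch"));  // decimal form
        Console.WriteLine(Strip("Brand&#xAE; name"));               // hex form with letters
        Console.WriteLine(Strip("Smile&#128512; today"));           // more than four digits
        Console.WriteLine(Strip("Tom &#amp; Jerry"));               // not a numeric entity, kept
    }
}
```

Because the digit runs are unbounded, this variant has no upper limit on entity length, which may or may not be what you want for your data.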

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow