Question

I have a string like "Welcome to McDonalds®: I'm loving it™" and I want to get rid of the ":", "'", ® and ™ symbols. This is what I have tried so far:

$string = "Welcome to McDonalds®: I'm loving it™";
$string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); 

But the output I receive is:

"Welcome to McDonaldsreg Im loving ittrade". So preg_replace somehow converts ® to 'reg' and ™ to 'trade', which is not what I want, and I cannot understand why such a conversion happens at all.

How do I get rid of this conversion?

Solved: Thanks for the ideas, guys. I solved the problem:

$string = preg_replace(
    array('/[^a-zA-Z0-9 -]/', '/&[^\s]*;/'), 
    '', 
    preg_replace(
        array('/&[^\s]*;/'), 
        '', 
        htmlentities($string)
    )
);

Solution

You probably have the special characters in entity form, i.e. ® is really &reg; in your string. The replacement operation therefore never sees a ® character: it only strips the & and the ; and leaves the letters "reg" behind.

To fix this, you could filter for the &SOMETHING; substrings and remove them. There might be built-in functions that help with this, perhaps html_entity_decode.
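A minimal sketch of the decoding route, assuming the input is UTF-8 and really does contain the entities &reg; and &trade;. The function name strip_symbols is made up for illustration. It decodes the entities back into real characters first, so the whitelist from the question can then remove them like any other symbol:

```php
<?php
// Hypothetical helper: decode HTML entities, then keep only whitelisted characters.
function strip_symbols(string $input): string
{
    // Turn "&reg;", "&trade;", "&#039;" and so on back into real characters.
    $decoded = html_entity_decode($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Same whitelist as in the question; the /u flag treats the input as UTF-8.
    return preg_replace('/[^a-zA-Z0-9 -]/u', '', $decoded);
}

echo strip_symbols("Welcome to McDonalds&reg;: I'm loving it&trade;");
// Welcome to McDonalds Im loving it
```

Decoding first means there is no need to guess what an entity looks like; anything that decodes to a character outside the whitelist is dropped in the second step.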

OTHER TIPS

If you are looking to replace only the mentioned characters, use

$cleaned = str_replace(array('®','™','&reg;','&trade;', ":", "'"), '', $string);

Regular string replacement methods are usually faster and there is nothing in your example you want to replace that would need the pattern matching power of the Regular Expression engine.

EDIT due to comments: If you need to replace character patterns (as indicated by the solution you gave yourself), a Regex is indeed more appropriate and practical.

In addition, I'm sure McD requires both symbols to be in place if that slogan is used on any public website.

® is &reg;, and ™ is &trade;. As such, you'll want to remove anything that follows
the pattern &[#0-9a-z]+; beforehand:

$input = "Remove all &trade; and &reg; symbols, please.";
$string = preg_replace("/&[#0-9a-z]+;/i", "", $input);
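Putting this step together with the whitelist from the question could look like the sketch below. The function name clean_slogan is hypothetical, and the entity pattern is a slightly stricter variation on the one above (an assumption, not part of the original answer) that distinguishes named, decimal and hexadecimal entities:

```php
<?php
// Hypothetical helper combining entity removal with the whitelist from the question.
function clean_slogan(string $input): string
{
    // 1. Drop named (&reg;), decimal (&#174;) and hexadecimal (&#xAE;) entities.
    $withoutEntities = preg_replace(
        '/&(?:#[0-9]+|#x[0-9a-f]+|[a-z][a-z0-9]*);/i',
        '',
        $input
    );

    // 2. Drop everything that is not a letter, digit, space or hyphen.
    return preg_replace('/[^a-zA-Z0-9 -]/', '', $withoutEntities);
}

echo clean_slogan("Welcome to McDonalds&reg;: I'm loving it&trade;");
// Welcome to McDonalds Im loving it
```

The order matters: running the whitelist first would strip the & and ; and leave "reg" and "trade" behind, which is exactly the symptom described in the question.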
Licensed under: CC-BY-SA with attribution