Question

A while ago, I built the following regular expression:

~(?:<a.*?</a>|\[url.*?\[/url]|\[/?[^]]++]|</?[^>]++>)(*SKIP)(*FAIL)|\bcdkey\s*-\s*.*\b~is

This matches every kind of cdkey-xxx that is NOT inside a BBCode or HTML tag. That works fine so far.

However, I can't make it work properly when the match should also apply inside BBCode and HTML tags. I thought removing the front part would be enough, but I seem to be wrong:

~\bcdkey\s*-\s*.*\b~is

With this regex,

<a href="https://www.google.de/#q=cdkey-0192xdasas" class="externalURL">https://www.google.de/#q=cdkey-0192xdasas</a>

becomes

<a href="https://www.google.de/#q=***>

and

[url]https://www.google.de/#q=cdkey-0192xdasas[/url]

becomes

[url]https://www.google.de/#q=***]

while the expected results are

<a href="https://www.google.de/#q=***" class="externalURL">https://www.google.de/#q=***</a>

and

[url]https://www.google.de/#q=***[/url]

I have no idea how to fix that.


So what I am trying to achieve is to replace

[url]https://www.google.de/#q=cdkey-0192xdasas[/url]
[url=https://www.google.de/#q=cdkey-0192xdasas]Test[/url]
[img]https://www.google.de/#q=cdkey-0192xdasas[/img]
[url="https://www.google.de/#q=cdkey-0192xdasas"]Test 3[/url]
Plaintext https://www.google.de/#q=cdkey-0192xdasas
    Another plaintext cdkey   -   bla
<a href="https://www.google.de/#q=cdkey-0192xdasas" class="externalURL">https://www.google.de/#q=cdkey-0192xdasas</a>
<a href='https://www.google.de/#q=cdkey-0192xdasas'>Le Google</a>

with

[url]https://www.google.de/#q=***[/url]
[url=https://www.google.de/#q=***]Test[/url]
[img]https://www.google.de/#q=***[/img]
[url="https://www.google.de/#q=***"]Test 3[/url]
Plaintext https://www.google.de/#q=***
    Another plaintext ***
<a href="https://www.google.de/#q=***" class="externalURL">https://www.google.de/#q=***</a>
<a href='https://www.google.de/#q=***'>Le Google</a>

Solution

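The problem I see with your regular expression is the .* part.

.* is greedy, so it consumes as much as it can and only gives characters back until the trailing \b is satisfied, which is at the last word boundary it can reach. The s modifier makes this worse, because the dot then also matches newlines and the match can run to the end of the whole text. There is no need for the s modifier here.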
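
To see the overshoot concretely, here is a minimal sketch that runs the stripped-down pattern from the question against the question's own anchor example. The variable names ($pattern, $html, $broken) are made up for illustration.

```php
<?php
// Pattern from the question: greedy .* combined with the s modifier.
$pattern = '~\bcdkey\s*-\s*.*\b~is';

// Sample input taken from the question.
$html = '<a href="https://www.google.de/#q=cdkey-0192xdasas" class="externalURL">'
      . 'https://www.google.de/#q=cdkey-0192xdasas</a>';

// .* first runs to the end of the input, then backtracks only until \b holds.
// That happens right after the "a" of the closing </a> tag, so everything
// from the first "cdkey" up to that point is replaced in one go.
$broken = preg_replace($pattern, '***', $html);

echo $broken, PHP_EOL;
// <a href="https://www.google.de/#q=***>
```

Only the final ">" survives after the replacement, which is exactly the broken output shown in the question.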

If you know your cdkey will always be numbers and letters, you could do something like this.

$text = preg_replace('/cdkey\s*-\s*[a-z0-9]+/i', '***', $text);
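
As a rough check, the same call can be run over a few of the sample lines from the question. This is only a sketch: the $lines and $masked names are illustrative, and it assumes keys consist of ASCII letters and digits only.

```php
<?php
// Sample lines taken from the question.
$lines = [
    '[url]https://www.google.de/#q=cdkey-0192xdasas[/url]',
    '[url=https://www.google.de/#q=cdkey-0192xdasas]Test[/url]',
    '    Another plaintext cdkey   -   bla',
    "<a href='https://www.google.de/#q=cdkey-0192xdasas'>Le Google</a>",
];

// The key is limited to letters and digits, so the match stops at the first
// character that cannot belong to it (quote, bracket, space, ...).
// preg_replace() accepts an array subject and returns an array.
$masked = preg_replace('/cdkey\s*-\s*[a-z0-9]+/i', '***', $lines);

echo implode(PHP_EOL, $masked), PHP_EOL;
// [url]https://www.google.de/#q=***[/url]
// [url=https://www.google.de/#q=***]Test[/url]
//     Another plaintext ***
// <a href='https://www.google.de/#q=***'>Le Google</a>
```

Note that this pattern has no leading \b, so it would also match "cdkey" at the end of a longer word; prepend \b if that is not wanted.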


OTHER TIPS

I think the word boundaries \b don't do what you expect in this pattern. Specifically, the trailing \b after the hyphen and dot-star sequence does not stop the match at the end of the key; it only requires the match to end at some word boundary, and the greedy .* picks the last one it can reach.

If you know what might terminate the cdkey, something like this should work:

 # \bcdkey\s*-\s*[^<>\[\]"'\s]*

 \b cdkey \s* - \s* [^<>\[\]"'\s]* 
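
A minimal sketch of how the spaced-out form could be used from PHP, assuming the x modifier (so whitespace in the pattern is ignored) and the i modifier for case-insensitivity. The variable names and the second sample key (AB12-CD34) are made up for illustration.

```php
<?php
// x modifier: whitespace in the pattern is ignored, as in the spaced-out form.
// The single quote inside the character class is escaped for the PHP string only.
$pattern = '~\b cdkey \s* - \s* [^<>\[\]"\'\s]*~xi';

// Sample input taken from the question.
$quoted = '[url="https://www.google.de/#q=cdkey-0192xdasas"]Test 3[/url]';
// Hypothetical key containing a hyphen, to show the difference from [a-z0-9]+.
$dashed = 'Plaintext cdkey-AB12-CD34 more text';

$maskedQuoted = preg_replace($pattern, '***', $quoted);
$maskedDashed = preg_replace($pattern, '***', $dashed);

echo $maskedQuoted, PHP_EOL; // [url="https://www.google.de/#q=***"]Test 3[/url]
echo $maskedDashed, PHP_EOL; // Plaintext *** more text
```

Because the character class lists what ends a key rather than what a key consists of, keys containing punctuation such as hyphens are masked in full, at the cost of also swallowing any trailing punctuation that is not in the list.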
Licensed under: CC-BY-SA with attribution