Question

I'd like to find two non-identical Unicode words separated by a colon using a PCRE regex.

Take, for example, this string:

Lôrem:ipsüm dõlör:sït amêt:amêt cønsectetûr:cønsectetûr âdipiscing:elït

I can easily find identical words separated by a colon using:

(\p{L}+):(\1)

which will match: cønsectetûr:cønsectetûr and amêt:amêt
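For reference, the backreference behaviour described above can be reproduced with a short sketch. This uses Python's built-in re module rather than PCRE, so it is only an approximation: re has no \p{L}, and the class [^\W\d_] is assumed here as a rough stand-in for "a letter". The names WORD, text and identical are illustrative only.

```python
import re
import unicodedata

# Assumption: Python's re lacks \p{L}; [^\W\d_] (a word character that is
# neither a digit nor an underscore) is a rough substitute for a letter.
WORD = r"[^\W\d_]+"

# Normalize to NFC so every accented letter is a single code point.
text = unicodedata.normalize(
    "NFC",
    "Lôrem:ipsüm dõlör:sït amêt:amêt cønsectetûr:cønsectetûr âdipiscing:elït",
)

# Same shape as (\p{L}+):(\1): capture a word, then require the same
# text again right after the colon.
identical = re.compile(rf"({WORD}):(\1)")

matches = [m.group(0) for m in identical.finditer(text)]
print(matches)
```

Only the two pairs whose halves are the same word are reported.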

However, I want to negate the backreference to find only non-identical Unicode words separated by a colon.

What's the proper way to negate a backreference in PCRE?

(\p{L}+):(^\1) obviously does not work, since ^ outside a character class is an anchor rather than a negation.


Solution

You start by using a negative lookahead to prevent a match if the captured part repeats after the colon:

(\p{L}+):(?!\1)

Then you need to match the second Unicode word, another \p{L}+:

(\p{L}+):(?!\1)\p{L}+

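Finally, to prevent false matches, add word boundaries. Without them, the regex can start matching mid-word (mêt:amêt), and the lookahead would reject a second word that merely begins with the first (am:amêt):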

\b(\p{L}+):(?!\1\b)\p{L}+\b
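To see what the boundaries change, here is a minimal sketch comparing the pattern with and without them. It runs on Python's built-in re module, not PCRE, so treat it as an approximation: [^\W\d_] is assumed as a stand-in for \p{L}, and the names WORD, loose and strict are made up for this example.

```python
import re
import unicodedata

# Assumption: [^\W\d_] approximates \p{L}, which Python's re does not support.
WORD = r"[^\W\d_]+"

text = unicodedata.normalize(
    "NFC",
    "Lôrem:ipsüm dõlör:sït amêt:amêt cønsectetûr:cønsectetûr âdipiscing:elït",
)

# Lookahead only: the match may begin in the middle of a word.
loose = re.compile(rf"({WORD}):(?!\1){WORD}")

# Word boundaries added, mirroring \b(\p{L}+):(?!\1\b)\p{L}+\b
strict = re.compile(rf"\b({WORD}):(?!\1\b){WORD}\b")

loose_matches = [m.group(0) for m in loose.finditer(text)]
strict_matches = [m.group(0) for m in strict.finditer(text)]

print(loose_matches)   # includes partial hits that start mid-word
print(strict_matches)  # only pairs made of two different whole words

# A second word that merely starts with the first is still a different word.
prefix_case = strict.search("am:amêt")
print(prefix_case.group(0) if prefix_case else None)
```

In Python 3, \b is Unicode-aware for str patterns. In PCRE, \b and \w treat non-ASCII letters as word characters only when Unicode property mode (UCP) is enabled, for example through (*UCP) at the start of the pattern, so the boundaries may behave differently on accented words without it.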

regex101 demo

License: CC-BY-SA with attribution
Not affiliated with StackOverflow