Question

I have this regexp:

(\b)(emozioni|gioia|felicità)(\b)

In a string like the one below:

emozioni emozioniamo felicità felicitàs

it should match the first and the third word. Instead it matches the first and the last. I assume it is because of the accented character. I tried this alternative:

(\b)(emozioni|gioia|felicità\s)(\b)

but it matches "felicità" only if there is another word after it. To be specific, it matches only in this context:

emozioni emozioniamo felicità felicitàs

and not in this other:

emozioni emozioniamo felicitàs felicità

I found an article about accented characters in French (there, at the beginning of the word) here, and I followed the second answer. If anyone knows a better solution, it is very welcome.

Solution

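A word boundary \b is only found between a character in the \w class and one outside it. When \w is limited to [0-9a-zA-Z_], as in your case, an accented character like à is not a word character. So there is no \b between à and a following space (both are non-word characters), while there is one between à and s. That is why the last word matches and the third does not.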
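To see this in action, here is a minimal Python sketch. The question does not say which regex engine is used, so Python is only an assumption here, and the re.ASCII flag is added deliberately to restrict \w to [0-9a-zA-Z_] and reproduce the behaviour described. Without that flag, Python's str patterns are Unicode-aware and à counts as a word character.

```python
import re

text = "emozioni emozioniamo felicità felicitàs"
pattern = r"(\b)(emozioni|gioia|felicità)(\b)"

# Assumption: re.ASCII mimics an engine where \w is [0-9a-zA-Z_],
# so "à" is a non-word character and \b cannot sit between "à" and " ".
ascii_spans = [m.span(2) for m in re.finditer(pattern, text, flags=re.ASCII)]
print(ascii_spans)    # [(0, 8), (30, 38)]: first word and the start of "felicitàs"

# Python's default for str patterns is Unicode-aware: "à" is in \w,
# so the same pattern finds the first and the third word.
unicode_spans = [m.span(2) for m in re.finditer(pattern, text)]
print(unicode_spans)  # [(0, 8), (21, 29)]
```

If your engine has a Unicode mode, turning it on may be enough on its own, but that depends on the engine.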

You can solve the problem in your case using a lookahead:

felicità(?=\s|$)

or shorter:

felicità(?!\S)

(You could use \W in place of \s, as @Sniffer suggests, but then you risk matching something like felicitàà, because à itself counts as a non-word character.)
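Here is a small sketch that checks the lookahead variants against both strings from the question. It assumes Python with re.ASCII, chosen only to imitate an engine where à is not in \w. The variable names are made up for the example.

```python
import re

first = "emozioni emozioniamo felicità felicitàs"
second = "emozioni emozioniamo felicitàs felicità"

# Assumption: re.ASCII keeps \w, \W, \s and \S limited to ASCII semantics.
positive = re.compile(r"felicità(?=\s|$)", re.ASCII)  # followed by whitespace or end
negative = re.compile(r"felicità(?!\S)", re.ASCII)    # not followed by a non-space
loose = re.compile(r"felicità(?=\W|$)", re.ASCII)     # the \W variant

# Start offsets of the matches: only the standalone word is found.
positive_starts = [[m.start() for m in positive.finditer(t)] for t in (first, second)]
negative_starts = [[m.start() for m in negative.finditer(t)] for t in (first, second)]
print(positive_starts)  # [[21], [31]]
print(negative_starts)  # [[21], [31]]

# The risk with \W: "à" is itself a non-word character in this mode.
loose_hit = loose.search("felicitàà") is not None
strict_hit = negative.search("felicitàà") is not None
print(loose_hit, strict_hit)  # True False
```

Both whitespace-based lookaheads behave the same on these inputs, and neither depends on the word order.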

Other suggestions

Try the following alternative:

\b(emozioni|gioia|felicità)(?=\W|$)

This will match any of your listed words, as long as any of those words is followed by either a non-word character \W or end-of-string $.
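As a quick check, this sketch runs the pattern over both strings from the question. Python and the re.ASCII flag are assumptions, used to emulate an engine where \w is [0-9a-zA-Z_].

```python
import re

# Assumption: re.ASCII stands in for an engine without Unicode-aware \w.
pattern = re.compile(r"\b(emozioni|gioia|felicità)(?=\W|$)", re.ASCII)

samples = [
    "emozioni emozioniamo felicità felicitàs",
    "emozioni emozioniamo felicitàs felicità",
]

# findall returns the captured word for every match.
results = [pattern.findall(sample) for sample in samples]
for sample, found in zip(samples, results):
    print(sample, "->", found)
# Both lines print ['emozioni', 'felicità']
```

The leading \b still works because every listed word starts with an ASCII letter. It would need the same kind of replacement for words that begin with an accented character.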

Regex101 Demo

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow