You have not specified which programming language you are using, but in many of them the \b word-boundary assertion only works with plain ASCII characters.
Internally, \b is processed as a boundary between the \w and \W sets.
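In turn, \w is equal to [a-zA-Z0-9_] in those engines, so every non-ASCII letter counts as a non-word character and no boundary is found next to it.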
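Here is a minimal sketch of that effect in Python. Note the assumption: Python 3 is Unicode-aware by default for str patterns, so the re.ASCII flag is used here only to imitate an engine whose \w is [a-zA-Z0-9_]. The sample strings are made up for illustration.

```python
import re

text = "рыжий кот спит"  # made-up sample sentence

# Default str patterns in Python 3 treat Cyrillic letters as \w,
# so \b finds a boundary on both sides of the word.
unicode_match = re.search(r"\bкот\b", text)

# re.ASCII narrows \w to [a-zA-Z0-9_]. The space and the Cyrillic
# letters are now all \W, so there is no \w/\W transition to anchor on.
ascii_match = re.search(r"\bкот\b", text, flags=re.ASCII)

# The same effect splits a word at its first non-ASCII letter.
sample = "caf\u00e9 au lait"  # \u00e9 is the precomposed letter é
ascii_words = re.findall(r"\b\w+\b", sample, flags=re.ASCII)
unicode_words = re.findall(r"\b\w+\b", sample)

print(unicode_match is not None)  # True
print(ascii_match)                # None
print(ascii_words)                # ['caf', 'au', 'lait']
print(unicode_words)              # ['café', 'au', 'lait']
```

If your engine behaves like the re.ASCII runs above, \b is not going to help with national characters.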
If you are not using any fancy space marks (you shouldn't), then consider delimiting words with the regular whitespace character class (\s) instead.
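One possible way to do that, sketched in Python: replace \b with lookarounds built on \S, the negation of \s. The helper name find_whole_word is hypothetical, and re.ASCII is again only there to imitate an ASCII-only engine. Keep in mind this treats punctuation as part of the word, so "кот," will not match.

```python
import re

def find_whole_word(word, text):
    """Hypothetical helper: match `word` only when it is surrounded
    by whitespace or the start/end of the string."""
    # (?<!\S) = not preceded by a non-whitespace character
    # (?!\S)  = not followed by a non-whitespace character
    pattern = r"(?<!\S)" + re.escape(word) + r"(?!\S)"
    # re.ASCII imitates an engine whose \w and \b are ASCII-only;
    # the whitespace-based lookarounds still work there.
    return re.search(pattern, text, flags=re.ASCII)

print(find_whole_word("кот", "рыжий кот спит") is not None)  # True
print(find_whole_word("кот", "котик спит"))                  # None
```

If your engine has no lookbehind, something like (?:^|\s)word(?:\s|$) is a rough equivalent, at the cost of consuming the surrounding whitespace.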
See this table (scroll down to the Word Boundaries section) to check whether your language supports Unicode for \b. If it says "ascii", then it does not.
As a side note, depending on your programming language, you may consider using direct Unicode code points instead of national characters.
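For example, in Python (3.3 or later, where the re module understands \uXXXX escapes in str patterns) the word could be spelled out by code points as below. The word itself is just a made-up sample, and the escape syntax differs between languages, so check what yours accepts.

```python
import re

# "кот" written as code points: U+043A (к), U+043E (о), U+0442 (т).
# The raw string passes the \u escapes to the regex engine untouched,
# so the source file itself stays pure ASCII.
pattern = re.compile(r"\u043a\u043e\u0442")

match = pattern.search("рыжий кот спит")
print(match.group() if match else None)
```

This avoids any dependence on the encoding of the source file.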
See also: utf-8 word boundary regex in javascript
Further reading: