I couldn't find any documentation on exactly which features MongoDB's PCRE implementation supports, but if it includes the \pL
Unicode property class (any letter) as well as look-ahead and look-behind assertions, then a Unicode-aware replacement for \b
would be:
(?:(?=\pL)(?<!\pL)|(?!\pL)(?<=\pL))
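Basically, (?=\pL)(?<!\pL) matches if the next character is a letter while the previous one is not, whereas (?!\pL)(?<=\pL) conversely matches if the previous character is a letter but the next one is not. Both alternatives are zero-width, so, like \b, the whole group matches a position rather than a character. Note that, unlike \b, this treats digits and underscores as non-word characters; if you need them to count as word characters too, something like [\pL\pN_] in place of \pL should work.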
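For what it's worth, here is a minimal sketch of that boundary logic in Python, just to show where the assertion succeeds. It is not MongoDB-specific, and it rests on one assumption: Python's built-in re module has no \pL, so I'm using [^\W\d_] as a rough stand-in for "any Unicode letter". The names LETTER, BOUNDARY and letter_boundaries are made up for the illustration.

```python
import re

# Assumption: Python's re module does not support \pL, so [^\W\d_]
# (a word character that is neither a digit nor an underscore) is used
# as an approximate stand-in for "any Unicode letter".
LETTER = r"[^\W\d_]"

# Same shape as the pattern above: a letter ahead but not behind,
# or a letter behind but not ahead.
BOUNDARY = re.compile(
    rf"(?:(?={LETTER})(?<!{LETTER})|(?!{LETTER})(?<={LETTER}))"
)


def letter_boundaries(text):
    """Return the offsets at which the zero-width boundary matches."""
    return [m.start() for m in BOUNDARY.finditer(text)]


# "été là" written with escapes so the code points are unambiguous.
print(letter_boundaries("\u00e9t\u00e9 l\u00e0"))  # prints [0, 3, 4, 6]
```

The accented letters are treated as letters, so the only boundaries reported are at the edges of the two words.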
Of course, this regexp can be simplified a lot if we already know something about what the adjacent characters can be. For example, the Unicode-aware version of \b\pL+\b
can be written simply as:
(?<!\pL)\pL+(?!\pL)
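The simplified form can be tried out the same way. Again, this is only a sketch in Python rather than in MongoDB, with [^\W\d_] assumed as an approximate substitute for \pL (which Python's re module lacks); WORD and letter_words are hypothetical names.

```python
import re

# Assumption: [^\W\d_] approximates \pL, which Python's re does not support.
LETTER = r"[^\W\d_]"

# Equivalent in shape to (?<!\pL)\pL+(?!\pL): a run of letters with
# no letter immediately before or after it.
WORD = re.compile(rf"(?<!{LETTER}){LETTER}+(?!{LETTER})")


def letter_words(text):
    """Return every maximal run of letters in text."""
    return WORD.findall(text)


# "naïve café, x2" written with escapes for the accented letters.
print(letter_words("na\u00efve caf\u00e9, x2"))  # prints ['naïve', 'café', 'x']
```

Because only letters count here, the digit in "x2" ends the word, which is the same letter-only behaviour as the patterns above.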