Question

Right now I'm learning regular expressions in Java and I have a question about word boundaries. When I looked for word boundaries in Java regular expressions, I found \b, which marks the boundary between a word character and a non-word character (or the start or end of the string). So the regex \b123\b accepts the string 123 456 but rejects 456123456. However, I found that strings like !$@#@%123^^%$# or "123" are still accepted by the regex above. Is there a word boundary or pattern that rejects a word bordered by non-alphanumeric characters other than spaces, as in those examples?
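A minimal Java sketch of the behaviour described above, assuming each string is searched with Matcher.find(); the class and method names are hypothetical and chosen only for illustration:

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // "\\b" in a Java string literal is the regex token \b (a zero-width word boundary).
    static final Pattern BOUNDED_123 = Pattern.compile("\\b123\\b");

    // Returns true if 123 appears anywhere in the input with a word boundary on each side.
    static boolean containsBounded123(String input) {
        return BOUNDED_123.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"123 456", "456123456", "!$@#@%123^^%$#", "\"123\""};
        for (String s : inputs) {
            System.out.println(s + " -> " + containsBounded123(s));
        }
    }
}
```

Only 456123456 is rejected: %, ^ and " are all non-word characters, so \b is satisfied on both sides of 123 in the other strings. Note that in a Java string literal the backslash must be doubled, so \b123\b is written "\\b123\\b".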


Solution

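You want to use \s instead of \b. Where \b matches a zero-width position between a word character and a non-word character, \s matches an actual whitespace character, so \s123\s only accepts 123 when it is surrounded by whitespace.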

If you also want your first example, 123 456, to match, however, you will need anchors so that 123 is accepted at the very start or end of the string. This can be done with (\s|^)123(\s|$). The caret ^ matches the start of the string and $ matches the end of the string.
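A minimal sketch of the anchored whitespace approach, again assuming Matcher.find() is used to search each string; the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceOrAnchorDemo {
    // (\s|^) : a whitespace character, or the start of the input, before 123
    // (\s|$) : a whitespace character, or the end of the input, after 123
    static final Pattern SPACED_123 = Pattern.compile("(\\s|^)123(\\s|$)");

    // Counts how many times find() succeeds on the input.
    static int countSpaced123(String input) {
        Matcher m = SPACED_123.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Returns the full text of the first match (including any consumed whitespace), or null.
    static String firstMatch(String input) {
        Matcher m = SPACED_123.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"123 456", "456 123", "456123456", "!$@#@%123^^%$#", "\"123\"", "123 123"};
        for (String s : inputs) {
            System.out.println(s + " -> " + countSpaced123(s) + " match(es)");
        }
    }
}
```

One consequence of this design: the surrounding whitespace is part of the match, so group() returns it too (" 123 " for "abc 123 def"), and because find() resumes after the consumed space, "123 123" yields only one match.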

OTHER TIPS

(?<!\S)123(?!\S)

(?<!\S) matches a position that is not preceded by a non-whitespace character. (negative lookbehind)

(?!\S) matches a position that is not followed by a non-whitespace character. (negative lookahead)
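For comparison, a minimal sketch of the lookaround pattern (?<!\S)123(?!\S) under the same assumptions (hypothetical names, Matcher.find() for searching):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StandaloneLookaroundDemo {
    // (?<!\S) : not preceded by a non-whitespace character (start of input or after whitespace)
    // (?!\S)  : not followed by a non-whitespace character (end of input or before whitespace)
    static final Pattern STANDALONE_123 = Pattern.compile("(?<!\\S)123(?!\\S)");

    // Counts non-overlapping occurrences of a whitespace-delimited 123.
    static int countStandalone123(String input) {
        Matcher m = STANDALONE_123.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Returns the text of the first match, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = STANDALONE_123.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"123 456", "123 123", "456123456", "!$@#@%123^^%$#", "\"123\""};
        for (String s : inputs) {
            System.out.println(s + " -> " + countStandalone123(s) + " match(es)");
        }
    }
}
```

Because the lookarounds are zero-width, group() is exactly 123 with no surrounding whitespace, and both occurrences in "123 123" are found.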

I know (?<!\S)123(?!\S) looks gratuitously complicated, but that's because \b conceals a lot of complexity. It's equivalent to this:

(?<=\w)(?!\w)|(?=\w)(?<!\w)

...meaning a position that's preceded by a word character and not followed by one, or a position that's followed by a word character and not preceded by one.
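To check that expansion, here is a small sketch that lists every position where each pattern matches. It assumes plain ASCII input, since Java's \b and \w have not always classified non-ASCII letters the same way across JDK versions; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryEquivalenceDemo {
    // The built-in word boundary.
    static final Pattern BUILTIN = Pattern.compile("\\b");
    // The lookaround expansion: end of a word, or start of a word.
    static final Pattern EXPANDED = Pattern.compile("(?<=\\w)(?!\\w)|(?=\\w)(?<!\\w)");

    // Collects every index at which the (zero-width) pattern matches.
    static List<Integer> positions(Pattern p, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String[] inputs = {"123 456", "!$@#@%123^^%$#", "\"123\"", "a_b-c"};
        for (String s : inputs) {
            List<Integer> builtin = positions(BUILTIN, s);
            List<Integer> expanded = positions(EXPANDED, s);
            System.out.println(s + " -> " + builtin + (builtin.equals(expanded) ? " (same)" : " (DIFFERENT)"));
        }
    }
}
```

Both patterns match empty strings, so after each match find() advances one character before searching again, which is what lets the loop visit every position, including the one at the very end of the input.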

Licensed under: CC-BY-SA with attribution