How Can I Use Look-Ahead and Look-Behind to Create a Custom Boundary Matcher?
13-07-2021
Question
I want to split a String at the word boundaries using Scanner. Normally, this would be done like this:
Scanner scanner = new Scanner(...).useDelimiter("\\b");
The problem is that my definition of a "word" character is a tiny bit different from the standard [a-zA-Z_0-9], as I want to include some more characters and exclude the _: [a-zA-Z0-9#/]. Therefore, I can't use the \b pattern.
So I tried to do the same thing using look-ahead and look-behind, but what I came up with didn't work:
(<?=[A-Za-z0-9#/])(?![A-Za-z0-9#/])|(<?![A-Za-z0-9#/])(?=[A-Za-z0-9#/])
The scanner doesn't split anywhere using this.
Is it possible to do this using look-ahead and look-behind, and if so, how?
Solution
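There's an error in your syntax. In a look-behind, the ? comes first: write (?<= and (?<! rather than (<?= and (<?!. As written, (<?= is an ordinary capturing group that matches an optional < followed by a literal =, so the pattern compiles without error but never matches at a word boundary. The corrected pattern:
(?<=[A-Za-z0-9#/])(?![A-Za-z0-9#/])|(?<![A-Za-z0-9#/])(?=[A-Za-z0-9#/])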
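To see the corrected pattern at work, here is a minimal sketch that plugs it into a Scanner. The class name, the tokens helper, and the sample input are made up for illustration; only the regular expression comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class CustomBoundaryDemo {
    // The custom "word" character class from the question.
    static final String WORD = "[A-Za-z0-9#/]";

    // Zero-width boundary: a word character behind and none ahead, or the reverse.
    static final String BOUNDARY =
            "(?<=" + WORD + ")(?!" + WORD + ")|(?<!" + WORD + ")(?=" + WORD + ")";

    // Hypothetical helper: collects every token the Scanner produces.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(BOUNDARY)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // '#' and '/' count as word characters here, '_' does not.
        System.out.println(tokens("a#b c_d"));
    }
}
```

For this input the tokens should be "a#b", " ", "c", "_" and "d". Because the delimiter is zero-width, nothing is consumed: runs of non-word characters, such as the single space, come back as tokens of their own, just as they would with \b.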
OTHER TIPS
new Scanner(...).useDelimiter(
"(?<=[a-zA-Z0-9#/])(?=[^a-zA-Z0-9#/])|(?<=[^a-zA-Z0-9#/])(?=[a-zA-Z0-9#/])");
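This delimiter is close to the one in the accepted answer but not identical: it uses only positive look-arounds, so it needs an actual character on both sides and cannot match at the very start or end of the input. The sketch below lists the match positions of both patterns on a sample string. The class name, helper, and input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryPositions {
    static final String WORD = "[A-Za-z0-9#/]";
    static final String NON_WORD = "[^A-Za-z0-9#/]";

    // Look-behind combined with negative look-ahead (and the reverse):
    // can also match at the ends of the input.
    static final Pattern WITH_NEGATIVE = Pattern.compile(
            "(?<=" + WORD + ")(?!" + WORD + ")|(?<!" + WORD + ")(?=" + WORD + ")");

    // Positive look-arounds only: requires a character on each side.
    static final Pattern POSITIVE_ONLY = Pattern.compile(
            "(?<=" + WORD + ")(?=" + NON_WORD + ")|(?<=" + NON_WORD + ")(?=" + WORD + ")");

    // Hypothetical helper: start index of every zero-width match in the input.
    static List<Integer> positions(Pattern pattern, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            result.add(matcher.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "a#b c_d";
        System.out.println(positions(WITH_NEGATIVE, input)); // [0, 3, 4, 5, 6, 7]
        System.out.println(positions(POSITIVE_ONLY, input)); // [3, 4, 5, 6]
    }
}
```

For Scanner-based splitting the extra matches at the two ends should not change the tokens, since a zero-width delimiter at the start or end does not produce a token of its own.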
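What is wrong with:
[^A-Za-z0-9#/]+
In other words, use any run of one or more characters outside your word set as the delimiter. Unlike a \b-style boundary, this consumes the separators, so they are not returned as tokens.
Or, if you need the spaces, leave them out of the delimiter:
[^A-Za-z0-9#/ ]+
and then strip the spaces out for special processing after the scanner (if needed).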
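A short sketch of how these two delimiters behave with Scanner, assuming a made-up sample input and a hypothetical tokens helper:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ConsumingDelimiterDemo {
    // Hypothetical helper: splits the input with the given delimiter regex.
    static List<String> tokens(String input, String delimiter) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(delimiter)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "a#b c_d";
        // Every run of non-word characters is a delimiter and is discarded.
        System.out.println(tokens(input, "[^A-Za-z0-9#/]+"));  // [a#b, c, d]
        // Spaces are no longer delimiters, so they stay inside the tokens.
        System.out.println(tokens(input, "[^A-Za-z0-9#/ ]+")); // [a#b c, d]
    }
}
```

Note the trade-off against the look-around approach: here the separators are thrown away, so this only fits if the non-word runs themselves are not needed as tokens.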