Question

I want to split a String at the word boundaries using Scanner. Normally, this would be done like this:

Scanner scanner = new Scanner(...).useDelimiter("\\b");

The problem is that my definition of "word" character is a tiny bit different from the standard [a-zA-Z_0-9] as I want to include some more characters and exclude the _: [a-zA-Z0-9#/]. Therefore, I can't use the \b pattern.

So I tried to do the same thing using look-ahead and look-behind, but what I came up with didn't work:

(<?=[A-Za-z0-9#/])(?![A-Za-z0-9#/])|(<?![A-Za-z0-9#/])(?=[A-Za-z0-9#/])

The scanner doesn't split anywhere using this.

Is it possible to do this using look-ahead and look-behind and how?

Solution

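There's an error in your syntax: the ? comes first. Look-behind is written (?<=...) and (?<!...). Your (<?=...) is parsed as an ordinary capturing group (an optional <, then a literal =, then one character from the class), so the pattern only matches input that actually contains those characters. Corrected: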

(?<=[A-Za-z0-9#/])(?![A-Za-z0-9#/])|(?<![A-Za-z0-9#/])(?=[A-Za-z0-9#/])
 ^^                                  ^^
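
To see the corrected delimiter in action, here is a minimal sketch. The class name WordBoundarySplit, the helper tokens, and the sample input are made up for illustration; only the pattern itself comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class WordBoundarySplit {
    // Custom "word" character class from the question.
    static final String WORD = "[A-Za-z0-9#/]";

    // Zero-width delimiter: either the end of a word run or the start of one.
    static final String BOUNDARY =
        "(?<=" + WORD + ")(?!" + WORD + ")|(?<!" + WORD + ")(?=" + WORD + ")";

    // Hypothetical helper: collects every token the Scanner produces.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(BOUNDARY)) {
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // '#' and '/' count as word characters here, '_' does not.
        System.out.println(tokens("foo#/1 bar_baz"));
    }
}
```

For the input "foo#/1 bar_baz" this should yield the tokens "foo#/1", " ", "bar", "_" and "baz": because the delimiter is zero-width, the non-word runs come back as tokens of their own, just as they do with \b.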

OTHER TIPS

new Scanner(...).useDelimiter(
  "(?<=[a-zA-Z0-9#/])(?=[^a-zA-Z0-9#/])|(?<=[^a-zA-Z0-9#/])(?=[a-zA-Z0-9#/])");

What is wrong with:

[^A-Za-z0-9#/]+

In other words, any run of one or more characters that are not in your word set. The scanner discards whatever the delimiter matches, so you get only the words back.

Or, if you need to keep the spaces:

[^A-Za-z0-9#/ ]+

and then strip the spaces out for special processing after the scanner (if needed).
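
A rough sketch of the first variant, for comparison with the look-around approach. The class name SeparatorRunSplit, the helper words, and the sample input are illustrative assumptions; only the delimiter pattern is taken from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SeparatorRunSplit {
    // Any run of characters outside the custom word set acts as the delimiter.
    static final String SEPARATORS = "[^A-Za-z0-9#/]+";

    // Hypothetical helper: returns only the words; separators are consumed.
    static List<String> words(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(SEPARATORS)) {
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("foo#/1 bar_baz"));
    }
}
```

Here the same input gives just "foo#/1", "bar" and "baz". The space and the underscore are swallowed by the delimiter, so this only fits if you don't need the text between the words.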

Licensed under: CC-BY-SA with attribution