Can you use zero-width matching regex in String split?
Question
System.out.println(
Arrays.deepToString(
"abc<def>ghi".split("(?:<)|(?:>)")
)
);
This prints [abc, def, ghi], as if I had split on "<|>". I want it to print [abc, <def>, ghi]. Is there a way to work some regex magic to accomplish what I want here?
Perhaps a simpler example:
System.out.println(
Arrays.deepToString(
"Hello! Oh my!! Good bye!!".split("(?:!+)")
)
);
This prints [Hello, Oh my, Good bye]. I want it to print [Hello!, Oh my!!, Good bye!!].
Solution 3
Thanks to information from Cine, I think these are the answers I'm looking for:
System.out.println(
Arrays.deepToString(
"abc<def>ghi<x><x>".split("(?=<)|(?<=>)")
)
); // [abc, <def>, ghi, <x>, <x>]
System.out.println(
Arrays.deepToString(
"Hello! Oh my!! Good bye!! IT WORKS!!!".split("(?<=!++)")
)
); // [Hello!, Oh my!!, Good bye!!, IT WORKS!!!]
Now, the second one was honestly discovered by experimenting with all the different quantifiers. Neither the greedy nor the reluctant quantifier works, but the possessive one does.
I'm still not sure why.
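One way to sidestep the quantifier question entirely, offered as a sketch rather than an explanation of the possessive behaviour: keep the lookbehind to a single character and add a negative lookahead, so the split point is "after a ! that is not followed by another !". The class and method names below are made up for illustration, and the trailing \s* is an extra that swallows the whitespace following each run.

```java
public class SplitAfterRun {
    // Hypothetical helper: split after each run of '!' and drop the whitespace that follows.
    // (?<=!)  the position is preceded by '!'
    // (?!!)   the position is NOT followed by another '!', i.e. the run has ended
    // \\s*    consume any whitespace so the pieces have no leading spaces
    static String[] splitAfterBangs(String input) {
        return input.split("(?<=!)(?!!)\\s*");
    }

    public static void main(String[] args) {
        for (String part : splitAfterBangs("Hello! Oh my!! Good bye!! IT WORKS!!!")) {
            System.out.println("[" + part + "]");
        }
    }
}
```

Because the lookbehind has a fixed width of one character, this form does not depend on how a particular regex engine treats quantifiers inside lookbehind.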
OTHER TIPS
You need to take a look at zero width matching constructs:
(?=X) X, via zero-width positive lookahead
(?!X) X, via zero-width negative lookahead
(?<=X) X, via zero-width positive lookbehind
(?<!X) X, via zero-width negative lookbehind
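To make the four constructs concrete, here is a small sketch that replaces each match with X so you can see that the lookaround text itself is never consumed. The class name, the mark helper, and the sample strings are all invented for this illustration.

```java
public class LookaroundDemo {
    // Hypothetical helper: replace every match of regex with "X" so the matched span is visible.
    static String mark(String input, String regex) {
        return input.replaceAll(regex, "X");
    }

    public static void main(String[] args) {
        // Positive lookahead: "foo" only when followed by "bar"
        System.out.println(mark("foobar foobaz", "foo(?=bar)"));   // Xbar foobaz
        // Negative lookahead: "foo" only when NOT followed by "bar"
        System.out.println(mark("foobar foobaz", "foo(?!bar)"));   // foobar Xbaz
        // Positive lookbehind: "bar" only when preceded by "foo"
        System.out.println(mark("foobar bazbar", "(?<=foo)bar"));  // fooX bazbar
        // Negative lookbehind: "bar" only when NOT preceded by "foo"
        System.out.println(mark("foobar bazbar", "(?<!foo)bar"));  // foobar bazX
    }
}
```

In every case only the text outside the parentheses is replaced; the lookaround merely constrains where a match may occur.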
You can use \b (word boundary) as the thing to look for, since it is zero-width, and use it as the anchor when looking for < and >.
String s = "abc<def>ghi";
String[] bits = s.split("(?<=>)\\b|\\b(?=<)");
for (String bit : bits) {
System.out.println(bit);
}
Output:
abc
<def>
ghi
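Now that isn't a general solution: \b only matches between a word character and a non-word character, so it fails as soon as a bracket is next to a space or punctuation. You will probably need to write a custom split method for that.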
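As a rough idea of what such a custom split method might look like, the sketch below walks the input with a Matcher and keeps both the matched tokens and the text between them. The class name, the splitKeepingMatches helper, and the <[^>]*> token pattern are assumptions made for this example, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeepDelimiterSplit {
    // Hypothetical helper: returns the matched tokens AND the text between them, in order.
    static List<String> splitKeepingMatches(String input, Pattern token) {
        List<String> parts = new ArrayList<>();
        Matcher m = token.matcher(input);
        int last = 0; // end of the previous match
        while (m.find()) {
            if (m.start() > last) {
                parts.add(input.substring(last, m.start())); // text before this token
            }
            if (m.end() > m.start()) {
                parts.add(m.group()); // the token itself (empty matches are skipped)
            }
            last = m.end();
        }
        if (last < input.length()) {
            parts.add(input.substring(last)); // trailing text after the final token
        }
        return parts;
    }

    public static void main(String[] args) {
        Pattern tag = Pattern.compile("<[^>]*>");
        System.out.println(splitKeepingMatches("abc<def>ghi<x><x>", tag));
        // [abc, <def>, ghi, <x>, <x>]
    }
}
```

Unlike the \b version, this does not care what characters sit next to the brackets.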
Your second example suggests it's not really split() you're after but a regex matching loop. For example:
String s = "Hello! Oh my!! Good bye!!";
Pattern p = Pattern.compile("(.*?!+)\\s*");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println("[" + m.group(1) + "]");
}
Output:
[Hello!]
[Oh my!!]
[Good bye!!]