Question

I am trying to create a regex/replacement pair to use with replaceAll() that will capture the characters just before and after a target string.

Here's my version, that works for simple cases:

String adjacent = "fooaXbcXdbar".replaceAll(".*?(.)X(.).*?(?=(.X)|$)", "$1$2");

which produces "abcd" as desired (the lookahead at the end makes each match consume up to the next target, or to the end of the string, so that a single call to replaceAll() works).

However, there's an edge case I can't seem to solve for, when the character after the target is also a character before a target:

String adjacent = "fooaXbXdbar".replaceAll(".*?(.)X(.).*?(?=(.X)|$)", "$1$2");

produces "ab", but I would like "abbd". The regex has consumed the leading part of the match, making the following input not match.

I've tried lookarounds, but can't seem to get them to work.


Note: I'm not interested in solutions that involve loops or code etc. Just seeking the regex and replacement string that will work for the edge case mentioned.


Solution

How about this?

String adjacent =
    "fooaXbXdbar".replaceAll(".*?(.)X(?:(?=(.)X)|(.).*?(?=.X|$))", "$1$2$3");

What it does is, after the X, it first checks to see if it's immediately followed by .X, in which case it captures the . as $2 and considers the match complete; if it finds that it's not immediately followed by .X, it goes on to use the same logic that you were already using, capturing the subsequent character as $3.
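To make the two branches easier to check, here is a minimal sketch that runs the same regex and replacement against both inputs from the question. The class name AdjacentChars and the helper adjacent() are made up for illustration. The only thing it relies on beyond the explanation above is that Java substitutes an empty string for a group that did not take part in the match, so $2 and $3 never both contribute to the same replacement.

```java
public class AdjacentChars {
    // Regex and replacement copied from the answer above.
    static final String REGEX = ".*?(.)X(?:(?=(.)X)|(.).*?(?=.X|$))";
    static final String REPLACEMENT = "$1$2$3";

    // Hypothetical helper, named here only for illustration.
    static String adjacent(String input) {
        return input.replaceAll(REGEX, REPLACEMENT);
    }

    public static void main(String[] args) {
        // Simple case: the character after each X is not itself followed by X,
        // so the second branch runs and the following character lands in $3.
        System.out.println(adjacent("fooaXbcXdbar")); // abcd

        // Edge case: "b" follows one X and precedes the next. The lookahead
        // captures it as $2 without consuming it, so the next match can
        // start at "b" and use it as $1.
        System.out.println(adjacent("fooaXbXdbar"));  // abbd
    }
}
```

In the edge case the first match is just "fooaX" (replaced by "ab") and the second is "bXdbar" (replaced by "bd"), which together give "abbd".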

(Note: I've tested this with both of your examples, but obviously it may miss other cases that you need to support. I recommend that you test it yourself as well.)

Licensed under: CC-BY-SA with attribution