Question

I am trying to find the shortest character sequence between the words "I" and "disagree" (not case-sensitive). I've read through all the similar questions on SO, but none of the solutions seem to work for me. Here is an example sentence that is causing me grief:

As an American, I must disagree with you.

And here is my best guess as to a regex pattern:

I(.*?)disagree

I want to capture just the " must ", but instead I'm capturing the longer string, "can, I must ". I am hoping not to have to specify that the "I" must be followed by something else, like a space, because then I wouldn't capture anything in a sentence like "I'll disagree with that." I also don't want to insist that the "I" be capitalized. Basically, I just want the shortest match possible. This site is one of the places I'm using to verify the solution:

http://regexpal.com/?flags=gi&regex=I%28.*%3F%29%20disagree&input=As%20an%20American%2C%20I%20must%20disagree%20with%20you.

Solution

The general approach is to use a negative lookahead, so that the text in between cannot itself contain the starting word:

(I)(((?!\1).)*?) disagree

Notice that the group for the text in between is now $2. If you don't want that, you can repeat the first word inside the lookahead:

I((?:(?!I).)*?) disagree

But I'd say the first version is easier to maintain, especially if the word is longer.
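
To make the difference concrete, here is a minimal Python sketch that runs the question's pattern and both patterns above against the example sentence. The choice of Python's re module and the variable names are assumptions for illustration; the question itself was tested on regexpal with the i flag.

```python
import re

sentence = "As an American, I must disagree with you."

# Question's pattern: the lazy quantifier still starts at the first "i",
# which is the one inside "American".
naive = re.search(r"I(.*?)disagree", sentence, re.IGNORECASE)

# Backreference version: group 1 is the start word, group 2 the text in between.
backref = re.search(r"(I)(((?!\1).)*?) disagree", sentence, re.IGNORECASE)

# Repeated-word version: the text in between is group 1.
repeated = re.search(r"I((?:(?!I).)*?) disagree", sentence, re.IGNORECASE)

print(repr(naive.group(1)))     # 'can, I must '
print(repr(backref.group(2)))   # ' must'
print(repr(repeated.group(1)))  # ' must'
```

Because the space before "disagree" is part of both lookahead patterns, the captured text is " must" without the trailing space.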

OTHER TIPS

You need to use lookarounds. Use the regex (?<=[iI])(\W.*)(?=disagree) and, for the example sentence, you will get only the text between "I" and "disagree".
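
A quick way to check this lookaround pattern is the Python sketch below. The third input is a made-up sentence, not from the question, added to show that the greedy .* runs up to the last "disagree" when the word appears twice.

```python
import re

# Pattern from the answer above; [iI] covers both cases, so no flag is needed.
pattern = re.compile(r"(?<=[iI])(\W.*)(?=disagree)")

example = pattern.search("As an American, I must disagree with you.")
contraction = pattern.search("I'll disagree with that.")
# Hypothetical input with two occurrences of "disagree".
repeated = pattern.search("I must disagree, and I still disagree.")

print(repr(example.group(1)))      # ' must '
print(repr(contraction.group(1)))  # "'ll "
print(repr(repeated.group(1)))     # ' must disagree, and I still '
```

If the shortest span is wanted in the last case as well, making the quantifier lazy (.*?) is one option.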

Use word boundaries (\b):

/\bi(.*?)\bdisagree/i
  • case-insensitive (the trailing i flag)
  • matches only the I of I'll (the 'll will be part of the captured group)
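
The word-boundary pattern can be tried with the Python sketch below, where re.IGNORECASE stands in for the /i flag. The last input is a hypothetical sentence, not from the question: since \bi requires a boundary only before the "i", any word that starts with "i" can also begin the match.

```python
import re

pattern = re.compile(r"\bi(.*?)\bdisagree", re.IGNORECASE)

example = pattern.search("As an American, I must disagree with you.")
contraction = pattern.search("I'll disagree with that.")
# Hypothetical input: "It" starts with "i", so the match begins there.
leading_word = pattern.search("It seems that I disagree.")

print(repr(example.group(1)))       # ' must '
print(repr(contraction.group(1)))   # "'ll "
print(repr(leading_word.group(1)))  # 't seems that I '
```
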
Licensed under: CC-BY-SA with attribution