Question

I have some trouble understanding how the following regular expression is working.

,(?=([^\"]*\"[^\"]*\")*[^\"]*$)

The expression basically matches all the commas that are NOT enclosed in quotes.

For example:

apple, banana, pineapple, "tropical fruits like mango, guava, key lime", peaches

Will be split into:

apple
banana
pineapple
"tropical fruits like mango, guava, key lime"
peaches

Can someone provide me with a good breakdown of the expression? I don't understand how positive look-ahead is working.


Solution 2

Look-around assertions

Look-around assertions (including positive look-ahead) are zero-width checks. They do not consume anything from the input; they only test whether a pattern matches at the current position, and the regex engine backtracks if the assertion is not satisfied.

A positive look-ahead remembers the current position in the input and tries to match its pattern from there to the right. If the pattern does not match, the regex engine backtracks; otherwise it returns to the remembered position and continues with whatever follows the look-ahead.
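To make the "zero-width" part concrete, here is a minimal sketch in Python (the question does not name a language, so Python and its standard re module are an assumption). The toy pattern a(?=b) is invented for illustration and is not part of the question.

```python
import re

# 'a(?=b)' matches an 'a' only when a 'b' follows, but the 'b' is not consumed.
m = re.search(r"a(?=b)", "ab")
matched_text = m.group(0)   # only the 'a' is part of the match
end_position = m.end()      # the engine is back right after the 'a'

# Without a following 'b' the assertion fails, so there is no match at all.
no_match = re.search(r"a(?=b)", "ac")

print(matched_text, end_position, no_match)
```

The match ends at index 1 even though the look-ahead had to inspect the character at index 1 to succeed.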

The regex deconstructed

This regex consumes a comma and ensures that the rest of the input matches ([^\"]*\"[^\"]*\")*[^\"]*$.

  • [^\"] means “one character that is not a double-quote”.
  • * means the preceding element can be repeated zero or more times.
  • The parentheses form a group: [^\"]*\"[^\"]*\" means “any string containing exactly two double-quotes and ending with one”.
  • When * is applied to this group, it means “any string containing an even number of double-quotes and ending with one” (or the empty string, for zero repetitions).
  • The “ending with a double-quote” part of that description is a problem, because you don’t want such a constraint. So [^\"]* is appended to allow trailing non-double-quote characters.
  • $ matches the end of the string.

So, all in all, the look-ahead checks that there is an even number of double-quotes between the comma and the end of the string.
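The even-number rule can be checked against the example from the question. This is a sketch in Python (an assumed language; the question does not say which one is used). Two things differ from the original on purpose: the group is written as non-capturing, (?:...), because Python's re.split would otherwise insert the captured text into the result, and the pieces are stripped because the split itself leaves the space after each comma. The helper quotes_after_each_comma is a hypothetical name used only for illustration.

```python
import re

# Same pattern as in the question, but with a non-capturing group (?:...).
UNQUOTED_COMMA = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

line = 'apple, banana, pineapple, "tropical fruits like mango, guava, key lime", peaches'

# Split on the matching commas, then trim the space left after each comma.
fields = [part.strip() for part in UNQUOTED_COMMA.split(line)]

def quotes_after_each_comma(s):
    """For every comma in s, count the double-quotes that follow it."""
    return [s[i + 1:].count('"') for i, ch in enumerate(s) if ch == ","]

counts = quotes_after_each_comma(line)

for field in fields:
    print(field)
print(counts)
```

The commas followed by an even number of double-quotes (2, 2, 2 and 0) are the ones used as separators; the two commas inside the quoted text are each followed by a single double-quote, so the look-ahead rejects them.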

OTHER TIPS

If you visually represent your regex, you get a diagram of its structure (image not preserved; thanks to RegExpr).

You could use ^(([^'",]+|'[^']*'|"[^"]*")|,)+$; the second capture group would then match each of your elements.

Note: I have no clue what programming language you are using, which makes it harder to give a good example, because I do not know exactly what you want to match. If your programming language is able to store each of the group #2 matches in an array, you have a solution.
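As a rough illustration of that caveat, here is how it plays out in Python (an assumed language, since none was given). Python's re module keeps only the last repetition of a repeated group, so group #2 of the full pattern cannot be collected into an array directly. One workaround, sketched below, is to iterate over matches of the inner alternation on its own; the names FULL and FIELD are invented for this example.

```python
import re

line = 'apple, banana, pineapple, "tropical fruits like mango, guava, key lime", peaches'

# The full pattern from above: it validates the line, but in Python
# group 2 holds only the last repetition.
FULL = re.compile(r'''^(([^'",]+|'[^']*'|"[^"]*")|,)+$''')
last_only = FULL.match(line).group(2)

# The inner alternation alone, applied repeatedly, yields every element.
FIELD = re.compile(r'''[^'",]+|'[^']*'|"[^"]*"''')
fields = [m.group(0).strip() for m in FIELD.finditer(line) if m.group(0).strip()]

print(repr(last_only))
print(fields)
```

The filter on empty strings is needed because the single space between a comma and an opening quote is itself matched as an unquoted run.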

According to RegexBuddy:

,(?=([^\"]*\"[^\"]*\")*[^\"]*$)

Match the character "," literally «,»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=([^\"]*\"[^\"]*\")*[^\"]*$)»
   Match the regular expression below and capture its match into backreference number 1 «([^\"]*\"[^\"]*\")*»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      Note: You repeated the capturing group itself.  The group will capture only the last iteration.  Put a capturing group around the repeated group to capture all iterations. «*»
      Match any character that is not a "A " character" «[^\"]*»
         Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      Match the character """ literally «\"»
      Match any character that is not a "A " character" «[^\"]*»
         Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      Match the character """ literally «\"»
   Match any character that is not a "A " character" «[^\"]*»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
   Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
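The note about the repeated capturing group has a practical consequence in some languages. The sketch below assumes Python, which the question does not specify, and uses shortened sample strings invented for illustration. It shows that the group keeps only its last iteration, and that re.split adds one extra item per separator when the pattern contains a capturing group.

```python
import re

# The repeated group on its own, applied to a sample with two quoted parts.
m = re.match(r'([^"]*"[^"]*")*', 'a"b"c"d"')
whole_match = m.group(0)      # both iterations together
last_iteration = m.group(1)   # only the final iteration: c"d"

# With the capturing group left in place, re.split interleaves the captured
# text (or None) between the pieces, so the real fields sit at even indexes.
ORIGINAL = r',(?=([^"]*"[^"]*")*[^"]*$)'
line = 'apple, "mango, guava", peaches'
with_group = re.split(ORIGINAL, line)
fields = with_group[0::2]

print(repr(last_iteration))
print(len(with_group), fields)
```

Writing the group as (?:...) avoids the extra items, since the look-ahead only needs to test the text and never uses the capture.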

(I'm not affiliated with RegexBuddy or its author in any way. Just a user of the software product.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow