Question

I'm trying to split a string using a regular expression and the split function in JavaScript. For example, I have the string olej sojowy, sorbitol, czerwień koszenilową and my regex is:

/, (?!(któ))/g

When I test it here: http://regexr.com/38ps8 I get 2 matches, as expected, so I should get 3 elements after the split.

But when I use this expression in the split function:

var parts="olej sojowy, sorbitol, czerwień koszenilową".split(/, (?!(któ))/g);
console.log("Num of elements:" + parts.length); 
console.log(parts.join("!\n!"));

the result is different: it returns an array of 5 elements, with two additional empty strings:

Num of elements:5 
olej sojowy!
!!
!sorbitol!
!!
!czerwień koszenilową 

Why isn't it working as expected? Is it a problem with the split function? Does it use a regular expression differently than I would expect?

Edit: I've also just noticed that if I change my regular expression to /, /g then I get exactly what I wanted (3 elements in the result), but there are other strings which I don't want to split if któ follows the comma and space. So why does this operator change the behaviour of split?


Solution 2

From Mozilla's JS ref:

If separator is a regular expression that contains capturing parentheses, then each time separator is matched, the results (including any undefined results) of the capturing parentheses are spliced into the output array. However, not all browsers support this capability.
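As a minimal illustration of the behaviour quoted above (the string "a1b2c" is just a made-up sample, not from the question), compare a separator with and without capturing parentheses:

```javascript
// Separator with a capturing group: each captured digit is spliced into the result.
const withGroup = "a1b2c".split(/(\d)/);
console.log(withGroup); // [ 'a', '1', 'b', '2', 'c' ]

// Same separator without the group: only the pieces between matches remain.
const withoutGroup = "a1b2c".split(/\d/);
console.log(withoutGroup); // [ 'a', 'b', 'c' ]
```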

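If the regex passed to split contains capturing groups, the contents of each group are inserted into the result as well. Since you have a capturing group (któ), that is what you get. The inserted entries are undefined (which join prints as empty strings) because the group sits inside a negative lookahead: the separator only matches where któ does not follow, so the group never captures anything. Even if you add the text ", któ" to your string, the captured text will not appear:

var parts="olej sojowy, któ sorbitol, czerwień koszenilową".split(/, (?!(któ))/g);

gives 3 elements. The comma before któ is not a split point, so the first element is "olej sojowy, któ sorbitol"; the 2nd element is the undefined capture from the one remaining match, and the 3rd is "czerwień koszenilową".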
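One way to check what that call really returns is to inspect the elements one by one instead of joining them. This is only a sketch; the variable name withKto is made up for the example:

```javascript
// Capturing group inside the negative lookahead, on a string that contains ", któ".
const withKto = "olej sojowy, któ sorbitol, czerwień koszenilową".split(/, (?!(któ))/);

console.log(withKto.length);           // 3
console.log(withKto[0]);               // olej sojowy, któ sorbitol
console.log(withKto[1] === undefined); // true: the group never captured anything
console.log(withKto[2]);               // czerwień koszenilową
```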

If you omit the parentheses inside your lookahead, it works as you expect it to:

var parts="olej sojowy, któ sorbitol, czerwień koszenilową".split(/, (?!któ)/g);

With no capturing groups, you get only the text that remains between the matches of the regex.
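The difference can be seen side by side on the string from the question. This is a small sketch with made-up variable names; the non-capturing form (?:...) is included as an alternative in case the grouping is needed for something else inside the lookahead:

```javascript
const text = "olej sojowy, sorbitol, czerwień koszenilową";

// Capturing group inside the lookahead: one extra (undefined) entry per match.
const capturing = text.split(/, (?!(któ))/);
console.log(capturing.length); // 5

// No parentheses around któ: only the pieces between the separators.
const plain = text.split(/, (?!któ)/);
console.log(plain); // [ 'olej sojowy', 'sorbitol', 'czerwień koszenilową' ]

// Non-capturing group: grouping is kept, but nothing is added to the result.
const nonCapturing = text.split(/, (?!(?:któ))/);
console.log(nonCapturing.length); // 3
```

The g flag is left out here because split finds every separator regardless of it.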

OTHER TIPS

It's working exactly as specified. Your separator contains a capturing group, so split returns five elements:

[1] olej sojowy
[2] undefined
[3] sorbitol
[4] undefined
[5] czerwień koszenilową

The extra elements are the (undefined) capture results, inserted at each place where a split occurred.

Licensed under: CC-BY-SA with attribution