Using a dynamic regular expression with string.split() returns an array with undefined elements. What am I doing wrong?

StackOverflow https://stackoverflow.com/questions/23641408

22-07-2023

Question

I am writing a Node module that takes a CSV file and turns it into a JavaScript object. Because I allow the user to specify the delimiter and I support text qualifiers, I need to parse it with a dynamically built regex.

Here is how I create the regex:

  settings.dilemeter = escapeForRegex(settings.dilemeter);
  settings.textQualifier = escapeForRegex(settings.textQualifier);
  var d = settings.dilemeter;
  var tq = settings.textQualifier;

  ///////////////////////////////////////////////////////////////
  /// This appears to be glitched
  ///////////////////////////////////////////////////////////////
  var searchArray = [
    "(" + tq + d + tq + ")", // First case to search for, eg: ","
    "(" + tq + d + ")", // Second case to search for, eg: ",
    "(" + d + tq + ")", // Third case to search for, eg: ,"
    "(" + d + ")", // Last case to search for, eg: ,
    "(" + tq + "$)", // if the text qualifier is the very last thing
  ];
  var regexString = "(" + searchArray.join('|') + ')';
  console.log(regexString);
  var regex = new RegExp(regexString);

With | as the delimiter and " as the text qualifier, this produces the following regular expression:

(("\|")|("\|)|(\|")|(\|)|("$))

It seems to match the strings I want to match, as shown here: http://regexpal.com/?flags=gm&regex=((%22%5C%7C%22)%7C(%22%5C%7C)%7C(%5C%7C%22)%7C(%5C%7C)%7C(%22%24))&input=h1%7Ch2%7Ch3%7Ch4%0Avalue%201%7C%22Value%202%22%7Cvalue%203%7C%22value%20-%205%22%7Csomething%7C%22Else%22

However, when I run this using string.split(regex) I get really strange results.

var testString = [
    'h1|h2|h3|h4', // The first line will be the headers
    'value 1|"Value 2"|value 3|"value - 5"' // This is the first row of data
];
console.log(testString[1].split(regex));

produces:

["value 1",
"|"",
undefined,
undefined,
"|"",
undefined,
undefined,
"Value 2",
""|",
undefined,
""|",
undefined,
undefined,
undefined,
"value 3",
"|"",
undefined,
undefined,
"|"",
undefined,
undefined,
"value - 5",
""",
undefined,
undefined,
undefined,
undefined,
""",
""]

I can't figure out why there are all of these undefined entries, or why it's returning the items that I am trying to split on.

I created a Plunker with a more complete demonstration: http://plnkr.co/edit/hn2GUFYodYQeuQLqqwVD?p=preview


Solution

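When the separator passed to string.split() is a regular expression that contains capturing groups, the text captured by each group is inserted into the result array after every match. A group that did not take part in a given match contributes undefined. That is why the output contains both the delimiters you are splitting on and the undefined entries. If you need groups in the regexp but don't want them included in the results, use non-capturing groups. These are denoted by putting ?: after the opening parenthesis of the group: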

  var searchArray = [
    "(?:" + tq + d + tq + ")", // First case to search for, eg: ","
    "(?:" + tq + d + ")", // Second case to search for, eg: ",
    "(?:" + d + tq + ")", // Third case to search for, eg: ,"
    "(?:" + d + ")", // Last case to search for, eg: ,
    "(?:" + tq + "$)", // if the text qualifier is the very last thing
  ];
  var regexString = "(?:" + searchArray.join('|') + ')';
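
To see the difference side by side, here is a minimal, self-contained sketch that builds both versions of the pattern and splits the sample row from the question. The buildSeparator function and its capturing flag are made up for this illustration, and escapeForRegex is only a stand-in, since the question does not show how the original helper is implemented.

```javascript
// Hypothetical stand-in for the question's escapeForRegex helper
// (the original implementation is not shown in the question).
function escapeForRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Illustrative helper: builds the separator regex from the question.
// `capturing` switches every group between "(" and "(?:".
function buildSeparator(delimiter, textQualifier, capturing) {
  var d = escapeForRegex(delimiter);
  var tq = escapeForRegex(textQualifier);
  var open = capturing ? "(" : "(?:";
  var searchArray = [
    open + tq + d + tq + ")", // eg: ","
    open + tq + d + ")",      // eg: ",
    open + d + tq + ")",      // eg: ,"
    open + d + ")",           // eg: ,
    open + tq + "$)"          // text qualifier at the very end
  ];
  return new RegExp(open + searchArray.join("|") + ")");
}

var row = 'value 1|"Value 2"|value 3|"value - 5"';

// Capturing groups: each of the 4 matches adds 6 extra entries (one per group).
var withCaptures = row.split(buildSeparator("|", '"', true));
console.log(withCaptures.length); // 29

// Non-capturing groups: only the text between separators is returned.
var withoutCaptures = row.split(buildSeparator("|", '"', false));
console.log(withoutCaptures);
// [ 'value 1', 'Value 2', 'value 3', 'value - 5', '' ]
```

Note the empty string at the end of the second result: the closing text qualifier matches at the end of the line, so split() reports an empty field after it. Depending on how the rows are consumed, that trailing entry may need to be dropped.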
Licensed under: CC-BY-SA with attribution