The best I was able to do with regex is match the entire list of values as one string, without getting each value into its own capturing group. This means I couldn't do a plain match/replace without a callback function. How you do this depends on your language, but I'll show an example in PHP. Here is the regex:
(?<="NULL","0","0","0",",","1",",)(?:"[^"]+",?)+(?=,"D\$")
First, we look behind (`(?<=...)`) for your `"NULL","0","0","0",",","1",",` string. Then we use a non-capturing repeated group (`(?:...)+`) that will catch 1+ CSV columns. The syntax inside matches `"`, followed by 1+ non-`"` characters, followed by `"` and an optional `,`. Finally, we look ahead (`(?=...)`) for your `,"D$"` string (with the `$` escaped as `\$`), which ends the list of words.
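As a quick sanity check of the three parts working together, here is a minimal PHP sketch that runs the pattern against a single sample row (the same row used in the example that follows). The variable names `$row`, `$pattern`, `$found` and `$m` are only illustrative.

```php
<?php
// One sample row; the list we want to isolate sits between the
// "NULL","0","0","0",",","1"," prefix and the ,"D$" suffix.
$row = '"apple","NULL","0","0","0",",","1",","fruit","red","sweet","D$","object"';

// Same pattern as above. The single-quoted PHP string keeps \$ intact,
// so the regex engine sees an escaped, literal dollar sign.
$pattern = '/(?<="NULL","0","0","0",",","1",",)(?:"[^"]+",?)+(?=,"D\$")/';

// preg_match() returns 1 on a match; $m[0] holds the full matched text.
$found = preg_match($pattern, $row, $m);

echo $found === 1 ? $m[0] : 'no match', PHP_EOL;
```

This should print `"fruit","red","sweet"`: the repeated group first runs past the end of the list, then backtracks until the lookahead for `,"D$"` succeeds.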
Given this string:
"apple","NULL","0","0","0",",","1",","fruit","red","sweet","D$","object"
It will match:
"fruit","red","sweet"
In PHP, I used `preg_replace_callback()` to process each match, replacing all instances of `","` inside it with `,`. When `$csv` equals your sample data, this gives you your intended output.
$csv = preg_replace_callback(
    '/(?<="NULL","0","0","0",",","1",",)(?:"[^"]+",?)+(?=,"D\$")/',
    function ($matches) {
        // $matches[0] is the whole matched list, e.g. "fruit","red","sweet"
        return str_replace('","', ',', $matches[0]);
    },
    $csv
);
Output:
"apple","NULL","0","0","0",",","1",","fruit,red,sweet","D$","object"
"horse","NULL","0","0","0",",","1",","animal,large,tail","D$","object"
"Los Angeles","NULL","0","0","0",",","1",","city,California,smoggy,entertainment","D$","location"
Note: I don't think this can be done in one simple regex replace because (to my knowledge) regex can't capture a variable number of groups. If, for instance, we replaced the repeated non-capturing group with something like `(?:"([^"]+)",?)+` (adding a capture group around the word, `[^"]+`), it would still count as only 1 captured group, holding the value from the last repetition. See this example to see what I mean. You could literally repeat that non-capturing group, making each copy after the first optional with `?`. However, you'd have to include at least as many copies as there are words in your largest example (see here).
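The single-capture behaviour described in the note can be checked with a small PHP sketch. The `$list` value and variable names are illustrative, and the `preg_match_all()` call is an alternative added here for comparison, not part of the approach above.

```php
<?php
// The list portion of one sample row.
$list = '"fruit","red","sweet"';

// A capture group inside a repeated group is overwritten on every
// repetition, so only the last word is left in $m[1].
preg_match('/^(?:"([^"]+)",?)+$/', $list, $m);
echo count($m) - 1, ' captured group: ', $m[1], PHP_EOL;

// For comparison: preg_match_all() applies the inner pattern repeatedly
// and collects every word in $all[1].
preg_match_all('/"([^"]+)"/', $list, $all);
echo implode(',', $all[1]), PHP_EOL;
```

The first line printed should be `1 captured group: sweet` and the second `fruit,red,sweet`, which is why the replacement above works on the whole match in a callback rather than relying on numbered groups.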