Question

I have a problem that can be solved by iteration, but I'm wondering whether there's a more elegant solution using regular expressions and split().

I have a string (which Excel puts on the clipboard) that is, in essence, comma-delimited. The caveat is that when a cell value contains a comma, the whole cell is surrounded with quotation marks (presumably to escape the commas within it). An example string is as follows:

123,12,"12,345",834,54,"1,111","98,273","1,923,002",23,"1,243"

Now, I want to elegantly split this string into individual cells, but the catch is that I cannot use a normal split expression with a comma as the delimiter, because it will divide cells that contain a comma in their value. Another way of looking at this problem is that I can ONLY split on a comma if there is an EVEN number of quotation marks preceding it.
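For reference, the even-quote rule can be sketched as a plain loop. This is only an illustration in Python (the question does not name a language), and the function name split_cells_loop is made up for the example. It toggles a flag on every quotation mark and splits on a comma only while the flag is off, that is, when an even number of quotes has been seen so far.

```python
def split_cells_loop(line):
    """Split a comma-delimited line, ignoring commas inside double quotes.

    Hypothetical helper for illustration; quotes are kept in the output,
    just as a plain split would keep them.
    """
    cells = []
    current = []
    in_quotes = False  # True while an odd number of quotes has been seen
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == ',' and not in_quotes:
            # Even number of preceding quotes: this comma ends a cell.
            cells.append(''.join(current))
            current = []
        else:
            current.append(ch)
    cells.append(''.join(current))  # last cell has no trailing comma
    return cells


if __name__ == "__main__":
    example = '123,12,"12,345",834,54,"1,111","98,273","1,923,002",23,"1,243"'
    print(split_cells_loop(example))
```

The sketch does not handle doubled quotes inside a quoted cell or cells that span several lines; it only covers the format shown in the example string.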

This is easy to solve with a loop, but I'm wondering whether there's a regular-expression split function capable of capturing this logic. In an attempt to solve this problem, I constructed a deterministic finite automaton (DFA) for the logic.

[Image: DFA diagram for the splitting logic]

The question now is reduced to the following: is there a way to split this string such that a new array element (corresponding to /s) is produced each time the final state (state 4 here) is reached in a DFA?


Solution

Using regex (unescaped): (?:(?:"[^"]*")|(?:[^,]*))

Use that with Regex.Matches() in .NET, or its analog on other platforms.
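As one possible analog, here is a minimal sketch using Python's re.findall with the pattern above. The choice of Python and the name split_cells_matches are assumptions made for illustration. Because [^,]* can match the empty string, the pattern also produces zero-length matches next to each comma; dropping them is an extra step that is not part of the original answer, and it means genuinely empty cells would be lost as well.

```python
import re

# The pattern from the answer: a quoted run, or a run of non-comma characters.
CELL_PATTERN = re.compile(r'(?:(?:"[^"]*")|(?:[^,]*))')


def split_cells_matches(line):
    """Collect every non-empty match of CELL_PATTERN in the line.

    Hypothetical helper for illustration. Zero-length matches are
    discarded, so empty cells (as in 'a,,b') do not survive.
    """
    return [cell for cell in CELL_PATTERN.findall(line) if cell]


if __name__ == "__main__":
    example = '123,12,"12,345",834,54,"1,111","98,273","1,923,002",23,"1,243"'
    print(split_cells_matches(example))
```

If empty cells matter, the matches would need to be anchored to the preceding comma or to the start of the string rather than filtered afterwards.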

You could further expand the above to this: ^(?:(?:"(?<Value>[^"]*)")|(?<Value>[^,]*))(?:,(?:(?:"(?<Value>[^"]*)")|(?<Value>[^,]*)))*$

This will parse the whole string in one shot, but it relies on named groups and multiple captures per group (.NET supports both).

OTHER TIPS

Eligible commas are also followed by an even number of quotes, and VBScript does support lookaheads. Try splitting on this:

",(?=(?:[^""]*""[^""]*"")*[^""]*$)"
Licensed under: CC-BY-SA with attribution