Question

I'm trying to use a regex to split a string into fields, but it isn't fully working: it skips some places where the string should be split. Here is the part of the program that processes the string:

void parser(String s) {
    String REG1 = "(',\\d)|(',')|(\\d,')|(\\d,\\d)";
    Pattern p1 = Pattern.compile(REG1);
    Matcher m1 = p1.matcher(s);
    while (m1.find()) {
        System.out.println(counter + ":  " + s.substring(end, m1.end() - 1) + " " + end + "  " + m1.end());
        end = m1.end();
        counter++;
    }
}

The string is:

s= 3101,'12HQ18U0109','11YX27X0041','XX21','SHV7-P Hig, Hig','','GW1','MON','E','A','ASEXPORT-1',1,101,0,'0','1500','V','','',0,'mb-master1'

The problem is that it doesn't split at ,1, or ,0,

The rules for parsing are: a string is wrapped in single quotes between the commas, for example ,'ASEXPORT-1', while an int sits directly between the commas, for example ,101,

Expected output:

3101   |   12HQ18U0109  |  11YX27X0041  | XX21    |  SHV7-P Hig, Hig|  |GW1   |MON  |E  |  A|   ASEXPORT-1|  1  |101   |0   |  0  |1500  |   V|    |   |   0   |mb-master1

Altogether 21 elements.


Solution

You can split it with this regex:

,(?=([^']*'[^']*')*[^']*$)

It splits at a , only if an even number of ' characters follow it, that is, only if the comma lies outside a quoted field.

So for

3101,'12HQ18,U0109','11YX27X0041'

output would be

3101
'12HQ18,U0109'
'11YX27X0041'
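
As a rough sketch of how this could be applied in Java, the regex can be passed to String.split. The class and method names here (QuoteAwareSplit, splitFields) are made up for illustration; the input is the string from the question.

```java
import java.util.Arrays;
import java.util.List;

class QuoteAwareSplit {
    // A comma followed by an even number of ' up to the end of the input,
    // i.e. a comma that is not inside a quoted field.
    static final String SEPARATOR = ",(?=([^']*'[^']*')*[^']*$)";

    // Hypothetical helper: splits a line into raw fields (quotes are kept).
    static List<String> splitFields(String line) {
        return Arrays.asList(line.split(SEPARATOR));
    }

    public static void main(String[] args) {
        String s = "3101,'12HQ18U0109','11YX27X0041','XX21','SHV7-P Hig, Hig','',"
                 + "'GW1','MON','E','A','ASEXPORT-1',1,101,0,'0','1500','V','','',0,'mb-master1'";
        List<String> fields = splitFields(s);
        System.out.println(fields.size() + " fields");
        for (String field : fields) {
            System.out.println(field);
        }
    }
}
```

Note that String.split discards trailing empty strings by default; passing a negative limit, as in line.split(SEPARATOR, -1), keeps them. The lookahead also rescans the rest of the input for every comma, so this is probably best suited to short lines.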

Note

This won't work for nested strings like 'hello 'h,i'world'. If there are any such cases, you should use the following regex instead:

(?<='),(?=')|(?<=\d),(?=\d|')|(?<=\d|'),(?=\d)
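
A minimal sketch of this second regex in Java, using the nested example from the note. Because the lookbehind and lookahead only inspect the neighbouring characters without consuming them, adjacent short fields such as ,1, and ,0, are still split. The names LookaroundSplit and splitFields are illustrative only. One caveat: a comma inside a quoted field that sits directly between two digits (for example 'a1,2b') would also be treated as a separator.

```java
import java.util.Arrays;
import java.util.List;

class LookaroundSplit {
    // Split at a comma only when it sits between quote/quote, digit/digit,
    // digit/quote or quote/digit. The neighbours themselves are not consumed.
    static final String SEPARATOR =
        "(?<='),(?=')|(?<=\\d),(?=\\d|')|(?<=\\d|'),(?=\\d)";

    // Hypothetical helper: splits a line into raw fields (quotes are kept).
    static List<String> splitFields(String line) {
        return Arrays.asList(line.split(SEPARATOR));
    }

    public static void main(String[] args) {
        // The comma in "h,i" is between two letters, so it is left alone.
        System.out.println(splitFields("1,'hello 'h,i'world',2"));

        // Consecutive numeric fields are split as well.
        System.out.println(splitFields("'ASEXPORT-1',1,101,0,'0'"));
    }
}
```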

OTHER TIPS

If you also (for some bizarre reason) need to know each match's start and end index in the original string (as in your sample output), you can use the following pattern:

String regex = "('[^']*'|\\d+)";

which would match an unquoted integer or a single-quoted string.
You can optionally remove the leading and trailing ' using a "second-pass" on the matching substring:

match = match.replaceAll("\\A'|'\\Z", "");

which replaces a leading and trailing ' with nothing.
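
For illustration, here is how that second pass behaves on a few sample tokens. The class name StripQuotes and the helper unquote are hypothetical.

```java
class StripQuotes {
    // Removes one ' at the very start and one ' at the very end, if present.
    // Quotes in the middle of the token are left untouched.
    static String unquote(String token) {
        return token.replaceAll("\\A'|'\\Z", "");
    }

    public static void main(String[] args) {
        System.out.println(unquote("'ASEXPORT-1'")); // quoted string
        System.out.println(unquote("101"));          // integer, unchanged
        System.out.println(unquote("''").isEmpty()); // empty quoted string
    }
}
```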

The code could look like this:

Pattern pat = Pattern.compile("('[^']*'|\\d+)");
Matcher m = pat.matcher(str);

int counter = 0, start = 0;
while (m.find()) {
    String match = m.group(1);
    int end = start + match.length();
    match = match.replaceAll("\\A'|'\\Z", "");   // <-- comment out for NOT replacing 
                                                 //     leading and trailing quotes 
    System.out.format("%d: %s [%d - %d]%n", ++counter, match, start, end);
    start = end + 1;   // <-- the "+1" is to account for the ',' separator
}
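
As an alternative sketch, the indices could also be read directly from the Matcher via start() and end() rather than tracked by hand, which avoids relying on fields being separated by exactly one character. The wrapper class FieldScanner and its scan method are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldScanner {
    // A single-quoted string or an unquoted integer.
    private static final Pattern FIELD = Pattern.compile("'[^']*'|\\d+");

    // Hypothetical helper: returns one "value [start - end]" entry per field,
    // where start and end are offsets into the original string.
    static List<String> scan(String str) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(str);
        while (m.find()) {
            // Strip the surrounding quotes from the reported value only;
            // the offsets still cover the quoted token.
            String value = m.group().replaceAll("\\A'|'\\Z", "");
            out.add(String.format("%s [%d - %d]", value, m.start(), m.end()));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : scan("3101,'XX21','',0")) {
            System.out.println(line);
        }
    }
}
```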

See, also, this short demo.

Licensed under: CC-BY-SA with attribution