Question

Pattern:

"(([^",\n  ]*[,\n  ])*([^",\n  ]*"{2})*)*[^",\n  ]*"[  ]*,[  ]*|[^",\n]*[  ]*,[  ]*|"(([^",\n  ]*[,\n  ])*([^",\n  ]*"{2})*)*[^",\n  ]*"[  ]*|[^",\n]*[  ]*

This regex is meant to parse a CSV file, but when I run it through Pattern.matcher the thread hangs and the server logs the hung-thread warning below. I would appreciate help fine-tuning this pattern.

[7/1/13 16:45:26:745 GMT+08:00] 00000029 ThreadMonitor W   WSVR0605W: Thread "MessageListenerThreadPool : 0" (00000035) has been active for 691836 milliseconds and may be hung.  There is/are 1 thread(s) in total in the server that may be hung.
at java.util.regex.Pattern$Curly.match(Pattern.java:4233)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4606)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4752)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4689)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4606)
at java.util.regex.Pattern$Loop.match(Pattern.java:4733)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4665)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4754)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4689)
at java.util.regex.Pattern$Loop.match(Pattern.java:4742)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4665)
at java.util.regex.Pattern$BitClass.match(Pattern.java:2912)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4278)
at java.util.regex.Pattern$Curly.match(Pattern.java:4233)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4606)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4752)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4689)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4606)

Solution

Description

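The problem appears to be the sheer amount of backtracking needed to attempt the match. The pattern nests quantifiers, as in ((...)*(...)*)*, so the engine can split the same text between the inner and outer repetitions in a very large number of ways, and it tries them all before giving up on a failed match.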

If your CSV is well formed, you can use a simpler regex to parse each line. Note that it only separates the quoted and unquoted comma-delimited values within a single string, so you need to run each line through a Matcher built from this regex and iterate over the matches.

regex: (?:^|,)"?((?<=")[^"]*|[^,"]*)"?(?=,|$)


Java Code Example:

Live example: http://ideone.com/NBmzrk

Sample Text

"root",test1,1111,"22,22",,fdsa

Code

import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Module1 {
  public static void main(String[] args) {
    // One CSV line: the sample text shown above.
    String sourcestring = "\"root\",test1,1111,\"22,22\",,fdsa";
    // Group 1 captures the field value without its surrounding quotes.
    Pattern re = Pattern.compile("(?:^|,)\"?((?<=\")[^\"]*|[^,\"]*)\"?(?=,|$)");
    Matcher m = re.matcher(sourcestring);
    int mIdx = 0;
    while (m.find()) {
      // Group 0 is the whole match (including any leading comma); group 1 is the field.
      for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
        System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
      }
      mIdx++;
    }
  }
}

Capture Group 1 (one entry per match)

[0] => root
[1] => test1
[2] => 1111
[3] => 22,22
[4] => 
[5] => fdsa
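
The simpler regex above assumes a quoted field never contains a quote character. The "{2} in the original pattern suggests the data may use doubled quotes as an escape, so here is a minimal sketch of one way to allow that without reintroducing heavy backtracking. It is not the pattern from the question: the class name CsvLineSketch and the method parseLine are made up for illustration, and it assumes one record per line with no whitespace to trim around the commas. It uses possessive quantifiers (*+ and ++), which never give characters back, and \G to force every field to start exactly where the previous one ended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class CsvLineSketch {
    // \G                       : start where the previous field ended
    // "((?:[^"]++|"")*+)"      : quoted field, "" inside stands for one quote (group 1)
    // ([^",]*+)                : unquoted field (group 2)
    // (,|$)                    : delimiter, empty at end of line (group 3)
    private static final Pattern FIELD = Pattern.compile(
        "\\G(?:\"((?:[^\"]++|\"\")*+)\"|([^\",]*+))(,|$)");

    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            String quoted = m.group(1);
            // Collapse doubled quotes in quoted fields; take unquoted fields as they are.
            fields.add(quoted != null ? quoted.replace("\"\"", "\"") : m.group(2));
            if (m.group(3).isEmpty()) {
                return fields; // the delimiter was end of line, so we are done
            }
        }
        // find() failed before reaching the end of the line.
        throw new IllegalArgumentException("Malformed CSV line: " + line);
    }

    public static void main(String[] args) {
        for (String field : parseLine("\"root\",test1,1111,\"22,22\",,fdsa")) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Because each character can be consumed in only one way, a line that does not match fails quickly instead of exploring alternative splits. Malformed input such as an unterminated quote is reported with an exception rather than silently skipped.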
Licensed under: CC-BY-SA with attribution