Question

I am trying to tokenize this String using a Matcher: "2+30*4+(5+6)*7"

using this Pattern: "\d*|[()+*-]"

For some reason, the Matcher seems to split the string correctly, but when I iterate over the matches they are not divided correctly: everything except the digits comes back as an empty string:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "2+30*4+(5+6)*7";
Pattern p = Pattern.compile("\\d*|[()+*-]");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.print("Start index: " + m.start());
    System.out.print(" End index: " + m.end() + " ");
    System.out.println("-----> " + m.group());
}

This gives the following output:

Start index: 0 End index: 1 -----> 2
Start index: 1 End index: 1 -----> 
Start index: 2 End index: 4 -----> 30
Start index: 4 End index: 4 -----> 
Start index: 5 End index: 6 -----> 4
Start index: 6 End index: 6 -----> 
Start index: 7 End index: 7 -----> 
Start index: 8 End index: 9 -----> 5
Start index: 9 End index: 9 -----> 
Start index: 10 End index: 11 -----> 6
Start index: 11 End index: 11 -----> 
Start index: 12 End index: 12 -----> 
Start index: 13 End index: 14 -----> 7
Start index: 14 End index: 14 -----> 

I don't understand why, for example, in the second line the end index is 1 (and not 2), resulting in an empty string: Start index: 1 End index: 1 ----->

By the way, when I change the order of the alternatives to "[()+*-]|\d*", it works fine...

Solution

Empty strings are allowed by \\d*, since it means zero or more digits. If you don't want to find matches that contain zero digits (empty strings), change \\d* to \\d+.
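
This also explains why the operators vanish instead of appearing next to the empty matches. In practice, after an empty match Matcher.find() starts its next search one character further on, so the operator sitting at the empty match's position is never tried against [()+*-]. A minimal sketch to check this on a two-character input (EmptyMatchDemo and describeMatches are hypothetical names used only for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {
    // Hypothetical helper: lists every match of regex in input as "start-end:'text'".
    static List<String> describeMatches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start() + "-" + m.end() + ":'" + m.group() + "'");
        }
        return result;
    }

    public static void main(String[] args) {
        // \d* is tried first and succeeds with an empty match at index 0 (before '+');
        // the next find() resumes at index 1, so '+' is skipped entirely.
        System.out.println(describeMatches("\\d*|[()+*-]", "+5"));
        // \d+ cannot match an empty string, so [()+*-] gets its chance at index 0.
        System.out.println(describeMatches("\\d+|[()+*-]", "+5"));
    }
}
```

With \\d*|[()+*-] this prints [0-0:'', 1-2:'5', 2-2:''], so the + never shows up; with \\d+|[()+*-] it prints [0-1:'+', 1-2:'5'].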

Demo

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "2+30*4+(5+6)*7";
Pattern p = Pattern.compile("\\d+|[()+*-]");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.print("Start index: " + m.start());
    System.out.print(" End index: " + m.end() + " ");
    System.out.println("-----> " + m.group());
}

Output:

Start index: 0 End index: 1 -----> 2
Start index: 1 End index: 2 -----> +
Start index: 2 End index: 4 -----> 30
Start index: 4 End index: 5 -----> *
Start index: 5 End index: 6 -----> 4
Start index: 6 End index: 7 -----> +
Start index: 7 End index: 8 -----> (
Start index: 8 End index: 9 -----> 5
Start index: 9 End index: 10 -----> +
Start index: 10 End index: 11 -----> 6
Start index: 11 End index: 12 -----> )
Start index: 12 End index: 13 -----> *
Start index: 13 End index: 14 -----> 7
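
If you only need the token text rather than the positions, the same pattern can be wrapped in a small helper that collects the matches into a list. A sketch, assuming a plain List<String> is what you want back (Tokenizer and tokenize are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tokenizer {
    // One or more digits, or a single operator/parenthesis.
    // '-' is placed last in the class so it is taken literally, not as a range.
    private static final Pattern TOKEN = Pattern.compile("\\d+|[()+*-]");

    // Collects every token found by TOKEN, in order of appearance.
    static List<String> tokenize(String expr) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expr);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [2, +, 30, *, 4, +, (, 5, +, 6, ), *, 7]
        System.out.println(tokenize("2+30*4+(5+6)*7"));
    }
}
```

Compiling the pattern once into a static final field is safe because Pattern objects are immutable and thread-safe. Keep in mind that find() silently skips any character neither branch matches, such as spaces or /, so "1 + 2" tokenizes to [1, +, 2]; add / to the character class if you need division.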

If you are not interested in the positions of your tokens, you can also split before and after each of + - * / ( ), like this:

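String s = "2+30*4+(5+6)*7";
// Split at every position directly after (?<=...) or directly before (?=...)
// one of + - * / ( )
String[] tokens = s.split("(?<=[+\\-*/()])|(?=[+\\-*/()])");
for (String token : tokens)
    System.out.println(token);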

Output:

2
+
30
*
4
+
(
5
+
6
)
*
7
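
One detail worth checking with the split approach is an expression that starts with a parenthesis or operator. Since Java 8, String.split does not produce a leading empty string for a zero-width match at the beginning of the input, so "(1+2)*3" should split cleanly starting with "("; on older versions an empty first element could appear. A small sketch to verify this on your JVM (SplitDemo and tokens are hypothetical names):

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits at every position directly after (?<=...) or directly before (?=...)
    // one of + - * / ( ); the lookarounds are zero-width, so no characters are lost.
    static String[] tokens(String expr) {
        return expr.split("(?<=[+\\-*/()])|(?=[+\\-*/()])");
    }

    public static void main(String[] args) {
        // The zero-width match at index 0 (just before '(') yields no leading "" on Java 8+.
        System.out.println(Arrays.toString(tokens("(1+2)*3")));
    }
}
```

On Java 8 or later this prints [(, 1, +, 2, ), *, 3].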

OTHER TIPS

\\d* matches zero or more digits. So after the first match, the matcher is looking at "+30*4+(5+6)*7", and the first thing the matcher asks is, "Does this string begin with zero or more digits? By golly, yes it does!" (It checks this first, because \\d* appears first in the pattern.) So that's why the matcher is returning an empty string (a string of zero digits).

Changing it to \\d+, which matches one or more digits, should work.
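
The order matters because Java's regex engine accepts the first alternative that succeeds at a given position, not the longest one. A minimal sketch that counts the empty matches produced by the original pattern, by the reordered pattern [()+*-]|\\d*, and by the \\d+ version (AlternationOrderDemo and countEmptyMatches are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationOrderDemo {
    // Counts how many of the matches found by regex in input are empty.
    static int countEmptyMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int empty = 0;
        while (m.find()) {
            if (m.start() == m.end()) {
                empty++;
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        String s = "2+30*4+(5+6)*7";
        // \d* first: an empty match wins at every operator, parenthesis and at the end.
        System.out.println(countEmptyMatches("\\d*|[()+*-]", s)); // 8
        // Operator class first: only the end-of-input position yields an empty match.
        System.out.println(countEmptyMatches("[()+*-]|\\d*", s)); // 1
        // \d+ can never match an empty string.
        System.out.println(countEmptyMatches("\\d+|[()+*-]", s)); // 0
    }
}
```

Putting [()+*-] first removes the empty matches at the operators, but \\d* can still match the empty string at the very end of the input, which is why \\d+ is the cleaner fix.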

Your regex \\d*|[()+*-] starts with \\d*, which matches zero or more digits.

You need to change it to match one or more digits, with the regex \\d+|[()+*-].

Licensed under: CC-BY-SA with attribution