Question

I am thinking of validating an infix notation which consists of alphabets as operands and +-*/$ as operators [eg: A+B-(C/D)$(E+F)] using regex in Java. Is there any better way? Is there any regex pattern which I can use?

Was it helpful?

Solution

I am not familiar with the language syntax of infix, but you can certainly do a first pass validation check which simply verifies that all of the characters in the string are valid (i.e. acceptable characters = A-Z, +, -, *, /, $, ( and )). Here is a Java program which checks for valid characters and also includes a function which checks for unbalanced (possibly nested) parentheses:

import java.util.regex.*;
public class TEST {
    public static void main(String[] args) {
        String s = "A+B-(C/D)$(E+F)";
        Pattern regex = Pattern.compile(
            "# Verify that a string contains only specified characters.\n" +
            "^                # Anchor to start of string\n" +
            "[A-Z+\\-*/$()]+  # Match one or more valid characters\n" +
            "$                # Anchor to end of string\n",
            Pattern.COMMENTS);
        Matcher m = regex.matcher(s);
        if (m.find()) {
            System.out.print("OK: String has only valid characters.\n");
        } else {
            System.out.print("ERROR: String has invalid characters.\n");
        }
        // Verify the string contains only balanced parentheses.
        if (checkParens(s)) {
            System.out.print("OK: String has no unbalanced parentheses.\n");
        } else {
            System.out.print("ERROR: String has unbalanced parentheses.\n");
        }
    }
    // Function checks is string contains any unbalanced parentheses.
    public static Boolean checkParens(String s) {
        Pattern regex = Pattern.compile("\\(([^()]*)\\)");
        Matcher m = regex.matcher(s);
        // Loop removes matching nested parentheses from inside out.
        while (m.find()) {
            s = m.replaceFirst(m.group(1));
            m.reset(s);
        }
        regex = Pattern.compile("[()]");
        m = regex.matcher(s);
        // Check if there are any erroneous parentheses left over.
        if (m.find()) {
            return false;   // String has unbalanced parens.
        }
        return true;        // String has balanced parens.
    }
}

This does not validate the grammar, but may be useful as a first test to filter out obviously bad strings.

OTHER TIPS

Possibly overkill, but you might consider using a fully fledged parser generator such as ANTLR (http://www.antlr.org/). With ANTLR you can create rules that will generate the java code for you automatically. Assuming you have only got valid characters in the input this is a syntax analysis problem, otherwise you would want to validate the character stream with lexical analysis first.

For syntax analysis you might have rules like:

PLUS : '+' ;
etc...

expression:
         term ( ( PLUS | MINUS | MULTIPLY | DIVIDE )^ term )*
      ;
term:
    constant
  | OPENPAREN! expression CLOSEPAREN!
  ;

With constant being integers/reals whatever. If the ANTLR generated parser code can't match the input with your parser rules it will throw an exception so you can determine whether code is valid.

You probably could do it with recursive PCRE..but this may be a PITA.

since you only want to validate it, you can do it very simple. just use a stack, push all the elements one by one and remove valid expressions.

define some rules, for example:

  • an operator is only allowed if there is an alphabet on top of the stack
  • an alphabet or parentheses are only allowed if there is an operator on top of the stack
  • everything is allowed if the stack is empty

then:

  • if you encounter a closing parenthes remove everything up to the opening parenthes.
  • if you encounter an alphabet, remove the expression

after every removal of an expression, add an dummy alphabet. repeat the previous steps. if the result is an alphabet, the expression is valid.

or something like that..

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top