Question

Ok, so, I have an exercise to build a sort of a java compiler. I won't get too much in details. Basically, I want to know if it's possible to use a regex that can identify a closing bracket. for example, this would be a legal input

void foo(){
   asd
}

and this won't be

void foo(){
   asd
   if (){
      asd
   }

as you can see, there is only 1 closer (}) for 2 openers ({), making it invalid input. is there any way to use a regex and identify that the number of appearances matches?

Was it helpful?

Solution

It is not possible to check for correct parentheses with regular expressions, because to check that you'd need to track how many brackets have been opened etc, but regular expressions cannot do that.

I suggest to you, especially if you want to construct a compiler, to familiarize yourself with formal language theory. For example, this wikipedia article gives some insight on regular expressions in the context for formal language theory: http://en.wikipedia.org/wiki/Regular_expression#Formal_language_theory

OTHER TIPS

Simplest (simplified?) answer:

If you parse a grammar, you usually maintain some kind of stack (directly or indirectly). For every opening bracket, you push an element to the stack. Upon seeing a closing bracket, you know if it matches the last opening bracket by inspecting the stack.

Also note that there are many way to open a bracket, "{" is just one of them. So the stack not only tells you how many open brackets you have, but also which type of closing bracket is legal in a given parse state.

Standard regular expressions can only express grammar for a regular languages, which is exactly the class of languages accepted by a deterministic finite automata. A DFA only have finite number states, while brackets can be nested indefinitely; a language that can potentially have infinite level of nesting are not a regular language and cannot be parsed only by regular expression.

While the regular expression libraries in most languages are not "just" standard regular expression and is capable of parsing some non-regular languages, they generally require overly complicated expression to do so.

Generally, to check for a well-formed nesting of bracket language, you will need a context-free grammar (CFG) parser. CFG is strictly stronger than regular expression (i.e. if a grammar can be expressed in RE, then it can be expression in CFG, the reverse is not necessarily true).

Good tools are Flex for Tokens and Bison for your gramar, if you using C

JFlex and Cup, if you want to do it with Java, also using Visitor paddern for better program structure

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top