Question

Consider the following script (it's total nonsense, written in a pseudo-language):

if (Request.hostMatch("asfasfasf.com") && someString.existsIn(new String[] {"brr", "hrr"}))   {
    if (Request.clientIp("10.0.x.x")) {
        somevar = "1";
    }
    somevar = "2";
}
else {
    somevar = "first";
}
string foo = "foo";
// etc. etc.

How would you extract the if-block's parameters and contents from it? The if-block has the format:

if<whitespace>(<parameters>)<whitespace>{<contents>}<anything>

I tried using String.split() with the regex pattern ^if\s*\(|\)\s*\{|\}\s* but this fails miserably. The problem is that ) { also occurs in the inner if-block, and the closing } occurs in many places as well. I don't think either lazy or greedy matching works here.

So, any pointers on what I would need in order to implement this with regex?

I also need to get the remaining string without the if-block's code (that is, the code starting from else { ...). Using just String.split() makes this difficult, as it gives no information about the length of the parts that were parsed away.

I initially created a loop-based solution (using String.substring() heavily) for this, but it's dull. I would like to have something fancier instead. Should I go with regex, or create a custom, generic function (there are many other cases than just this one) that takes the String to parse and a pattern (consider the if<whitespace>(... pattern above)?

Edit: Changed returns to variable assignments, as it would not have made sense otherwise.


Solution

A regular expression won't work, because a regular grammar can't match things like "any number of open parentheses followed by the same number of close parentheses". Matching nested delimiters like that requires a context-free grammar.

Unless you use a context-free grammar parser for Java or a regular expression extension that makes regular expressions no longer regular, your loop-based solution is probably the fanciest solution.
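To illustrate why the loop-based approach holds up, here is a minimal sketch that counts nesting depth to find the matching ) and } and returns the parameters, the contents, and the remaining string. The class name IfBlockScanner, the IfBlock holder, and the method names are made up for this example, and the sketch assumes that no parentheses or braces appear inside string literals or comments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits "if (<parameters>) {<contents>}<anything>" by counting depth. */
final class IfBlockScanner {

    /** Illustrative holder for the three parts of the input. */
    static final class IfBlock {
        final String params;
        final String contents;
        final String rest;

        IfBlock(String params, String contents, String rest) {
            this.params = params;
            this.contents = contents;
            this.rest = rest;
        }
    }

    // A regex is still fine for the flat, non-nested prefix "if<whitespace>(".
    private static final Pattern IF_HEAD = Pattern.compile("^\\s*if\\s*\\(");

    static IfBlock parse(String src) {
        Matcher head = IF_HEAD.matcher(src);
        if (!head.find()) {
            throw new IllegalArgumentException("input does not start with an if-block");
        }
        int openParen = head.end() - 1;
        int closeParen = findMatching(src, openParen, '(', ')');

        // Skip the whitespace between ")" and "{".
        int openBrace = closeParen + 1;
        while (openBrace < src.length() && Character.isWhitespace(src.charAt(openBrace))) {
            openBrace++;
        }
        if (openBrace >= src.length() || src.charAt(openBrace) != '{') {
            throw new IllegalArgumentException("expected '{' after the condition");
        }
        int closeBrace = findMatching(src, openBrace, '{', '}');

        return new IfBlock(
                src.substring(openParen + 1, closeParen),
                src.substring(openBrace + 1, closeBrace),
                src.substring(closeBrace + 1));
    }

    /** Returns the index of the delimiter that closes the one at openIdx. */
    static int findMatching(String src, int openIdx, char open, char close) {
        int depth = 0;
        for (int i = openIdx; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == open) {
                depth++;
            } else if (c == close && --depth == 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("unbalanced '" + open + "'");
    }
}
```

Calling IfBlockScanner.parse(script) on a string that starts with an if-block gives the condition in params, the body in contents, and everything from the else onwards in rest, so the remainder can be fed to the next parsing step.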

OTHER TIPS

You'd be far better off using (or writing) a parser than trying to do this with Regex.

Regex is great for some things, but for complex parsing like this, it sucks. Another example that gets asked about a lot here is parsing HTML: you can do it to a limited degree, but for anything complex, a DOM parser is a much better solution.

For a [very] simple parser, what you need is a recursive function that searches for the braces { and }, recursing down a level each time it comes across an opening brace and returning back up a level when it finds a closing brace. It then needs to store the string contents between the two braces at each level.
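The recursive function described above could look roughly like the following sketch. The names BraceTree and Block are invented for illustration, and the code assumes that braces never occur inside string literals or comments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: builds a tree of brace-delimited blocks by recursion. */
final class BraceTree {

    /** One nesting level: the raw text at that level plus the blocks nested in it. */
    static final class Block {
        final int depth;
        final List<Block> children = new ArrayList<>();
        String text = "";

        Block(int depth) {
            this.depth = depth;
        }
    }

    private final String src;
    private int pos = 0;

    private BraceTree(String src) {
        this.src = src;
    }

    static Block parse(String src) {
        BraceTree parser = new BraceTree(src);
        Block root = parser.parseBlock(0);
        if (parser.pos < src.length()) {
            // The top level stopped early, so it hit a "}" with no opener.
            throw new IllegalArgumentException("unmatched '}' at index " + parser.pos);
        }
        return root;
    }

    private Block parseBlock(int depth) {
        Block block = new Block(depth);
        int start = pos;
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c == '{') {
                pos++;                                      // consume "{" and go down a level
                block.children.add(parseBlock(depth + 1));
                if (pos >= src.length()) {
                    throw new IllegalArgumentException("missing '}'");
                }
                pos++;                                      // consume the matching "}"
            } else if (c == '}') {
                break;                                      // go back up a level
            } else {
                pos++;
            }
        }
        block.text = src.substring(start, pos);             // contents between the braces
        return block;
    }
}
```

For the input a{b{c}d}e, the root block holds the whole string, its single child holds b{c}d, and that child's single child holds c.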

As per the above, you'll need a parser. One type that's easy to implement (and fun to write!) is a recursive descent parser with backtracking. There is also a plethora of parser generators out there, though most of those have a learning curve. One Java-friendly parser generator is JavaCC.
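As a rough idea of what a recursive descent parser with backtracking looks like, here is a toy sketch. The grammar is an assumption made up for this example (it is not the asker's actual language): a program is a list of statements, a statement is either an if with an optional else or plain text ending in a semicolon. The class name TinyParser and all method names are hypothetical. Comments and delimiters inside string literals are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy recursive descent parser with backtracking, for an assumed mini-grammar. */
final class TinyParser {
    private final String src;
    private final List<String> out = new ArrayList<>();
    private int pos = 0;

    private TinyParser(String src) { this.src = src; }

    /** program := statement* ; returns one indented line per parsed construct. */
    static List<String> parse(String src) {
        TinyParser p = new TinyParser(src);
        p.statements("");
        p.skipWhitespace();
        if (p.pos != src.length()) {
            throw new IllegalArgumentException("syntax error at index " + p.pos);
        }
        return p.out;
    }

    private void statements(String indent) {
        // Ordered choice: try the if-rule first, then fall back to a simple statement.
        while (ifStatement(indent) || simpleStatement(indent)) { }
    }

    // ifStatement := "if" "(" condition ")" block [ "else" block ]
    private boolean ifStatement(String indent) {
        int savedPos = pos, savedSize = out.size();
        if (keyword("if") && symbol('(')) {
            int start = pos;
            if (skipToClosingParen()) {
                out.add(indent + "if " + src.substring(start, pos - 1).trim());
                if (block(indent + "  ") && optionalElse(indent)) return true;
            }
        }
        return backtrack(savedPos, savedSize);
    }

    private boolean optionalElse(String indent) {
        if (!keyword("else")) return true;   // having no else branch is fine
        out.add(indent + "else");
        return block(indent + "  ");         // "else" must be followed by a block
    }

    // block := "{" statement* "}"
    private boolean block(String indent) {
        int savedPos = pos, savedSize = out.size();
        if (symbol('{')) {
            statements(indent);
            if (symbol('}')) return true;
        }
        return backtrack(savedPos, savedSize);
    }

    // simpleStatement := non-empty text without braces, terminated by ";"
    private boolean simpleStatement(String indent) {
        int savedPos = pos, savedSize = out.size();
        skipWhitespace();
        int start = pos;
        while (pos < src.length() && "{};".indexOf(src.charAt(pos)) < 0) pos++;
        if (pos > start && pos < src.length() && src.charAt(pos) == ';') {
            out.add(indent + src.substring(start, pos).trim());
            pos++;
            return true;
        }
        return backtrack(savedPos, savedSize);
    }

    /** Undo a failed alternative: rewind the input and drop any output it produced. */
    private boolean backtrack(int savedPos, int savedSize) {
        pos = savedPos;
        out.subList(savedSize, out.size()).clear();
        return false;
    }

    /** Called just after "(": advances past the matching ")" by counting depth. */
    private boolean skipToClosingParen() {
        for (int depth = 1; pos < src.length(); ) {
            char c = src.charAt(pos++);
            if (c == '(') depth++;
            else if (c == ')' && --depth == 0) return true;
        }
        return false;
    }

    private boolean keyword(String word) {
        skipWhitespace();
        int end = pos + word.length();
        boolean boundary = end >= src.length() || !Character.isLetterOrDigit(src.charAt(end));
        if (src.startsWith(word, pos) && boundary) { pos = end; return true; }
        return false;
    }

    private boolean symbol(char c) {
        skipWhitespace();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void skipWhitespace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }
}
```

Each grammar rule is one method that either consumes input and returns true, or rewinds to where it started and returns false so the caller can try the next alternative. A parser generator such as JavaCC produces code of a similar shape from a grammar file instead.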

Licensed under: CC-BY-SA with attribution