Question

rephrase...

I'd like to know how to best to parse functions/conditionals. so if you have something like: [if {a} is {12 or 34}][if {b} not {55}] show +c+ [/if][/if] which is a conditional inside a conditional. Looks like I can't do this with regex only.


original question

for now I have a pretty simple way of parsing out some commands through actionscript.

I'm using regexp to find tags, commands and operands using...

+key_word+  // any text surrounded by +
[ifempty +val_1+]+val_2+[/ifempty]  //simple conditional
[ifisnot={`true,yes`} +ShowTitle+]+val_3+[/ifisnot]  // conditional with operands

my current algorithm matches the opening tag[**] with the first closing tag [/**] even though it doesn't match. Which means that I could not do something like [ifempty +val_2+][ifnotempty +val_2]+val_3+[/ifnotempty]+val_4+[/ifempty] - essentially putting one conditional inside another one.

I'm using an inline way of parsing that splits the string into an array of strings based on this regexp \[[^\/](?:[^\]])*\](?:[^\]])*\[\/(?:[^\]])*\]

can anyone suggest a more robust algorithm with a more robust parsing convention/standard? especially for as3.

Was it helpful?

Solution

Regular expressions define Regular Languages. Regular Languages cannot have regions of constrained, but potentially infinite, recursion.

One way of thinking about it is that all Regular Languages can be represented by a Finite State Machine. You would need a state for every possible number of if's, but the machine must be 'finite', so your in a bind. A classic example is:

a{n}b{n}, n >= 0
(meaning n a's, followed by n b's)

As you parse each a, you would need to go to another state (FSMs have no memory beyond the state their in, that's the only way they could remember n to match it later). To parse any number of n's, you would need an infinite number of states.

This is the same situation you're in, a regular expression could express a finite number of ifs (although it would take quite a bit of copy-pasting), but not an infinite number. Note however that some regular expression implementations cheat a bit, giving them more power than their mathematical equivalents.

In any case, your best bet is to use a more powerful parsing method. A recursive descent parser is particularly fun to implement, and could easily do what you need. You could also look into an LR-parser, or build a simple parser using a stack. Depending on your language, you might be able to find a parsing library such as pyparse for Python or Boost Spirit for C++.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top