Question

Because I have a really redundant config-file format, I invented a custom script format for writing loops, for example:

[Config Object]
{Loop 3
    Setting[i]  = Value[i]
}
OtherSetting=X

Which will result in:

[Config Object]
Setting1     = Value1
Setting2     = Value2
Setting3     = Value3
OtherSetting = X

My first idea was to use regular expressions, like this one:

!{(.*?)}!is

That worked really well until I tried to use it with nested loops. You surely know those "oh cr..." moments.

Because the following:

1: [Config Object]
2: *{*Loop 3
3:    Section[i]
4:    {Loop 3
5:        Setting[i]    = Value[i]
6:     *}*
7: }
8: OtherSetting=X

Will lead the regex to match the range from line 2 to line 6 (marked with *s).

And actually I really have no idea how to solve this, because the regex is logically doing the right thing.

The lazy quantifier ? is needed because without it I would have the same problem in the other direction and would not be able to write two consecutive loops.
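
Both failure modes are easy to reproduce. The snippet below is only an illustration: $nested is the input from above without the line numbers and markers, while $siblings is a made-up input with two loops in a row.

```php
<?php
// Nested input from the question, without the line numbers and * markers.
$nested = <<<'CFG'
[Config Object]
{Loop 3
   Section[i]
   {Loop 3
       Setting[i]    = Value[i]
   }
}
OtherSetting=X
CFG;

// Two loops in a row (hypothetical input, not from the question).
$siblings = "{Loop 2\nA[i] = 1\n}\n{Loop 2\nB[i] = 2\n}";

// Lazy: the match ends at the FIRST "}", which closes the inner loop,
// so the matched text has more opening than closing braces.
preg_match('!{(.*?)}!is', $nested, $lazy);
printf("lazy match: %d opening, %d closing\n",
    substr_count($lazy[0], '{'), substr_count($lazy[0], '}'));

// Greedy: the match runs to the LAST "}", so two separate loops
// are swallowed as if they were one.
preg_match('!{(.*)}!is', $siblings, $greedy);
var_dump($greedy[0] === $siblings);
```

Neither quantifier can count braces, which is why the pattern cannot pair each opening brace with its own closing one.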

A little research made it clear to me that regex is not the right tool here, but I couldn't find any PHP solutions. So how can I efficiently parse my "loop" script in PHP, for example getting an array of the loops and replacing the commands within the braces with the calculated results?

Solution

The proper solution is mentioned in the comments. You need to actually write a compiler/parser. My memory is a little fuzzy from my compilers course, but here is how you would approach it.

The basic concept is to convert the input to tokens (this is where regular expressions are okay). This is called lexical analysis.

So:

[Config Object]
{Loop 3
   Section[i]
   {Loop 3
       Setting[i]    = Value[i]
   }
}
OtherSetting=X

becomes (pseudocode tokens, and maybe not exactly what you need):

OPEN_BRACKET STRING(=Config Object) CLOSE_BRACKET
START_LOOP NUMBER(=3)
   STRING(=Section) OPEN_BRACKET STRING(=i) CLOSE_BRACKET
   START_LOOP NUMBER(=3)
       STRING(=Setting) OPEN_BRACKET STRING(=i) CLOSE_BRACKET EQUAL STRING(=Value) OPEN_BRACKET STRING(=i) CLOSE_BRACKET
   END_LOOP
END_LOOP
STRING(=OtherSetting) EQUAL STRING(=X)

So if your lexer gets you an array of tokens like the above, you just need to parse it against an actual grammar (this is where you don't want to use regular expressions).
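
A lexer along those lines could be sketched in PHP as follows. This is a minimal sketch, not a finished implementation: the function name tokenize() and the exact patterns are assumptions, chosen so that the token names match the list above.

```php
<?php
// Minimal lexer sketch. tokenize() and its rule table are illustrative
// assumptions; the token names follow the pseudocode token list.
function tokenize(string $src): array
{
    // Order matters: the first rule that matches at the current offset wins.
    $rules = [
        'START_LOOP'    => '\{[ \t]*Loop\b',
        'END_LOOP'      => '\}',
        'OPEN_BRACKET'  => '\[',
        'CLOSE_BRACKET' => '\]',
        'EQUAL'         => '=',
        'NUMBER'        => '\d+\b',
        // Words separated by spaces or tabs, e.g. "Config Object".
        'STRING'        => '[^\[\]{}=\s]+(?:[ \t]+[^\[\]{}=\s]+)*',
    ];
    $tokens = [];
    $pos = 0;
    $len = strlen($src);
    while ($pos < $len) {
        // Skip whitespace between tokens (the A flag anchors at $pos).
        if (preg_match('/\s+/A', $src, $m, 0, $pos)) {
            $pos += strlen($m[0]);
            continue;
        }
        foreach ($rules as $type => $re) {
            if (preg_match('/' . $re . '/A', $src, $m, 0, $pos)) {
                $tokens[] = [$type, $m[0]];
                $pos += strlen($m[0]);
                continue 2; // back to the outer while loop
            }
        }
        throw new RuntimeException("Unexpected character at offset $pos");
    }
    return $tokens;
}

// Usage: print the token stream in the notation used above.
foreach (tokenize("{Loop 3\n  Setting[i] = Value[i]\n}") as [$type, $text]) {
    $showText = ($type === 'STRING' || $type === 'NUMBER');
    echo $type, $showText ? "(=$text)" : '', ' ';
}
echo "\n";
```

Each regular expression here only ever matches a single token at a fixed offset, so nesting never becomes the lexer's problem.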

Your grammar (for the loops) is something along these lines (pseudo code syntax kind of like Bison, and I'm probably forgetting parts/leaving things out on purpose):

INDEXED_NAME: STRING OPEN_BRACKET STRING CLOSE_BRACKET;
INDEXED_CONFIG_LINE: INDEXED_NAME EQUAL INDEXED_NAME;
LOOP: START_LOOP NUMBER LOOP_BODY END_LOOP;
LOOP_BODY: LOOP_ITEM | LOOP_BODY LOOP_ITEM;
LOOP_ITEM: INDEXED_CONFIG_LINE | INDEXED_NAME | LOOP;

So instead of a regular expression, you need a parser that can use that grammar to build a syntax tree. You would basically be building a state machine with a stack, where you transition on the next token to some state (inside a loop body, for example) and go one level deeper or back up whenever a loop opens or closes.
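
To make that concrete, here is a deliberately simplified recursive-descent sketch in PHP. It is not a full token-based parser: it treats each input line as one token, and the PHP call stack plays the role of the parser's stack. The names parseBlock(), expand() and expandConfig() are made up for this example, and it assumes that [i] refers to the counter of the innermost enclosing loop, counting from 1.

```php
<?php
// Build a tree: plain lines become strings, loops become
// ['count' => int, 'body' => array of nodes].
function parseBlock(array $lines, int &$pos, bool $insideLoop = false): array
{
    $nodes = [];
    while ($pos < count($lines)) {
        $line = trim($lines[$pos]);
        $pos++;
        if (preg_match('/^\{\s*Loop\s+(\d+)$/i', $line, $m)) {
            // Recursion handles nesting: the inner call returns at its own "}".
            $nodes[] = ['count' => (int) $m[1], 'body' => parseBlock($lines, $pos, true)];
        } elseif ($line === '}') {
            if (!$insideLoop) {
                throw new RuntimeException("Unexpected '}' on line $pos");
            }
            return $nodes;
        } elseif ($line !== '') {
            $nodes[] = $line;
        }
    }
    if ($insideLoop) {
        throw new RuntimeException("Missing '}' at end of input");
    }
    return $nodes;
}

// Walk the tree and emit output lines. Assumption: [i] is replaced by the
// counter of the innermost enclosing loop, starting at 1.
function expand(array $nodes, ?int $i = null): array
{
    $out = [];
    foreach ($nodes as $node) {
        if (is_array($node)) {
            for ($n = 1; $n <= $node['count']; $n++) {
                $out = array_merge($out, expand($node['body'], $n));
            }
        } else {
            $out[] = $i === null ? $node : str_replace('[i]', (string) $i, $node);
        }
    }
    return $out;
}

function expandConfig(string $src): string
{
    $lines = preg_split('/\R/', $src);
    $pos = 0;
    return implode("\n", expand(parseBlock($lines, $pos)));
}

// Usage with a nested loop.
echo expandConfig("[Config Object]\n{Loop 2\n  Section[i]\n  {Loop 2\n    Setting[i] = Value[i]\n  }\n}\nOtherSetting=X"), "\n";
```

Because every "{Loop" starts a new recursive call and every "}" ends one, braces are always paired correctly, however deep the nesting goes. Unbalanced input raises an exception instead of producing a wrong match.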

Honestly, YAML would probably meet your needs instead of re-inventing the wheel or resorting to regex gymnastics. But if you really need to have the loop syntax you are proposing, you could take a look at the Symfony Yaml component to see how they do the parsing. https://github.com/symfony/Yaml

Or you can take a look at Twig for another parser that does have loops: https://github.com/fabpot/Twig/tree/master/lib/Twig

OTHER TIPS

I find that when I have a whole bunch of variables that are related (like it seems you do), arrays are the way to go. Then you can skip the recursion and the parsing. Ex:

$cars = array("A", "B", "C");
echo $cars[0]; // echoes "A"

Don't knock me for suggesting it, but couldn't you use an array in your config file? It'd be way easier to parse, too.
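
As a rough sketch of what that could look like, assuming the config may be written as plain PHP (the keys below simply mirror the example from the question):

```php
<?php
// The config is an ordinary PHP array, so repetition is an ordinary loop.
$config = ['Config Object' => ['OtherSetting' => 'X']];

for ($i = 1; $i <= 3; $i++) {
    $config['Config Object']["Setting$i"] = "Value$i";
}

print_r($config);
```

No custom parser is needed in this case, at the cost of the config file being executable code rather than plain data.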

Licensed under: CC-BY-SA with attribution