Question

I'm wondering about formal languages. I have a kind of parser : It reads à xml-like serialized tree structure and turn it into a multidimmensionnal array.

My point is on the similarities between the algorithm being used and the differents kinds of automatons ( state machines turing machines stack ... ).

So the question is : which is the automaton I implictly use here, and to which formal languages family does it fit ? And what's about recursion ?

What i mean by " automaton i use implicitly " is "which is the minimal automaton to do the same job".

Here is the complete source :

$words; // an array of XML tag '<tag>', '</tag>' and simple text content

$tree = array( 'type' => 'root', 'sub' => array() );

$pTree = array(&$tree);

$deep = 0;

foreach ( $words as $elem ) {

if ( preg_match($openTag, $elem) ) { // $elem is an open tag

    $pTree[$deep++]['sub'][] = array( // we add an element to the multidim array
        'type' => 'block',
        'content' => $elem,
        'sub' => array()
    );

    $size = sizeof($pTree[$deep - 1]['sub']);
    $pTree[$deep] = &$pTree[$deep - 1]['sub'][$size - 1]; // down one level in the tree

} elseif ( preg_match($closeTag, $elem) ) { // it is a close tag

    $deep--; // up in the tree 

} else { // simple element

    $pTree[$deep]['sub'][] = array(
        'type' => 'simple',
        'content' => $elem
    );

}

}

Was it helpful?

Solution

Please take a look at your question again. You're referring to a $words variable, which is not in your example. Also, there is no code, without knowing what is being done it's hard to answer you.

Judging from the name of the variable $deep, it is probably not the state. The state in an automaton is an element of a set that is specific to the automaton; $deep looks like it could contain a depth, any positive integer. Again, hard to tell without the code.

Anyway, you are probably not "implicitly using" any automaton at all, if you didn't design your code as an implementation of one.

Your simple xml-like files could probably be recognized by a deterministic stack machine, or generated by a deterministic context-free grammar, making them Type-2 in the Chomsky hierarchy. Once again this is just a guess, "a xml-like serialized tree structure" is too vague for any kind of formalism.

In short, if you are looking to use any formal theory, do word your questions more formally.


Edit (after seeing the code):

You're building a tree. That's out of reach for an automaton (at least the “standard” ones). Finite automatons only work with an input and a state, stack machines add a stack to that, and Turing machines have a read-write tape they can move in both directions.

The “output” of an automaton is a simple “Yes” (accepted) or “No” (not accepted, or an infinite loop). (Turing machines can be defined to provide more output on their tape.) The best I can answer to “which is the minimal automaton to do the same job” is that your language can be accepted by a stack machine; but it would work very differently and not give you trees.

However, you might look into grammars – another formal language construct that introduces the concept of parse trees. What you are doing here is creating such a parse tree with a top-down parser.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top