Question

I'm trying to convert an old HTML site to a new CMS. To get the correct menu hierarchy (with varying depth), I want to read all the files with PHP and extract/parse the menu (nested unordered lists) into an associative array.

root.html
<ul id="menu">
  <li class="active">Start</li>
  <ul>
    <li><a href="file1.html">Sub1</a></li>
    <li><a href="file2.html">Sub2</a></li>
  </ul>
</ul>

file1.html
<ul id="menu">
  <li><a href="root.html">Start</a></li>
  <ul>
    <li class="active">Sub1</li>
    <ul>
      <li><a href="file3.html">SubSub1</a></li>
      <li><a href="file4.html">SubSub2</a></li>
      <li><a href="file5.html">SubSub3</a></li>
      <li><a href="file6.html">SubSub4</a></li>
    </ul>
  </ul>
</ul>

file3.html
<ul id="menu">
  <li><a href="root.html">Start</a></li>
  <ul>
    <li><a href="file1.html">Sub1</a></li>
    <ul>
      <li class="active">SubSub1</li>
      <ul>
        <li><a href="file7.html">SubSubSub1</a></li>
        <li><a href="file8.html">SubSubSub2</a></li>
        <li><a href="file9.html">SubSubSub3</a></li>
      </ul>
    </ul>
  </ul>
</ul>

file4.html
<ul id="menu">
  <li><a href="root.html">Start</a></li>
  <ul>
    <li><a href="file1.html">Sub1</a></li>
    <ul>
      <li><a href="file3.html">SubSub1</a></li>
      <li class="active">SubSub2</li>
      <li><a href="file5.html">SubSub3</a></li>
      <li><a href="file6.html">SubSub4</a></li>
    </ul>
  </ul>
</ul>

I would like to loop through all files, extract the list with id="menu" from each one, and create an array like this (or similar) while keeping the hierarchy and file information:

Array 
  [file] => root.html
  [child] => Array 
    [Sub1] => Array 
      [file] => file1.html
      [child] => Array  
        [SubSub1] => Array 
          [file] => file3.html
          [child] => Array 
            [SubSubSub1] => Array 
              [file] => file7.html
            [SubSubSub2] => Array 
              [file] => file8.html                      
            [SubSubSub3] => Array
              [file] => file9.html
        [SubSub2] => Array
          [file] => file4.html
        [SubSub3] => Array 
          [file] => file5.html
        [SubSub4] => Array 
          [file] => file6.html
    [Sub2] => Array
      [file] => file2.html 

With the help of the PHP Simple HTML DOM Parser library, I successfully read the file and extracted the menu:

$html = file_get_html($file);
foreach ($html->find("ul[id=menu]") as $ul) {
  ..
}

To parse only the active section of the menu (leaving out the links that go one or more levels up), I used

$ul->find("ul",-1)

which finds the last ul inside the outer ul. This works great for a single file.

But I'm having trouble looping through all the files/menus while keeping the parent/child information, because each menu has a different depth.

Thanks for all suggestions, tips and help!


Solution

Edit: OK, this was not so easy after all :)

By the way, this library is really an excellent tool. Kudos to the guys who wrote it.

Here is one possible solution:

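// assumes simple_html_dom.php (PHP Simple HTML DOM Parser) is in the include path
require_once 'simple_html_dom.php';

class menu_parse {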

    static $missing = array(); // list of missing files

    static private $files = array(); // list of source files to process

    // initiate menu parsing
    static function start ($file)
    {
        // start with root file
        self::$files[$file] = 1;

        // parse all source files
        for ($res=array(); current(self::$files); next(self::$files))
        {
            // get next file name
            $file = key(self::$files);

            // parse the file
            if (!file_exists ($file))
            {
                self::$missing[$file] = 1;
                continue;
            }
            $html = file_get_html ($file);

            // get menu root (if any)
            $root = $html->find("ul[id=menu]",0);
            if ($root) self::menu ($root, $res);
        }

        // reorder missing files array
        self::$missing = array_keys (self::$missing);

        // that's all folks
        return $res;
    }

    // parse a menu at a given level
    static private function menu ($menu, &$res)
    {
        foreach ($menu->children as $elem)
        {
            switch ($elem->tag)
            {
            case "li" : // name and possibly source file of a menu

                // grab menu name
                $name = $elem->plaintext;

                // see if we can find a link to the menu file
                $link = $elem->children(0);
                if ($link && $link->tag == 'a')
                {
                    // found the link
                    $file = $link->href;
                    $res[$name]->file = $file;

                    // add the source file to the processing list
                    self::$files[$file] = 1;
                }
                break;

            case "ul" : // go down one level to grab items of the current menu
                self::menu ($elem, $res[$name]->childs);
            }   
        }
    }
}

Usage:

// The result will be an array of menus indexed by item names.
//
// Each menu will be an object with 2 members
// - file   -> source file of the menu
// - childs -> array of menu subitems
//
$res = menu_parse::start ("root.html");

// menu_parse::$missing will contain all the missing file names

echo "Result : <pre>";
print_r ($res);
echo "</pre><br>missing files:<pre>";
print_r (menu_parse::$missing);
echo "</pre>";

Output of your test case:

Array
(
  [Start] => stdClass Object
    (
      [childs] => Array
        (
          [Sub1] => stdClass Object
            (
              [file] => file1.html
              [childs] => Array
                (
                  [SubSub1] => stdClass Object
                    (
                      [file] => file3.html
                      [childs] => Array
                        (
                          [SubSubSub1] => stdClass Object
                            (
                              [file] => file7.html
                            )
                          [SubSubSub2] => stdClass Object
                            (
                              [file] => file8.html
                            )
                          [SubSubSub3] => stdClass Object
                            (
                              [file] => file9.html
                            )
                        )
                    )
                  [SubSub2] => stdClass Object
                    (
                      [file] => file4.html
                    )
                  [SubSub3] => stdClass Object
                    (
                      [file] => file5.html
                    )
                  [SubSub4] => stdClass Object
                    (
                      [file] => file6.html
                    )
                )
            )
          [Sub2] => stdClass Object
            (
              [file] => file2.html
            )
        )
      [file] => root.html
    )
)

missing files: Array
(
    [0] => file2.html
    [1] => file5.html
    [2] => file6.html
    [3] => file7.html
    [4] => file8.html
    [5] => file9.html
)
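
The result above is built from stdClass objects, whereas the question asked for nested arrays with file and child keys. Converting one into the other is a short recursive walk. The following is only a sketch: menu_to_array is a made-up helper name, not part of the class above, and the sample data is built by hand so the snippet runs without the parser library.

```php
<?php
// Hypothetical helper (not part of menu_parse): convert the stdClass-based
// result into nested arrays using the 'file' and 'child' keys from the question.
function menu_to_array(array $menus): array
{
    $out = array();
    foreach ($menus as $name => $item) {
        $entry = array();
        if (isset($item->file)) {
            $entry['file'] = $item->file;
        }
        if (!empty($item->childs)) {
            // recurse into the submenu
            $entry['child'] = menu_to_array($item->childs);
        }
        $out[$name] = $entry;
    }
    return $out;
}

// Hand-built sample in the same shape as the parser's result.
$sub1 = new stdClass;
$sub1->file = 'file1.html';
$sub2 = new stdClass;
$sub2->file = 'file2.html';
$start = new stdClass;
$start->childs = array('Sub1' => $sub1, 'Sub2' => $sub2);
$start->file = 'root.html';

$converted = menu_to_array(array('Start' => $start));
print_r($converted);
```

With the real parser you would pass the return value of menu_parse::start() instead of the hand-built sample.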

Notes:

  • The code assumes all item names are unique inside a given menu.

You could modify the code to have the (sub)menus as an array with numeric indexes and names as properties (so that two items with the same name would not overwrite each other), but that would complicate the structure of the result.

Should such name duplication occur, the best solution would be to rename one of the items, IMHO.

  • The code also assumes there is only one root menu.

It could be modified to handle more than one, but that does not make much sense IMHO (it would mean a root menu ID duplication, which would likely cause trouble to the JavaScript trying to process it in the first place).
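
For the first note, the numerically indexed variant could look roughly like this. It is only a sketch of the data layout, not a change to the parser: add_menu_item and the file names are made up for illustration.

```php
<?php
// Hypothetical helper: append an item to a menu level stored as a plain list,
// so two items sharing a name do not overwrite each other.
function add_menu_item(array &$level, string $name, ?string $file = null): int
{
    $item = new stdClass;
    $item->name = $name;       // the name is now a property, not the array key
    $item->file = $file;
    $item->childs = array();
    $level[] = $item;
    return count($level) - 1;  // numeric index of the new item
}

$menu = array();
$first  = add_menu_item($menu, 'News', 'news1.html');
$second = add_menu_item($menu, 'News', 'news2.html'); // same name, kept separately
add_menu_item($menu[$first]->childs, 'Archive', 'archive.html');

print_r($menu);
```

The price is the one mentioned in the note: items can no longer be looked up directly by name, so finding one means scanning the list.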

OTHER TIPS

This is more like a directory tree with upward links. file1 on level 1 points to file3 on level 2, which points back to file1 on level 1, and that is what causes the "different depth". Consider setting up a dedicated menu object that points both upwards and downwards, and keeping lists of those objects instead of arrays of arrays of strings. A starting point for such a hierarchy in PHP could be a class like this:

class menuItem {

    protected $leftSibling = null;
    protected $rightSibling = null;

    protected $parents = array();
    protected $childs = array();

    protected $properties = array();

    // set property like menu name or file name
    function setProp($name, $val) {
        $this->properties[$name] = $val;
    }

    // get a property if set, false otherwise
    function getProp($name) {
        if ( isset($this->properties[$name]) )
            return $this->properties[$name];
        return false;
    }

    function getLeftSiblingsAsArray() {
        $sibling = $this->getLeftSibling();
        $siblings = array();
        while ( $sibling != null ) {
            $siblings[] = $sibling;
            $sibling = $sibling->getLeftSibling();
        }
        return $siblings;
    }

    function addChild($item) {
        $this->childs[] = $item;
    }

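    function addLeftSibling($item) {
        // no left sibling yet: attach the item directly
        if ( $this->leftSibling == null ) {
            $this->leftSibling = $item;
            return;
        }
        // otherwise walk to the leftmost sibling and attach it there
        $sibling = $this->leftSibling;
        while ( $sibling->hasLeft() )
            $sibling = $sibling->getLeftSibling();
        $sibling->addFinalLeft($item);
    }

    function addFinalLeft($item) {
        $this->leftSibling = $item;
    }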

    ....
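
Since the class above is cut off, here is a separate, self-contained sketch of the same idea reduced to its core: every node knows its parent and its children, so you can walk in either direction. MenuNode and path() are hypothetical names, and the sketch uses a single parent link rather than the $parents array above.

```php
<?php
// Hypothetical reduced node type: one upward link plus a list of children.
class MenuNode
{
    public $name;
    public $file;
    public $parent = null;
    public $children = array();

    function __construct($name, $file = null)
    {
        $this->name = $name;
        $this->file = $file;
    }

    // attach a child and set its upward link
    function addChild(MenuNode $child)
    {
        $child->parent = $this;
        $this->children[] = $child;
        return $child;
    }

    // names from the root down to this node
    function path()
    {
        $names = array();
        for ($node = $this; $node !== null; $node = $node->parent) {
            array_unshift($names, $node->name);
        }
        return $names;
    }
}

// Part of the menu from the question, built by hand.
$root   = new MenuNode('Start', 'root.html');
$sub1   = $root->addChild(new MenuNode('Sub1', 'file1.html'));
$subsub = $sub1->addChild(new MenuNode('SubSub1', 'file3.html'));

echo implode(' > ', $subsub->path()), "\n"; // prints "Start > Sub1 > SubSub1"
```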
Licensed under: CC-BY-SA with attribution