Pergunta

I would like to understand what would be the optimal method of finding minimum tree coverage of tree nodes. Let me explain.

I have a self-referencing structure that represents a tree, with a limited depth of X.

All tree nodes

Nodes in the tree can be logically "selected". From the application perspective, it means that user would like to have some aggregate information about the selection.

Let's say user picks nodes A, D, I, J, L and M.

Selected tree nodes

What I would like to do is to be able to restructure the users selection in order to pick the minimum set of nodes that cover the entire selection.

For this example:

  • nodes I and J can be covered by their common parent node F, so I pick F and remove I and J
  • node M can be covered by it's parent node H, so I pick H and remove M
  • node D is already covered by node A, so I remove D

After that, nothing can be restructured further - algorithm stops and I should get the following selection.

Minimum coverage

First, I don't know if "minimum coverage" is the good name. Second, I can't find any better phrase, so Google is not my best friend here.

Another thing to note is that the tree itself can be stored:

  • in-memory
  • in transactional database
  • in OLAP database (not exactly self-referencing any more)

I'm stuck :(

EDIT

I have been thinking a lot about this problem, and there might be a solution that I have to analyse.

Following can be introduced:

  • let's call tree itself the "template tree"
  • let's call selection the "selection tree"
  • each node of the template tree has a precalculated redundant attribute - number of children nodes
  • the selection tree can be "dirty" or "clean"
    • after a node is added, if nothing is done yet, it's in the dirty state
    • after the restructuring is done, it's in the clean state
  • each node of the selection tree has redundant attributes - number of children nodes selected and state: selected or ghost
  • selection tree is a variant of in-memory doubly linked tree (child knows its parent, parent knows its children)

After that, adding a node in the selection tree (selection-node) would work like this. Let's call the new parent of the new node in selection tree a selection-parent, and parent of the new node in template tree a template-parent. Let's call the children of the selection tree node selection-children, and children of the template tree node template-children.

  1. Set dirty state of selection tree
  2. Remove all selection-children of selection-node
  3. If unexistent in selection tree, fetch template-parent of selection-node, along with its number of children attribute; add selection-parent as a ghost node, with selected number of children = 1
  4. If existent in selection tree, access selection-parent of selection-node and increment its selected number of children
  5. If selection-parent has selected number of children equal to number of template-children, promote its state from ghost to selected; treat that selection node as a new node and start at 1
  6. Set clean state of selection tree

I'm curious whether this algorithm already exists as an implementation somewhere. It's behavior wouldn't obviously be worse than O(log(n)), right? Of course, if it doesn't lack something in the logic.

Foi útil?

Solução

The effect you want can be achieved with a single pass over the tree, using a depth-first iteration.

When visiting a node, you perform two conditional actions

  • if all children of the node are selected, then you mark the current node as selected.
  • if the current node is selected, then you deselect all children of the node

In your example, these tests let the selection of the nodes I, J and M bubble up to F and H respectively and the second test causes node D to be deselected.

This algorithm does not make any assumptions on how the tree is stored, but only assumes that you can reach each node and that you can process the children of node X before node X itself.


If the process is interactive, the process can be even simpler:

  1. Check if any (direct or indirect) parent of the selected node is already selected. If so, deselect the node again.
  2. If the node is still selected, go over the tree under the node and deselect all (direct and indirect) children of the node.
  3. If the node and all its siblings are selected, then select the parent and deselect the parent's children. Repeat this step until you reach a node that doesn't have all siblings selected.

This is generally more efficient, because on each selection you usually only have to walk a small part of the tree.

Licenciado em: CC-BY-SA com atribuição
scroll top