R* Tree overlap computation

https://stackoverflow.com/questions/14346549

15-01-2022

Question

I was reading through this implementation of the R* Tree, and I noticed that they are calculating overlap differently from how the paper defines it.

In the paper, overlap is defined as such:

For a given node/rect k, compute the sum of the areas of the intersections between k and each of its siblings (not including k itself).

Overlap enlargement is then the difference between this value and the overlap of k after an item r has been added to k.

Something like this:

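childOverlapEnlargement(Node child, Item r)
{
    // Bounding box of child after r has been added to it.
    childEnlarged = child.union(r);
    sum = 0;
    for (each sibling s of child, excluding child itself)
    {
        // Growth in the overlap with s caused by adding r to child.
        sum += area(childEnlarged.intersect(s)) - area(child.intersect(s));
    }
    return sum;
}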

In the other implementation, they sort by the intersection area of a given node with the item being inserted. Something like this:

childOverlapEnlargement(Node node, item r)
{
    return area(node.intersect(r));
}

Obviously their implementation is computationally cheaper than the paper's definition. However, I can't find any obvious reason why the two computations should be equivalent.

So my questions are:

1. Do the two computations always end up with the same subtrees being picked? Why?
2. If they do result in different subtrees being picked, are the results better than, or close to as good as, the paper's definition? Or was the choice made in error?

Edit: I re-read their implementation and realized they aren't comparing the intersection of two siblings, but the intersection of each potential leaf with the item being inserted. Strangely enough, they pick the sibling that overlaps the least with the item being inserted. Wouldn't you want to insert into the node that overlaps the most with the item being inserted?

Solution

Maybe the implementation you are looking at has bugs or is incorrect. Nobody is perfect.

Note that the R*-tree tries to minimize overlap enlargement, not overlap itself.

Some overlap will likely be unavoidable. If there already is overlap, you cannot expect it to decrease when inserting additional rectangles. But you can at least try not to increase the amount of overlap.
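To make the difference concrete, here is a minimal Python sketch of overlap enlargement as defined in the question, next to the simpler item-intersection measure, assuming axis-aligned 2D rectangles. The `Rect` type and all function names are hypothetical, chosen for illustration; they are not taken from the paper or from any particular implementation.

```python
from typing import NamedTuple, Sequence


class Rect(NamedTuple):
    """Axis-aligned 2D rectangle (hypothetical helper type)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle covering both a and b."""
    return Rect(min(a.min_x, b.min_x), min(a.min_y, b.min_y),
                max(a.max_x, b.max_x), max(a.max_y, b.max_y))


def intersection_area(a: Rect, b: Rect) -> float:
    """Area shared by a and b, or 0.0 if they do not overlap."""
    w = min(a.max_x, b.max_x) - max(a.min_x, b.min_x)
    h = min(a.max_y, b.max_y) - max(a.min_y, b.min_y)
    return w * h if w > 0 and h > 0 else 0.0


def overlap(k: Rect, siblings: Sequence[Rect]) -> float:
    """Sum of the intersection areas between k and each sibling."""
    return sum(intersection_area(k, s) for s in siblings)


def overlap_enlargement(index: int, nodes: Sequence[Rect], item: Rect) -> float:
    """Growth in overlap of nodes[index] with its siblings if item is added."""
    child = nodes[index]
    siblings = [s for i, s in enumerate(nodes) if i != index]
    return overlap(union(child, item), siblings) - overlap(child, siblings)


def item_intersection(index: int, nodes: Sequence[Rect], item: Rect) -> float:
    """The simpler measure from the question: area of node intersected with item."""
    return intersection_area(nodes[index], item)


nodes = [Rect(0, 0, 2, 2), Rect(3, 0, 5, 2)]
item = Rect(2.5, 0, 3.5, 1)

for i in range(len(nodes)):
    print(i, overlap_enlargement(i, nodes, item), item_intersection(i, nodes, item))
```

For this particular input, the overlap enlargement is 1.0 for the first node and 0.0 for the second, while the item intersection is 0.0 for the first node and 0.5 for the second. Choosing the smallest overlap enlargement selects the second node; choosing the smallest intersection with the item selects the first. So, at least in this example, the two measures do not pick the same subtree.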

As for performance considerations, check whether you actually need to compute the intersection rectangles. Instead of computing area(intersection()), try writing a dedicated function intersectionSize() that returns the size directly. This does make a difference. For example, if A.maxX = 1 and B.minX = 2, I can immediately return an intersection size of 0 without looking at any of the other dimensions.
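A minimal Python sketch of such an early-exit function, assuming each rectangle is given as a pair of per-dimension minimum and maximum coordinate sequences. The name `intersection_size` and its parameter layout are assumptions made for illustration only.

```python
from typing import Sequence


def intersection_size(a_min: Sequence[float], a_max: Sequence[float],
                      b_min: Sequence[float], b_max: Sequence[float]) -> float:
    """Volume of the intersection of two axis-aligned boxes.

    Returns 0.0 as soon as one dimension shows no overlap, without
    building an intersection rectangle or visiting the remaining dimensions.
    """
    size = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a_min, a_max, b_min, b_max):
        extent = min(hi_a, hi_b) - max(lo_a, lo_b)
        if extent <= 0:
            return 0.0  # disjoint (or merely touching) in this dimension
        size *= extent
    return size


# A.maxX = 1 and B.minX = 2: the first dimension already rules out any overlap.
print(intersection_size((0, 0), (1, 1), (2, 0), (3, 1)))
# Two 2x2 squares sharing a 1x1 corner.
print(intersection_size((0, 0), (2, 2), (1, 1), (3, 3)))
```

The first call returns 0.0 after inspecting only the x dimension; the second returns 1.0.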

Avoid eagerly precomputing every intersection you might need. Instead, compute only those you actually need. Profile your code and see whether you can optimize the critical code paths. There is usually some low-hanging fruit there.

Licensed under: CC-BY-SA with attribution
