Question

I'm working on an implementation of the R* tree based off of this paper. There are a few questions I have about the choose split axis algorithm.

The R*-tree uses the followmg method to find good splits. Along each axis, the entries are first sorted by the lower value, then sorted by the upper value of their rectangles.

What does it mean by the lower/upper value of the rectangles?

For each distribution, goodness values are determined. Depending on these goodness values the final distribution of the entries is determined. Three different goodness values and different approaches of using them in different combinations are tested experimentally.

(I) area-value area[bb(first group)] + area[bb(second group)]

(II) margin-value margin[bb(first group)] + margin[bb(second group)]

(III) overlap-value area[bb(first group) + bb(second group)]

Here bb denotes the bounding box of a set of rectangles

What does it mean by margin-value? How would I go about computing this value?

Was it helpful?

Solution

As far as I can tell, the "lower/upper values of a rectangle" are the minimum and maximum values of the rectangle along the axis in question.

According to p323 of the linked article, "Here the margin is the sum of the lengths of the edges of a rectangle".

OTHER TIPS

Rectangles are usually represented by pairs of min+max in each dimension. So the "upper" and "lower" values are the minimum and maximum.

The margin is the perimeter. The reason is that for many situations, squares are the preferrable type of rectangles. For example when you do Euclidean (or Manhattan, pretty much any Lp norm) nearest neighbor search. The reason is that they are to some extend "unbiased".

Other split strategies such as the "linear" split by Ang et Tan neglect this, and tend to produce very long and thing slices. Wikipedia has an example for this:

https://en.wikipedia.org/wiki/File:Zipcodes-Germany-AngTanSplit.svg

These are the kind of splits the R*-tree tries to avoid. Because most queries will intersect a lot of these slices, so you gain very little then.

Note that the R*-tree uses a number of heuristics and tie breakers. Furthermore, it does a two step decision: First it only chooses the axis to use for splitting. When the axis has been determined, it uses actually a different logic to choose a split along this axis.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top