What is the difference between a KD-tree and a R-tree?

https://stackoverflow.com/questions/4326332

29-09-2019
|

Question

I looked at the definition of KD-tree and R-tree. It seems to me that they are almost the same.

What's the difference between a KD-tree and an R-tree?

Solution

R-trees and kd-trees are based on similar ideas (space partitioning based on axis-aligned regions), but the key differences are:

Nodes in kd-trees represent separating planes, whereas nodes in R-trees represent bounding boxes.
kd-trees partition the whole of space into regions whereas R-trees only partition the subset of space containing the points of interest.
kd-trees represent a disjoint partition (points belong to only one region) whereas the regions in an R-tree may overlap.

(There are lots of similar kinds of tree structures for partitioning space: quadtrees, BSP-trees, R*-trees, etc. etc.)

OTHER TIPS

They are actually quite different. They serve similar purpose (region queries on spatial data), and they both are trees, but that is about all they have in common.

R-Trees are balanced, kd-trees are not (unless bulk-loaded). This is why R-trees are preferred for changing data, as kd-trees may need to be rebuilt to re-optimize.
R-Trees are disk-oriented. They actually organize the data in areas that directly map to the on-disk representation. This makes them more useful in real databases and for out-of-memory operation. kd-trees are memory oriented and are non-trivial to put into disk pages
R-Trees do not cover the whole data space. Empty areas may be uncovered. kd-trees always cover the whole space.
kd-trees binary split the data space, r-trees partition the data into rectangles. The binary splits are obviously disjoint; while the rectangles of an r-tree may overlap (which actually is sometimes good, although one tries to minimize overlap)
kd-trees are a lot easier to implement in memory, which actually is their key benefit
R-trees can store rectangles and polygons, kd-trees only stores point vectors (as overlap is needed for polygons)
R-trees come with various optimization strategies, different splits, bulk-loaders, insertion and reinsertion strategies etc.
Kd-trees support squared Euclidean distance and Minkowski norms, while Rtrees have been shown to also support geodetic distance (for finding near points on geodata).

A major difference between the two not mentioned in this answer is that KD-trees are only efficient in bulk-loading situations. Once built, modifying or rebalancing a KD-tree is non-trivial. R-trees do not suffer from this.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow