R-Trees for Indexing Multidimensional Data

An R-tree is a data structure used in database systems to index multidimensional data for spatial queries. Each branch of an R-tree stores the minimum bounding rectangle of all its child branches and data entries. A query traverses the tree by testing the query region against these rectangles, which is an inexpensive intersection test, and descends only into branches whose rectangles intersect it. The top panel shows the data and the minimum bounding rectangles that contain them; the bottom panel shows the tree itself. Click anywhere in the top panel to add new data points to the tree. Hover over a node in the tree to see its label, and click a branch to highlight its corresponding minimum bounding rectangle.
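The intersection-based search described above can be sketched in a few lines. This is a minimal Python illustration, not the Demonstration's own implementation. It makes several assumptions: the data are 2D, a point is stored as a rectangle with zero width and height, and each node caches its minimum bounding rectangle when it is built. The names `Rect`, `make_leaf`, `make_branch` and `search` are hypothetical. A branch whose rectangle misses the query is skipped together with everything beneath it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    """Axis-aligned 2D rectangle; a point is a rectangle with zero width and height."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Rect") -> bool:
        # Overlap unless one rectangle lies entirely to one side of the other.
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


def point(x: float, y: float) -> Rect:
    return Rect(x, y, x, y)


def mbr(rects) -> Rect:
    """Minimum bounding rectangle of a non-empty iterable of rectangles."""
    rects = list(rects)
    return Rect(min(r.xmin for r in rects), min(r.ymin for r in rects),
                max(r.xmax for r in rects), max(r.ymax for r in rects))


@dataclass
class Node:
    rect: Rect      # cached MBR of everything below this node
    leaf: bool
    children: list  # (Rect, label) pairs in a leaf, child Nodes in a branch


def make_leaf(entries) -> Node:
    entries = list(entries)
    return Node(mbr(r for r, _ in entries), True, entries)


def make_branch(children) -> Node:
    children = list(children)
    return Node(mbr(c.rect for c in children), False, children)


def search(node: Node, query: Rect) -> List[str]:
    """Labels of all stored rectangles that intersect `query`."""
    if not node.rect.intersects(query):
        return []  # prune: nothing under this node can intersect the query
    if node.leaf:
        return [label for r, label in node.children if r.intersects(query)]
    hits: List[str] = []
    for child in node.children:
        hits.extend(search(child, query))
    return hits


# A two-level tree: one branch over two leaves of two points each.
left = make_leaf([(point(1, 1), "a"), (point(2, 3), "b")])
right = make_leaf([(point(8, 8), "c"), (point(9, 6), "d")])
root = make_branch([left, right])
print(search(root, Rect(0, 0, 3, 3)))  # ['a', 'b']; the right leaf is pruned
```

Because each node caches its rectangle, deciding whether to skip a branch costs a single rectangle comparison, no matter how many entries lie below it.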


DETAILS

The R-tree was first proposed by Guttman in 1984. R-trees are used in many spatial database systems and geometry libraries, including PostGIS and JTS, to index and query multidimensional data efficiently. They are well suited to nearest-neighbor queries, because a search can skip any subtree whose bounding rectangle lies farther away than the best candidate found so far. This also makes them useful for rough clustering of multidimensional data in machine-learning algorithms.

An R-tree builds a hierarchical decomposition of the data space, grouping nearby data so that the bounding rectangles needed to cover each group stay small. Its insertion heuristics try to minimize the area of these rectangles. Each branch stores the minimum bounding rectangle of all its children, including sub-branches and data elements. Each node holds at most a configurable maximum number of elements, so inserting into a full node requires splitting it into two. Several splitting techniques exist: finding the optimal split requires an exhaustive search that is exponential in the number of entries, so cheaper heuristics are normally used. This Demonstration uses the quadratic-cost split, which gives up the guarantee of an optimal split in exchange for speed.

Splitting a node adds a new entry to its parent, which may overflow and split in turn, so splits can propagate up the tree. If the root itself splits, a new root is created above the two halves. The tree therefore grows in height at the root, and all leaves remain at the same depth.
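The quadratic-cost split named above can be sketched in Python as follows. This is a minimal illustration of Guttman's quadratic split, not the Demonstration's implementation. It makes several assumptions: entries are 2D rectangles, and `min_fill` is a hypothetical parameter giving the minimum number of entries a node must keep, at most half the node's capacity. The function names are likewise illustrative. PickSeeds chooses the two entries that would waste the most area if placed in one group. PickNext then repeatedly takes the entry with the strongest preference for one group and adds it to the group whose rectangle grows least.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def area(r: Rect) -> float:
    return (r.xmax - r.xmin) * (r.ymax - r.ymin)


def union(a: Rect, b: Rect) -> Rect:
    """Minimum bounding rectangle of two rectangles."""
    return Rect(min(a.xmin, b.xmin), min(a.ymin, b.ymin),
                max(a.xmax, b.xmax), max(a.ymax, b.ymax))


def enlargement(bound: Rect, r: Rect) -> float:
    """Extra area `bound` needs in order to also cover `r`."""
    return area(union(bound, r)) - area(bound)


def pick_seeds(entries: List[Rect]) -> Tuple[int, int]:
    """PickSeeds: the pair that would waste the most area if put in one group."""
    best, seeds = float("-inf"), (0, 1)
    for i, j in combinations(range(len(entries)), 2):
        a, b = entries[i], entries[j]
        waste = area(union(a, b)) - area(a) - area(b)
        if waste > best:
            best, seeds = waste, (i, j)
    return seeds


def quadratic_split(entries: List[Rect], min_fill: int) -> Tuple[List[Rect], List[Rect]]:
    """Split the entries of an overflowing node into two groups.

    Assumes len(entries) >= 2 and 1 <= min_fill <= len(entries) // 2.
    """
    i, j = pick_seeds(entries)
    groups = [[entries[i]], [entries[j]]]
    bounds = [entries[i], entries[j]]
    remaining = [e for k, e in enumerate(entries) if k not in (i, j)]
    while remaining:
        # If a group can reach min_fill only by taking every remaining entry,
        # assign them all to it and stop.
        short = [g for g in (0, 1) if len(groups[g]) + len(remaining) <= min_fill]
        if short:
            g = short[0]
            for e in remaining:
                groups[g].append(e)
                bounds[g] = union(bounds[g], e)
            break
        # PickNext: the entry with the strongest preference for one group.
        e = max(remaining,
                key=lambda r: abs(enlargement(bounds[0], r) - enlargement(bounds[1], r)))
        remaining.remove(e)
        # Least enlargement wins; ties go to the smaller area, then fewer entries.
        cost0 = (enlargement(bounds[0], e), area(bounds[0]), len(groups[0]))
        cost1 = (enlargement(bounds[1], e), area(bounds[1]), len(groups[1]))
        g = 0 if cost0 <= cost1 else 1
        groups[g].append(e)
        bounds[g] = union(bounds[g], e)
    return groups[0], groups[1]


# Five unit squares: three near the origin, two near (11, 10).
squares = [Rect(0, 0, 1, 1), Rect(1, 1, 2, 2), Rect(10, 10, 11, 11),
           Rect(11, 10, 12, 11), Rect(0, 1, 1, 2)]
a, b = quadratic_split(squares, min_fill=2)
print(a)  # the three squares near the origin
print(b)  # the two squares near (11, 10)
```

For a node with M entries, PickSeeds examines every pair of entries. Each PickNext step scans all remaining entries. Both phases are therefore quadratic in M, which is where the method gets its name.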