Abstract

This paper describes PBSM (Partition Based Spatial-Merge), a new algorithm for performing the spatial join operation. This algorithm is especially effective when neither of the inputs to the join has an index on the joining attribute. Such a situation could arise if both inputs to the join are intermediate results in a complex query, or in a parallel environment where the inputs must be dynamically redistributed. The PBSM algorithm partitions the inputs into manageable chunks, and joins them using a computational-geometry-based plane-sweeping technique. This paper also presents a performance study comparing the traditional indexed nested loops join algorithm, a spatial join algorithm based on joining spatial indices, and the PBSM algorithm. These comparisons are based on complete implementations of these algorithms in Paradise, a database system for handling GIS applications. Using real data sets, the performance study examines the behavior of these spatial join algorithms in a vari...
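
As a rough illustration of the plane-sweeping idea mentioned in the abstract, the following Python sketch joins two in-memory lists of rectangles. It is not the authors' implementation: the function name `plane_sweep_join`, the tuple layout `(xmin, ymin, xmax, ymax)`, and the choice to treat touching rectangles as intersecting are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical layout for this sketch: (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def plane_sweep_join(r: List[Rect], s: List[Rect]) -> List[Tuple[int, int]]:
    """Return sorted index pairs (i, j) such that r[i] and s[j] intersect."""
    # One event per rectangle, ordered by its left edge (xmin).
    # side 0 marks a rectangle from r, side 1 a rectangle from s.
    events = sorted(
        [(rect[0], 0, i) for i, rect in enumerate(r)]
        + [(rect[0], 1, j) for j, rect in enumerate(s)]
    )
    active_r: List[int] = []  # r rectangles the sweep line may still cross
    active_s: List[int] = []  # s rectangles the sweep line may still cross
    result: List[Tuple[int, int]] = []

    for x, side, idx in events:
        if side == 0:
            rect = r[idx]
            # Discard s rectangles that end before the sweep line.
            active_s = [j for j in active_s if s[j][2] >= x]
            for j in active_s:
                # The x-ranges overlap by construction; test the y-ranges.
                if rect[1] <= s[j][3] and s[j][1] <= rect[3]:
                    result.append((idx, j))
            active_r.append(idx)
        else:
            rect = s[idx]
            # Discard r rectangles that end before the sweep line.
            active_r = [i for i in active_r if r[i][2] >= x]
            for i in active_r:
                if rect[1] <= r[i][3] and r[i][1] <= rect[3]:
                    result.append((i, idx))
            active_s.append(idx)
    return sorted(result)
```

Each intersecting pair is reported once, at the moment the later-starting rectangle of the pair is reached by the sweep line. The sketch assumes both inputs fit in memory, which is the situation the partitioning phase is meant to establish.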

Citations

...e PBSM algorithm. The performance study is based on actual implementations of the three algorithms in Paradise [DKL+94], which is an experimental GIS database system. Using real data from the TIGER =-=[Tig]-=- and the Sequoia [SFGM93] data sets, the study examines the behavior of the algorithms in a variety of situations, including the cases when none, one, or both of the inputs to the join have a suitable in...

..., and the seeded trees are joined using the tree join algorithm of [BKS93]. The problem of finding pairwise intersections between two sets of rectangles has been extensively studied in the VLSI domain =-=[MC80]-=-, and numerous solutions exist for the case when both input sets of rectangles fit in memory [PS88]. In [GS87], Guting and Schilling examine the rectangle intersection problem when the inputs are to...

...results of a spatial join. The algorithm for building the spatial join index requires grid files for indexing the spatial data, and uses these grid files to compute the spatial join index. Grid files =-=[NHS84]-=- and kd-trees [Ben75, Ben79] have also been employed for evaluating multi-attribute joins in the relational domain [KHT89, HNKT90, BHF93]. These methods can also be used for evaluating the filter st...

...lectivities generalization trees are more efficient. The proposed join algorithm using the generalization trees is similar to the join algorithm on R-trees proposed by Brinkhoff, Kriegel and Seeger =-=[BKS93]-=-. This algorithm can be used only if an R-tree index exists on both the join inputs, and can be described as a synchronized depth-first search of both indices, with the two depth-first searches bei...

...id increases the efficiency of the filtering technique, but it also increases the space requirement since a larger number of z-values are required to approximate an object. In the relational domain, =-=[Val87]-=- proposed the use of join indices to improve the performance of the relational join operator. Drawing an analogy from this, Rotem [Rot91] proposed a spatial join index that partially precomputes the r...

... large inputs, can also be used for declustering spatial data. We are currently examining these issues in the broader context of extending Paradise [DKL+94] to run on shared-nothing architectures =-=[Sto86]-=-. Parallel spatial databases are emerging as an attractive solution for storing and manipulating large volumes of spatial data [DLPY93], and some techniques for declustering spatial data have recently...

...--values, are then used in a spatial join algorithm that merges two sequences of z-values. The z-values, being 1-dimensional values, can be stored in traditional indexing structures like a B-tree =-=[OM84]-=-. The performance of the spatial join algorithm using z-values was found to be sensitive to the choice of the grid [Ore89]. Choosing a fine grid increases the efficiency of the filtering technique, b...
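
To make the notion of a 1-dimensional z-value concrete, here is a minimal Python sketch that maps one grid cell to a single integer by interleaving the bits of its coordinates. The function name `z_value`, the bit order (x in the even bit positions), and the fixed 16-bit resolution are assumptions for this example; the cited work approximates each object with several such values, which this sketch does not attempt.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of grid coordinates x and y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # bit i of x goes to even position 2i
        z |= ((y >> i) & 1) << (2 * i + 1)  # bit i of y goes to odd position 2i+1
    return z


# Cells that are close on the grid tend to receive nearby integers,
# so the values can be kept in any ordered 1-dimensional index.
cells = [(1, 1), (0, 0), (1, 0), (0, 1)]
ordered = sorted(cells, key=lambda c: z_value(*c))
```

A finer grid means more bits per coordinate and, for an object covering many cells, more values to store, which matches the trade-off described above.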

...gure 14: Comparison of the Join Algorithms with indices, TIGER Data (Join Road with Hydrography). Figure 15: Comparison of the Join Algorithms with indices, TIGER Data (Join Road with Rail). the join =-=[BKSS94]-=- (by an order of magnitude in many cases). These techniques rely on using as a filter in the refinement step, extra information that is precomputed and stored along with each spatial feature. As an ex...

...performance study is based on actual implementations of the three algorithms in Paradise [DKL+94], which is an experimental GIS database system. Using real data from the TIGER [Tig] and the Sequoia =-=[SFGM93]-=- data sets, the study examines the behavior of the algorithms in a variety of situations, including the cases when none, one, or both of the inputs to the join have a suitable index. The study also inves...

...Partitioning Function using Tiles. The spatial partitioning function just described is the spatial analog of virtual processor round robin partitioning for handling skews in parallel relational joins =-=[DNSS92]-=-. A similar partitioning function has been independently proposed for redundancy-based declustering of spatial objects in a parallel spatial database [TY95], but in that proposal the number of tiles a...

...uad tree, and compare the efficiency of variants of the PMR quad tree with variants of the R-tree [HS95]. When one of the inputs to the spatial join does not have a spatial index, Lo and Ravishankar =-=[LR94]-=- propose building a seeded tree index on that input. A seeded tree is an R-tree that is allowed to be height unbalanced. The algorithm for constructing the seeded tree uses the existing index on one o...

...l models, Gunther compares join algorithms that use generalization trees (a class of tree structures that includes the R-tree, R*-tree, and R+-tree) with the nested loops join and join indices =-=[Gun93]-=-. This study concludes that for low join selectivities, join indices usually provide the best join performance, but for higher join selectivities generalization trees are more efficient. The proposed ...

...ional values, can be stored in traditional indexing structures like a B-tree [OM84]. The performance of the spatial join algorithm using z-values was found to be sensitive to the choice of the grid =-=[Ore89]-=-. Choosing a fine grid increases the efficiency of the filtering technique, but it also increases the space requirement since a larger number of z-values are required to approximate an object. In the...

...e required to approximate an object. In the relational domain, [Val87] proposed the use of join indices to improve the performance of the relational join operator. Drawing an analogy from this, Rotem =-=[Rot91]-=- proposed a spatial join index that partially precomputes the results of a spatial join. The algorithm for building the spatial join index requires grid files for indexing the spatial data, and uses t...

... be used for evaluating the filter step by storing the bounding box of the spatial objects as points in a higher dimension [BHF93]. Recently, spatial index structures like R-trees [Gut84], R+-trees =-=[CFR87]-=-, R*-trees [BKSS90], and PMR quad trees [NS86] have been used to speed up the evaluation of the spatial join. Using analytical models, Gunther compares join algorithms that use generalization trees (...

...ng the bounding box of the spatial objects as points in a higher dimension [BHF93]. Recently, spatial index structures like R-trees [Gut84], R+-trees [CFR87], R*-trees [BKSS90], and PMR quad trees =-=[NS86]-=- have been used to speed up the evaluation of the spatial join. Using analytical models, Gunther compares join algorithms that use generalization trees (which is a class of tree structures that includ...

...object representing a swiss-cheese-polygon might require thousands of points to represent the exact geometric shape), spatial operations, including the spatial join, typically operate in two steps =-=[Ore90]-=-: • Filter Step: In this step, an approximation of each spatial object, such as its minimum bounding rectangle, is used to eliminate tuples that cannot be part of the result. This step produces cand...
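
The filter step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: objects are given as lists of `(x, y)` vertices, the helper names `mbr`, `mbrs_intersect`, and `filter_step` are hypothetical, and a naive nested loop stands in for whatever join algorithm produces the candidate pairs.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(points: Sequence[Point]) -> Box:
    """Minimum bounding rectangle of a non-empty vertex list."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def mbrs_intersect(a: Box, b: Box) -> bool:
    """True if the two closed rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_step(
    r_objs: List[Sequence[Point]], s_objs: List[Sequence[Point]]
) -> List[Tuple[int, int]]:
    """Return candidate index pairs whose bounding rectangles intersect."""
    r_boxes = [mbr(o) for o in r_objs]
    s_boxes = [mbr(o) for o in s_objs]
    return [
        (i, j)
        for i, a in enumerate(r_boxes)
        for j, b in enumerate(s_boxes)
        if mbrs_intersect(a, b)
    ]


# A triangle and a square whose bounding rectangles overlap even though
# the shapes themselves do not: the pair survives the filter step anyway.
triangle = [(0, 0), (4, 0), (0, 4)]
square = [(3, 3), (5, 3), (5, 5), (3, 5)]
candidates = filter_step([triangle], [square])
```

The example pair is a false positive of the approximation, which is why the candidates are only a superset of the final result and still need an exact geometric check.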

...cally repartition the overflowed partition pair. Another alternative is to increase the number of partitions (limited to M) and use schemes similar to those used by the Adaptive Hash join algorithm =-=[ZG90]-=-. However, the current implementation of PBSM does not incorporate any of these techniques. 4 Performance Evaluation In this section, we compare the PBSM join algorithm with two other spatial join alg...

...se methods can also be used for evaluating the filter step by storing the bounding box of the spatial objects as points in a higher dimension [BHF93]. Recently, spatial index structures like R-trees =-=[Gut84]-=-, R+-trees [CFR87], R*-trees [BKSS90], and PMR quad trees [NS86] have been used to speed up the evaluation of the spatial join. Using analytical models, Gunther compares join algorithms that use gen...

..., HS95] • PBSM two dimensional space • Build 1 or 2 indices before joining [LR94, LR95] • Spatial Hash Join [LR96] Table 1: Classification of Various Spatial Join Algorithms data structures. In =-=[HS95]-=-, Hoel and Samet propose a tree join algorithm for the PMR quad tree, and compare the efficiency of variants of the PMR quad tree with variants of the R-tree [HS95]. When one of the inputs to the spa...

...joins in the relational domain [KHT89, HNKT90, BHF93]. These methods can also be used for evaluating the filter step by storing the bounding box of the spatial objects as points in a higher dimension =-=[BHF93]-=-. Recently, spatial index structures like R-trees [Gut84], R+-trees [CFR87], R*-trees [BKSS90], and PMR quad trees [NS86] have been used to speed up the evaluation of the spatial join. Using analyt...

...se system has been employed to meet these requirements. Examples of commercial database systems that have been used for these applications are ARC/INFO [Arc95], Intergraph's MGE [Cor95], and Illustra =-=[Ube94]-=-. Data stored in these spatial database systems includes simple geometric types like points, lines, polygons, and surfaces, This work was partially supported by NASA Contracts #USRA--555517, #NAGW--3...

...s have been dynamically redistributed. A spatial DBMS must evaluate these joins efficiently. One solution to this problem is to build a spatial index on both inputs and then use a tree join algorithm =-=[LR95]-=-. Another solution to this problem comes from the VLSI domain where one needs to compute the pairwise intersection between two potentially large sets of rectangles that don't fit entirely in main memo...

...ndling skews in parallel relational joins [DNSS92]. A similar partitioning function has been independently proposed for redundancy-based declustering of spatial objects in a parallel spatial database =-=[TY95]-=-, but in that proposal the number of tiles always equals the number of partitions. The design space for choosing the spatial partitioning function has two axes: the number of tiles used in the partiti...
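
One way to picture a tile-based spatial partitioning function is the Python sketch below. It is an illustration, not the paper's code: the regular `nx` by `ny` grid over a known universe rectangle, the row-major tile numbering, the round-robin mapping `tile % num_partitions`, and the name `partitions_for` are all assumptions made here.

```python
from typing import Set, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def partitions_for(
    rect: Rect, universe: Rect, nx: int, ny: int, num_partitions: int
) -> Set[int]:
    """Partitions that must receive `rect` under a round-robin tile mapping."""
    uxmin, uymin, uxmax, uymax = universe
    tile_w = (uxmax - uxmin) / nx
    tile_h = (uymax - uymin) / ny

    def clamp(v: int, hi: int) -> int:
        # Keep tile coordinates inside the grid (e.g. for the far boundary).
        return max(0, min(hi, v))

    # Range of tile columns and rows that the rectangle overlaps.
    cx0 = clamp(int((rect[0] - uxmin) // tile_w), nx - 1)
    cx1 = clamp(int((rect[2] - uxmin) // tile_w), nx - 1)
    cy0 = clamp(int((rect[1] - uymin) // tile_h), ny - 1)
    cy1 = clamp(int((rect[3] - uymin) // tile_h), ny - 1)

    parts: Set[int] = set()
    for cy in range(cy0, cy1 + 1):
        for cx in range(cx0, cx1 + 1):
            tile = cy * nx + cx               # row-major tile number
            parts.add(tile % num_partitions)  # round-robin tile-to-partition map
    return parts
```

With more tiles than partitions, neighbouring tiles land in different partitions, so a dense region of the data is spread out rather than concentrated in one partition. A rectangle that spans several tiles may be sent to several partitions.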

...nother solution to this problem comes from the VLSI domain where one needs to compute the pairwise intersection between two potentially large sets of rectangles that don't fit entirely in main memory =-=[GS87]-=-. However, the VLSI algorithms are generally not very efficient with respect to the number of disk I/Os. This paper makes two contributions. First, it presents a new spatial join algorithm, called the...

...rwise intersection between two sets of rectangles has been extensively studied in the VLSI domain [MC80], and numerous solutions exist for the case when both input sets of rectangles fit in memory =-=[PS88]-=-. In [GS87], Guting and Schilling examine the rectangle intersection problem when the inputs are too large to fit in memory, and analyze the time and space complexity of two algorithms that are based o...

...ding Paradise [DKL+94] to run on shared-nothing architectures [Sto86]. Parallel spatial databases are emerging as an attractive solution for storing and manipulating large volumes of spatial data =-=[DLPY93]-=-, and some techniques for declustering spatial data have recently been proposed [TY95]. However, unless the spatial data is uniformly distributed, these techniques can result in unbalanced partitions....

... manipulate spatial data. Increasingly, a database system has been employed to meet these requirements. Examples of commercial database systems that have been used for these applications are ARC/INFO =-=[Arc95]-=-, Intergraph's MGE [Cor95], and Illustra [Ube94]. Data stored in these spatial database systems includes simple geometric types like points, lines, polygons, and surfaces, This work was partially sup...

...Increasingly, a database system has been employed to meet these requirements. Examples of commercial database systems that have been used for these applications are ARC/INFO [Arc95], Intergraph's MGE =-=[Cor95]-=-, and Illustra [Ube94]. Data stored in these spatial database systems includes simple geometric types like points, lines, polygons, and surfaces, This work was partially supported by NASA Contracts #...

... number of tiles on the execution time of PBSM, but found that changing the number of tiles had a very small effect on the overall execution time (less than 5%). The full-length version of this paper =-=[PD]-=- presents this result. The performance study was carried out in two parts. The first part examined the performance of the three algorithms when neither join input had a pre-existing index, and the se...