Abstract
We compare two different techniques for browsing through a collection of spatial objects stored in an R-tree spatial data structure on the basis of their distances from an arbitrary spatial query object. The conventional approach makes use of a k-nearest neighbor algorithm, where k is known prior to the invocation of the algorithm. Thus, if m > k neighbors are needed, the k-nearest neighbor algorithm has to be reinvoked for m neighbors, thereby possibly performing some redundant computations. The second approach is incremental in the sense that, having obtained the k nearest neighbors, the (k + 1)st neighbor can be obtained without having to calculate the k + 1 nearest neighbors from scratch. The incremental approach is useful when processing complex queries where one of the conditions involves spatial proximity (e.g., the nearest city to Chicago with population greater than a million), in which case a query engine can make use of a pipelined strategy. We present a general incremental nearest neighbor algorithm that is applicable to a large class of hierarchical spatial data structures. This algorithm is adapted to the R-tree, and its performance is compared to an existing k-nearest neighbor algorithm for R-trees [Roussopoulos et al. 1995]. Experiments show that the incremental nearest neighbor algorithm significantly outperforms the k-nearest neighbor algorithm for distance browsing queries in a spatial database that uses the R-tree as a spatial index. Moreover, the incremental nearest neighbor algorithm usually outperforms the k-nearest neighbor algorithm when applied to the k-nearest neighbor problem for the R-tree, although the improvement is not nearly as large as for distance browsing queries. In fact, we prove informally that at any step in its execution the incremental nearest neighbor algorithm is optimal with respect to the spatial data structure that is employed.
Furthermore, based on some simplifying assumptions, we prove that in two dimensions the number of distance computations and leaf node accesses made by the algorithm for finding k neighbors is O(k + √k).
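The abstract does not spell out the mechanics of the incremental algorithm. One common way to obtain neighbors one at a time from a hierarchical index is a best-first traversal driven by a priority queue keyed on distance from the query object, and the sketch below assumes that approach. The names `Node`, `min_dist`, and `incremental_nearest` are hypothetical, and the small in-memory hierarchy of bounding rectangles only stands in for an R-tree; it is an illustration under these assumptions, not the authors' implementation.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    """Hypothetical R-tree-like node: children plus their bounding rectangle."""
    children: List[Union["Node", Point]]
    rect: Rect = field(init=False)

    def __post_init__(self) -> None:
        # A point is treated as a degenerate rectangle.
        boxes = [c.rect if isinstance(c, Node) else (c[0], c[1], c[0], c[1])
                 for c in self.children]
        self.rect = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                     max(b[2] for b in boxes), max(b[3] for b in boxes))


def min_dist(q: Point, rect: Rect) -> float:
    """Smallest Euclidean distance from q to any location inside rect."""
    dx = max(rect[0] - q[0], 0.0, q[0] - rect[2])
    dy = max(rect[1] - q[1], 0.0, q[1] - rect[3])
    return math.hypot(dx, dy)


def incremental_nearest(root: Node, q: Point) -> Iterator[Tuple[float, Point]]:
    """Yield (distance, point) pairs in nondecreasing distance from q."""
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    heap = [(min_dist(q, root.rect), next(tie), root)]
    while heap:
        dist, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            # Expand the node: enqueue each child with its own distance.
            for child in item.children:
                if isinstance(child, Node):
                    d = min_dist(q, child.rect)
                else:
                    d = math.dist(q, child)
                heapq.heappush(heap, (d, next(tie), child))
        else:
            # A point at the front of the queue is the next nearest neighbor.
            yield dist, item


# Toy data (made up for illustration only).
root = Node([Node([(0.0, 0.0), (1.0, 1.0)]), Node([(5.0, 5.0), (6.0, 5.0)])])

# Distance browsing with a filter: pull neighbors until one satisfies x > 4.
first_match = next(p for _, p in incremental_nearest(root, (0.9, 0.9)) if p[0] > 4)
```

Because the function is a generator, a caller can request the next neighbor only when the previous one fails a filter, which mirrors the pipelined use described in the abstract. A fixed-k algorithm would instead need to guess k up front and restart with a larger value if too few results pass the filter.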
- AOKI, P. M. 1998. Generalizing "search" in generalized search trees. In Proceedings of the 14th International Conference on Data Engineering (Orlando, FL, Feb.). IEEE Computer Society Press, Los Alamitos, CA, 380-389.
- AREF, W. G. AND SAMET, H. 1992. Uniquely reporting spatial objects: Yet another operation for comparing spatial data structures. In Proceedings of the 5th Symposium on Spatial Data Handling (Charleston, SC, Aug.). 178-189.
- AREF, W. G. AND SAMET, H. 1993. Estimating selectivity factors of spatial operations. In Optimization in Databases - Proceedings of the 5th International Workshop on Foundations of Models and Languages for Data and Objects (Aigen, Austria, Sept.). 31-40.
- ARYA, S., MOUNT, D. M., NETANYAHU, N. S., SILVERMAN, R., AND WU, A. Y. 1998. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. J. ACM 45, 6, 891-923.
- BECKER, L. AND GÜTING, R. H. 1992. Rule-based optimization and query processing in an extensible geometric database system. ACM Trans. Database Syst. 17, 2 (June 1992), 247-303.
- BECKMANN, N., KRIEGEL, H.-P., SCHNEIDER, R., AND SEEGER, B. 1990. The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD '90, Atlantic City, NJ, May 23-25, 1990), H. Garcia-Molina, Ed. ACM Press, New York, NY, 322-331.
- BENTLEY, J. L. 1975. Multidimensional binary search trees used for associative searching. Commun. ACM 18, 9 (Sept.), 509-517.
- BERCHTOLD, S., BÖHM, C., KEIM, D. A., AND KRIEGEL, H.-P. 1997. A cost model for nearest neighbor search in high-dimensional data space. In Proceedings of the 16th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '97, Tucson, AZ, May 12-14, 1997), A. Mendelzon and Z. M. Özsoyoglu, Eds. ACM Press, New York, NY, 78-86.
- BERCHTOLD, S., KEIM, D. A., AND KRIEGEL, H.-P. 1996. The X-tree: An index structure for high-dimensional data. In Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB '96, Mumbai, India, Sept.). 28-39.
- BERN, M. 1993. Approximate closest-point queries in high dimensions. Inf. Process. Lett. 45, 2 (Feb. 26, 1993), 95-99.
- BRIN, S. 1995. Near neighbor search in large metric space. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95, Zurich, Sept.). 574-584.
- BRODER, A. J. 1990. Strategies for efficient incremental nearest neighbor search. Pattern Recogn. 23, 1/2 (Jan. 1990), 171-178.
- BURKHARD, W. A. AND KELLER, R. 1973. Some approaches to best-match file searching. Commun. ACM 16, 4 (Apr.), 230-236.
- CIACCIA, P., PATELLA, M., AND ZEZULA, P. 1997. M-tree: An efficient access method for similarity search in metric spaces. In Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB '97, Athens, Greece, Aug.). 426-435.
- COMER, D. 1979. The ubiquitous B-tree. ACM Comput. Surv. 11, 2 (June), 121-137.
- EASTMAN, C. M. AND ZEMANKOVA, M. 1982. Partially specified nearest neighbor searches using k-d-trees. Inf. Process. Lett. 15, 2 (Sept.), 53-56.
- ESPERANÇA, C. AND SAMET, H. 1997. Orthogonal polygons as bounding structures in filter-refine query processing strategies. In Proceedings of the Fifth International Symposium on Advances in Spatial Databases (SSD'97, Berlin, July), M. Scholl and A. Voisard, Eds. Springer-Verlag, New York, 197-220.
- FALOUTSOS, C. AND LIN, K. 1995. FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In Proceedings of the ACM SIGMOD Conference on Management of Data (San Jose, CA, May). ACM Press, New York, NY, 163-174.
- FRANK, A. U. AND BARRERA, R. 1989. The Fieldtree: a data structure for geographic information systems. In Proceedings of the First Symposium on Design and Implementation of Large Spatial Databases (SSD'89, Santa Barbara, CA, July), A. Buchmann, O. Günther, T. R. Smith, and Y. F. Wang, Eds. Springer-Verlag, New York, 29-44.
- FREDMAN, M. L., SEDGEWICK, R., SLEATOR, D. D., AND TARJAN, R. E. 1986. The pairing heap: a new form of self-adjusting heap. Algorithmica 1, 1 (Jan. 1986), 111-129.
- FRIEDMAN, J. H., BENTLEY, J. L., AND FINKEL, R. A. 1977. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw. 3, 3 (Sept.), 209-226.
- FUKUNAGA, K. AND NARENDRA, P. M. 1975. A branch and bound algorithm for computing k-nearest neighbors. IEEE Trans. Comput. 24, 7 (July), 750-753.
- GÜNTHER, O. AND NOLTEMEIER, H. 1991. Spatial database indices for large extended objects. In Proceedings of the Seventh International Conference on Data Engineering (Kobe, Japan). IEEE Computer Society Press, Los Alamitos, CA, 520-526.
- GUTTMAN, A. 1984. R-trees: A dynamic index structure for spatial searching. In Proceedings of the ACM SIGMOD Annual Meeting on Management of Data (SIGMOD '84, Boston, MA, June 18-21). ACM, New York, NY, 47-57.
- HAFNER, J., SAWHNEY, H., EQUITZ, W., ET AL. 1995. Efficient color histogram indexing for quadratic form distance functions. IEEE Trans. Pattern Anal. Mach. Intell. 17, 7 (July), 729-736.
- HENRICH, A. 1994. A distance-scan algorithm for spatial access structures. In Proceedings of the Second ACM Workshop on Geographic Information Systems (Gaithersburg, MD, Dec.). ACM Press, New York, NY, 136-143.
- HENRICH, A. 1998. The LSDh-tree: an access structure for feature vectors. In Proceedings of the 14th International Conference on Data Engineering (Orlando, FL, Feb.). IEEE Computer Society Press, Los Alamitos, CA, 362-369.
- HENRICH, A., SIX, H.-W., AND WIDMAYER, P. 1989. The LSD tree: spatial access to multidimensional and non-point objects. In Proceedings of the 15th International Conference on Very Large Data Bases (VLDB '89, Amsterdam, The Netherlands, Aug 22-25), R. P. van de Riet, Ed. Morgan Kaufmann Publishers Inc., San Francisco, CA, 45-53.
- HJALTASON, G. R. AND SAMET, H. 1995. Ranking in spatial databases. In Proceedings of the Fourth International Symposium on Advances in Spatial Databases (SSD'95, Portland, ME, Aug.), M. J. Egenhofer and J. R. Herring, Eds. Springer-Verlag, New York, 83-95.
- HOEL, E. G. AND SAMET, H. 1991. Efficient processing of spatial queries in line segment databases. In Proceedings of the 2nd Symposium on Advances in Spatial Databases (SSD '91, Zurich, Switzerland, Aug. 28-30, 1991), O. Günther and H.-J. Schek, Eds. Springer Lecture Notes in Computer Science, vol. 525. Springer-Verlag, New York, NY, 237-256.
- KAMEL, I. AND FALOUTSOS, C. 1993. On packing R-trees. In Proceedings of the Second International Conference on Information and Knowledge Management (CIKM '93, Washington, DC, Nov. 1-5 1993), B. Bhargava, T. Finin, and Y. Yesha, Eds. ACM Press, New York, NY, 490-499.
- KAMEL, I. AND FALOUTSOS, C. 1994. Hilbert R-tree: An improved R-tree using fractals. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94, Santiago, Chile, Sept.). VLDB Endowment, Berkeley, CA, 500-509.
- KAMGAR-PARSI, B. AND KANAL, L. N. 1985. An improved branch and bound algorithm for computing k-nearest neighbors. Pattern Recogn. Lett. 3, 1 (Jan.).
- KANTH, K. V. R., AGRAWAL, D., AND SINGH, A. 1998. Dimensionality reduction for similarity searching in dynamic databases. In Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD '98, Seattle, WA, June 1-4, 1998), L. Haas, P. Drew, A. Tiwary, and M. Franklin, Eds. ACM Press, New York, NY, 237-248.
- KATAYAMA, N. AND SATOH, S. 1997. The SR-tree: an index structure for high-dimensional nearest neighbor queries. In Proceedings of the International ACM Conference on Management of Data (SIGMOD '97, May). ACM, New York, NY, 369-380.
- KORN, F., SIDIROPOULOS, N., FALOUTSOS, C., SIEGEL, E., AND PROTOPAPAS, Z. 1996. Fast nearest neighbor search in medical image databases. In Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB '96, Mumbai, India, Sept.). 215-226.
- KRIEGEL, H.-P., SCHMIDT, T., AND SEIDL, T. 1997. 3-D similarity search by shape approximation. In Proceedings of the Fifth International Symposium on Advances in Spatial Databases (SSD'97, Berlin, July), M. Scholl and A. Voisard, Eds. Springer-Verlag, New York, 11-28.
- LINDENBAUM, M. AND SAMET, H. 1995. A probabilistic analysis of trie-based sorting of large collections of line segments. TR-3455. University of Maryland at College Park, College Park, MD.
- LOMET, D. AND SALZBERG, B. 1989. A robust multi-attribute search structure. In Proceedings of the Fifth IEEE International Conference on Data Engineering (Los Angeles, CA, Feb. 1989). 296-304.
- MURALIKRISHNA, M. AND DEWITT, D. J. 1988. Equi-depth multidimensional histograms. In Proceedings of the Conference on Management of Data (SIGMOD '88, Chicago, IL, June 1-3, 1988), H. Boral and P.-A. Larson, Eds. ACM Press, New York, NY, 28-36.
- MURPHY, O. J. AND SELKOW, S. M. 1986. The efficiency of using k-d trees for finding nearest neighbors in discrete space. Inf. Process. Lett. 23, 4 (Nov. 8, 1986), 215-218.
- NELSON, R. C. AND SAMET, H. 1986. A consistent hierarchical representation for vector data. SIGGRAPH Comput. Graph. 20, 4 (Aug. 1986), 197-206.
- BUREAU OF THE CENSUS. 1989. Tiger/Line precensus files. Bureau of the Census, Washington, DC.
- ROBINSON, J. T. 1981. The k-d-b-tree: A search structure for large multidimensional dynamic indexes. In Proceedings of the ACM SIGMOD 1981 International Conference on Management of Data (Ann Arbor, MI, Apr. 29-May 1). ACM Press, New York, NY, 10-18.
- ROUSSOPOULOS, N., KELLEY, S., AND VINCENT, F. 1995. Nearest neighbor queries. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data (SIGMOD '95, San Jose, CA, May 23-25), M. Carey and D. Schneider, Eds. ACM Press, New York, NY, 71-79.
- ROUSSOPOULOS, N. AND LEIFKER, D. 1985. Direct spatial search on pictorial databases using packed R-trees. In Proceedings of the ACM SIGMOD Conference on Management of Data (SIGMOD, Austin, TX, May). ACM Press, New York, NY, 17-31.
- SAMET, H. 1990. The Design and Analysis of Spatial Data Structures. Addison-Wesley Series in Computer Science. Addison-Wesley Longman Publ. Co., Inc., Reading, MA.
- SEIDL, T. AND KRIEGEL, H.-P. 1998. Optimal multi-step k-nearest neighbor search. In Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD '98, Seattle, WA, June 1-4, 1998), L. Haas, P. Drew, A. Tiwary, and M. Franklin, Eds. ACM Press, New York, NY, 154-165.
- SELINGER, P. G., ASTRAHAN, M. M., LORIE, R. A., AND PRICE, T. G. 1979. Access path selection in a relational database management system. In Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD '79, Boston, MA, May 30-June 1). ACM Press, New York, NY, 23-34.
- SELLIS, T., ROUSSOPOULOS, N., AND FALOUTSOS, C. 1987. The R+-tree: A dynamic index for multi-dimensional objects. In Proceedings of the 13th International Conference on Very Large Data Bases (Brighton, UK, Sept.). 71-79.
- SPROULL, R. F. 1991. Refinements to nearest-neighbor searching in k-dimensional trees. Algorithmica 6, 4, 579-589.
- UHLMANN, J. K. 1991. Satisfying general proximity/similarity queries with metric trees. Inf. Process. Lett. 40, 4 (Nov.), 175-179.
- WANG, T. L. AND SHASHA, D. 1990. Query processing for distance metrics. In Proceedings of the 16th VLDB Conference on Very Large Data Bases (VLDB, Brisbane, Australia). VLDB Endowment, Berkeley, CA, 602-613.
- WHITE, D. A. AND JAIN, R. 1996. Algorithms and strategies for similarity retrieval. Tech. Rep. VCL-96-101. University of California at San Diego, La Jolla, CA.
- WHITE, D. A. AND JAIN, R. 1996. Similarity indexing with the SS-tree. In Proceedings of the 12th IEEE International Conference on Data Engineering (New Orleans, LA). IEEE Press, Piscataway, NJ, 516-523.