skip to main content
article
Free Access

Distance browsing in spatial databases

Published:01 June 1999Publication History
Skip Abstract Section

Abstract

We compare two different techniques for browsing through a collection of spatial objects stored in an R-tree spatial data structure on the basis of their distances from an arbitrary spatial query object. The conventional approach is one that makes use of a k-nearest neighbor algorithm where k is known prior to the invocation of the algorithm. Thus if m < k neighbors are needed, the k-nearest neighbor algorithm has to be reinvoked for m neighbors, thereby possibly performing some redundant computations. The second approach is incremental in the sense that having obtained the k nearest neighbors, the k + 1st neighbor can be obtained without having to calculate the k + 1 nearest neighbors from scratch. The incremental approach is useful when processing complex queries where one of the conditions involves spatial proximity (e.g., the nearest city to Chicago with population greater than a million), in which case a query engine can make use of a pipelined strategy. We present a general incremental nearest neighbor algorithm that is applicable to a large class of hierarchical spatial data structures. This algorithm is adapted to the R-tree and its performance is compared to an existing k-nearest neighbor algorithm for R-trees [Rousseopoulos et al. 1995]. Experiments show that the incremental nearest neighbor algorithm significantly outperforms the k-nearest neighbor algorithm for distance browsing queries in a spatial database that uses the R-tree as a spatial index. Moreover, the incremental nearest neighbor algorithm usually outperforms the k-nearest neighber algorithm when applied to the k-nearest neighbor problem for the R-tree, although the improvement is not nearly as large as for distance browsing queries. In fact, we prove informally that at any step in its execution the incremental nearest neighbor algorithm is optimal with respect to the spatial data structure that is employed. Furthermore, based on some simplifying assumptions, we prove that in two dimensions the number of distance computations and leaf nodes accesses made by the algorithm for finding k neighbors is O(k + k).

References

  1. AOKI, P. M. 1998. Generalizing "search" in generalized search trees. In Proceedings of the 14th International Conference on Data Engineering (Orlando, FL, Feb.). IEEE Computer Society Press, Los Alamitos, CA, 380-389. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. AREF, W. G. AND SAMET, H. 1992. Uniquely reporting spatial objects: Yet another operation for comparing spatial data structures. In Proceedings of the 5th Symposium on Spatial Data Handling (Charleston, SC, Aug.). 178-189.Google ScholarGoogle Scholar
  3. AREF, W. G. AND SAMET, H. 1993. Estimating selectivity factors of spatial operations. In Optimization in Databases - Proceedings of the 5th International Workshop on Foundations of Models and Languages for Data and Objects (Aigen, Austria, Sept.). 31-40.Google ScholarGoogle Scholar
  4. ARYA, S., MOUNT, D. M., NETANYAHU, N. S., SILVERMAN, R., AND WU, A. Y. 1998. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM 45, 6, 891-923. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. BECKER, L. AND G TING, R. H. 1992. Rule-based optimization and query processing in an extensible geometric database system. ACM Trans. Database Syst. 17, 2 (June 1992), 247-303. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. BECKMANN, N., KRIEGEL, H.-P., SCHNEIDER, R., AND SEEGER, B. 1990. The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD '90, Atlantic City, NJ, May 23-25, 1990), H. Garcia-Molina, Ed. ACM Press, New York, NY, 322-331. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. BENTLEY, g. L. 1975. Multidimensional binary search trees used for associative searching. Commun. ACM 18, 9 (Sept.), 509-517. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. BERCHTOLD, S., B HM, C., KEIM, D. A., AND KRIEGEL, H.-P. 1997. A cost model for nearest neighbor search in high-dimensional data space. In Proceedings of the 16th ACM SIGACT- SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '97, Tucson, AZ, May 12-14, 1997), A. Mendelzon and Z.M. zsoyoglu, Eds. ACM Press, New York, NY, 78-86. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. BERCHTOLD, S., KEIM, D. A., AND KRIEGEL, H.-P. 1996. The X-tree: An index structure for high-dimensional data. In Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB '96, Mumbai, India, Sept.). 28-39. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. BERN, M. 1993. Approximate closest-point queries in high dimensions. Inf. Process. Lett. 45, 2 (Feb. 26, 1993), 95-99. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. BRIN, S. 1995. Near neighbor search in large metric space. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95, Zurich, Sept.). 574-584. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. BRODER, A. J. 1990. Strategies for efficient incremental nearest neighbor search. Pattern Recogn. 23, 1/2 (Jan. 1990), 171-178. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. BURKHARD, W. A. AND KELLER, R. 1973. Some approaches to best-match file searching. Commun. ACM 16, 4 (Apr.), 230-236. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. CIACCIA, P., PATELLA, M., AND ZEZULA, P. 1997. M-tree: An efficient access method for similarity search in metric spaces. In Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB '97, Athens, Greece, Aug.). 426-435. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. COMER, D. 1979. The ubiquitous B-tree. ACM Comput. Surv. 11, 2 (June), 121-137. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. EASTMAN, C. M. AND ZEMANKOVA, M. 1982. Partially specified nearest neighbor searches using k-d-trees. Inf. Process. Lett. 15, 2 (Sept.), 53-56.Google ScholarGoogle ScholarCross RefCross Ref
  17. ESPERAN~A, C. AND SAMET, H. 1997. Orthogonal polygons as bounding structures in filter-refine query processing strategies. In Proceedings of the Fifth International Symposium on Advances in Spatial Databases (SSD'97, Berlin, July), M. Scholl and A. Voisard, Eds. Springer-Verlag, New York, 197-220. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. FALOUTSOS, C. AND LIN, K. 1995. FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In Proceedings of the ACM SIGMOD Conference on Management of Data (San Jose, CA, May). ACM Press, New York, NY, 163-174. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. FRANK, A. U. AND BARRERA, R. 1989. The Fieldtree: a data structure for geographic information systems. In Proceedings of the First Symposium on Design and Implementation of Large Spatial Databases (SSD'89, Santa Barbara, CA, July), A. Buchmann, O. G nther, T. R. Smith, and Y. F. Wang, Eds. Springer-Verlag, New York, 29-44. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. FREDMAN, M. L., SEDGEWICK, R., SLEATOR, D. D., AND TARJAN, R. E. 1986. The pairing heap: a new form of self-adjusting heap. Algorithmica 1, 1 (Jan. 1986), 111-129. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. FRIEDMAN, J. H., BENTLEY, J. L., AND FINKEL,, R.A. 1977. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw. 3, 3 (Sept.), 209-226. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. FUKUNAGA, K. AND NARENDRA, P. M. 1975. A branch and bound algorithm for computing. IEEE Trans. Comput. 24, 7 (July), 750-753.Google ScholarGoogle Scholar
  23. G NTHER, O. AND NOLTEMEIER, H. 1991. Spatial database indices for large extended objects. In Proceedings of the Seventh International Conference on Data Engineering (Kobe, Japan). IEEE Computer Society Press, Los Alamitos, CA, 520-526. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. GUTTMAN, A. 1984. R-trees: A dynamic index structure for spatial searching. In Proceedings of the ACM SIGMOD Annual Meeting on Management of Data (SIGMOD '84, Boston, MA, June18-21). ACM, New York, NY, 47-57. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. HAFNER, J., SAWHNEY, H., AND EQUITZ, W. E. AL. 1995. Efficient color histogram indexing for quadratic form distance functions. IEEE Trans. Pattern Anal. Mach. Intell. 17, 7 (July), 729-736. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. HENRICH, A. 1994. A distance-scan algorithm for spatial access structures. In Proceedings of the Second ACM Workshop on Geographic Information Systems (Gaithersburg, MD, Dec.). ACM Press, New York, NY, 136-143.Google ScholarGoogle Scholar
  27. HENRICH, A. 1998. The LSDh-tree: an access structure for feature vectors. In Proceedings of the 14th International Conference on Data Engineering (Orlando, FL, Feb.). IEEE Computer Society Press, Los Alamitos, CA, 362-369. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. HENRICH, A., SIX, H.-W., AND WIDMAYER, P. 1989. The LSD tree: spatial access to multidimensional and non-point objects. In Proceedings of the 15th International Conference on Very Large Data Bases (VLDB '89, Amsterdam, The Netherlands, Aug 22-25), R. P. van de Riet, Ed. Morgan Kaufmann Publishers Inc., San Francisco, CA, 45-53. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. HJALTASON, G. R. AND SAMET, H. 1995. Ranking in spatial databases. In Proceedings of the Fourth International Symposium on Advances in Spatial Databases (SSD'95, Portland, ME, Aug.), M. J. Egenhofer and J. R. Herring, Eds. Springer-Verlag, New York, 83-95. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. HOEL, E. G. AND SAMET, H. 1991. Efficient processing of spatial queries in line segment databases. In Proceedings of the 2nd symposium on Advances in Spatial Databases (SSD '91, Zurich, Switzerland, Aug. 28-30, 1991), O. G nther and H.-J. Schek, Eds. Springer Lecture Notes in Computer Science, vol. 525. Springer-Verlag, New York, NY, 237-256. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. KAMEL, I. AND FALOUTSOS, C. 1993. On packing R-trees. In Proceedings of the Second International Conference on Information and Knowledge Management (CIKM '93, Washington, DC, Nov. 1-5 1993), B. Bhargava, T. Finin, and Y. Yesha, Eds. ACM Press, New York, NY, 490-499. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. KAMEL, I. AND FALOUTSOS, C. 1994. Hilbert R-tree: An improved R-tree using fractals. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94, Santiago, Chile, Sept.). VLDB Endowment, Berkeley, CA, 500-509. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. KAMGAR-PARSI, B. AND KANAL, L. N. 1985. An improved branch and bound algorithm for computing k -nearest neighbors. Pattern Recogn. Lett. 3, 1 (Jan.).Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. KANTH, K. V. R., AGRAWAL, D., AND SINGH, A. 1998. Dimensionality reduction for similarity searching in dynamic databases. In Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD '98, Seattle, WA, June 1-4, 1998), L. Haas, P. Drew, A. Tiwary, and M. Franklin, Eds. ACM Press, New York, NY, 237-248. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. KATAYAMA, N. AND SATOH, S. 1997. The SR-tree: an index structure for high-dimensional nearest neighbor queries. In Proceedings of the International ACM Conference on Management of Data (SIGMOD '97, May). ACM, New York, NY, 369-380. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. KORN, F., SIDIROPOULOS, N., FALOUTSOS, C., SIEGEL, E., AND PROTOPAPAS, Z. 1996. Fast nearest neighbor search in medical image databases. In Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB '96, Mumbai, India, Sept.). 215-226. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. K_RIEGEL, H.-P., SCHMIDT, T., AND SEIDL, T. 1997. 3-D similarity search by shape approximation. In Proceedings of the Fifth International Symposium on Advances in Spatial Databases (SSD'97, Berlin, July), M. Scholl and A. Voisard, Eds. Springer-Verlag, New York, 11-28. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. LINDENBAUM, M. AND SAMET, H. 1995. A probabilistic analysis of trie-based sorting of large collections of line segments. TR-3455. University of Maryland at College Park, College Park, MD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. LOMET, D. AND SALZBERG, B. 1989. A robust multi-attribute search structure. In Proceedings of the Fifth IEEE International Conference on Data Engineering (Los Angeles, CA, Feb. 1989). 296-304. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. MURALIKRISHNA, M. AND DEWITT, D. J. 1988. Equi-depth multidimensional histograms. In Proceedings of the Conference on Management of Data (SIGMOD '88, Chicago, IL, June 1-3, 1988), H. Boral and P.-A. Larson, Eds. ACM Press, New York, NY, 28-36. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. MURPHY, O J AND SELKOW, S M 1986. The efficiency of using k-d trees for finding nearest neighbors in discrete space. Inf. Process. Lett. 23, 4 (Nov. 8, 1986), 215-218. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. NELSON, R. C AND SAMET, H. 1986. A consistent hierarchical representation for vector data. SIGGRAPH Comput. Graph. 20, 4 (Aug. 1986), 197-206. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. BUREAU OF THE CENSUS, 1989. Tiger/Line precensus files. Bureau of the Census, Washington DC.Google ScholarGoogle Scholar
  44. ROBINSON, J. T. 1981. The k-d-b-tree: A search structure for large multidimensional dynamic indexes. In Proceedings of the ACM SIGMOD 1981 International Conference on Management of Data (Ann Arbor, MI, Apr. 29-May 1). ACM Press, New York, NY, 10-18. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. ROUSSOPOULOS, N., KELLEY, S., AND VINCENT, F. 1995. Nearest neighbor queries. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data (SIGMOD '95, San Jose, CA, May 23-25), M. Carey and D. Schneider, Eds. ACM Press, New York, NY, 71-79. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. ROUSSOPOULOS, N. AND LEIFKER, D. 1985. Direct spatial search on pictorial databases using packed R-trees. In Proceedings of the ACM SIGMOD Conference on Management of Data (SIGMOD, Austin, TX, May). ACM Press, New York, NY, 17-31. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. SAMET, H. 1990. The Design and Analysis of Spatial Data Structures. Addison-Wesley Series in Computer Science. Addison-Wesley Longman Publ. Co., Inc., Reading, MA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. SEIDL, T. AND KRIEGEL, H.-P. 1998. Optimal multi-step k-nearest neighbor search. In Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD '98, Seattle, WA, June 1-4, 1998), L. Haas, P. Drew, A. Tiwary, and M. Franklin, Eds. ACM Press, New York, NY, 154-165. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. SELINGER, P. G., ASTRAHAN, M. M., LORIE, R. A., AND PRICE, T. G. 1979. Access path selection in a relational database management system. In Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD '79, Boston, MA, May 30-June 1). ACM Press, New York, NY, 23-34. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. SELLIS, T., ROUSSOPOULOS, N., AND FALOUTSOS, C. 1987. The R+-tree: A dynamic index for multi-dimensional objects. In Proceedings of the 13th International Conference on Very Large Data Bases (Brighton, UK, Sept.). 71-79. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. SPROULL, R. F. 1991. Refinements to nearest-neighbor searching in k-dimensional trees. Algorithmica 6, 4, 579-589.Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. UHLMANN, J. K. 1991. Satisfying general proximity/similarity queries with metric trees. Inf. Process. Lett. 40, 4 (Nov.), 175-179.Google ScholarGoogle ScholarCross RefCross Ref
  53. WANG, T. L. AND SHASHA, D. 1990. Query processing for distance metrics. In Proceedings of the 16th VLDB Conference on Very Large Data Bases (VLDB, Brisbane, Australia). VLDB Endowment, Berkeley, CA, 602-613. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. WHITE, D. A. AND JAIN, R. 1996. Algorithms and strategies for similarity retrieval. Tech. Rep. VCL-96-101. University of California at San Diego, La Jolla, CA.Google ScholarGoogle Scholar
  55. WHITE, D. A. AND JAIN, R. 1996. Similarity indexing with the SS-tree. In Proceedings of the 12th IEEE International Conference on Data Engineering (New Orleans, LA). IEEE Press, Piscataway, NJ, 516-523. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Distance browsing in spatial databases

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader