On a model of indexability and its bounds for range queries

Published: 01 January 2002

Abstract

We develop a theoretical framework to characterize the hardness of indexing data sets on block-access memory devices like hard disks. We define an indexing workload by a data set and a set of potential queries. For a workload, we can construct an indexing scheme, which is a collection of fixed-sized subsets of the data. We identify two measures of efficiency for an indexing scheme on a workload: storage redundancy, r (how many times each item in the data set is stored), and access overhead, A (how many times more blocks than necessary does a query retrieve).

For many interesting families of workloads, there exists a trade-off between storage redundancy and access overhead. Given a desired access overhead A, there is a minimum redundancy that any indexing scheme must exhibit. We prove a lower-bound theorem for deriving the minimum redundancy. By applying this theorem, we show interesting upper and lower bounds and trade-offs between A and r in the case of multidimensional range queries and set queries.
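The two measures can be made concrete on a toy workload. The sketch below is only an illustration under stated assumptions, not the paper's formalization: it takes r to be the average number of blocks storing each item, and A to be the worst case, over all queries, of the smallest number of blocks covering the query divided by the ideal count ceil(|q| / B). The function names, the six-item data set, and the two block layouts are all hypothetical, and the minimum cover is found by brute force, which is practical only for tiny inputs.

```python
from itertools import combinations
from math import ceil


def redundancy(data, blocks):
    """Average number of blocks that store each data item (assumed reading of r)."""
    return sum(len(b) for b in blocks) / len(data)


def min_cover_size(query, blocks):
    """Smallest number of blocks whose union contains the query (brute force)."""
    useful = [b for b in blocks if b & query]
    for k in range(1, len(useful) + 1):
        for combo in combinations(useful, k):
            if query <= set().union(*combo):
                return k
    raise ValueError("query is not covered by the blocks")


def access_overhead(queries, blocks, block_size):
    """Worst-case ratio of blocks read to the ideal ceil(|q| / B) (assumed reading of A)."""
    return max(
        min_cover_size(q, blocks) / ceil(len(q) / block_size) for q in queries
    )


# Hypothetical workload: items 0..5, every contiguous range as a query, B = 2.
B = 2
data = set(range(6))
queries = [set(range(i, j + 1)) for i in range(6) for j in range(i, 6)]

# Scheme 1: a plain partition, so each item is stored exactly once.
plain = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]

# Scheme 2: the partition plus two overlapping blocks (extra storage).
overlapping = plain + [frozenset({1, 2}), frozenset({3, 4})]

r_plain = redundancy(data, plain)
a_plain = access_overhead(queries, plain, B)
r_overlap = redundancy(data, overlapping)
a_overlap = access_overhead(queries, overlapping, B)

print(f"plain:       r = {r_plain:.2f}, A = {a_plain:.2f}")
print(f"overlapping: r = {r_overlap:.2f}, A = {a_overlap:.2f}")
```

On this toy input the partition has r = 1 and A = 2 (the query {1, 2} straddles two blocks), while the overlapping layout has r of about 1.67 and A = 1. This shows the direction of the trade-off on one small instance only and says nothing about the bounds proved in the paper.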

Reviews

Fazli Can

In this paper, the authors introduce a new framework for modeling indexing on block-access external devices such as hard disks. The authors define an indexing workload as a data set and the associated set of potential queries. For each workload, an indexing scheme is defined as a collection of fixed-size subsets of the data. The authors further define two measures of efficiency for an indexing scheme: storage redundancy (how many times each item in the data set is stored) and access overhead (how many times more blocks than necessary a query retrieves). The authors show upper and lower bounds on these two measures for multidimensional range queries and set queries, as well as trade-offs between them. Among other findings, the results indicate that for two-dimensional range queries, a small amount of redundancy can considerably decrease the worst-case query cost.

The paper starts with a discussion of the practical motivations of the study and a review of the related work. This discussion is followed by numerous theorems and proofs. Mathematically versed researchers will appreciate the paper.

Online Computing Reviews Service
