Abstract
We develop a theoretical framework to characterize the hardness of indexing data sets on block-access memory devices such as hard disks. We define an indexing workload by a data set and a set of potential queries. For a workload, we can construct an indexing scheme, which is a collection of fixed-size subsets (blocks) of the data. We identify two measures of efficiency for an indexing scheme on a workload: storage redundancy, r (how many times each item in the data set is stored), and access overhead, A (how many times more blocks than necessary a query retrieves). For many interesting families of workloads, there is a trade-off between storage redundancy and access overhead: given a desired access overhead A, there is a minimum redundancy that any indexing scheme must exhibit. We prove a lower-bound theorem for deriving this minimum redundancy. By applying this theorem, we show upper and lower bounds and trade-offs between A and r in the case of multidimensional range queries and set queries.
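To make the two measures concrete, here is a minimal Python sketch that computes r and A for a toy workload. It assumes blocks of at most B items, reads r as the total number of stored item copies divided by the number of data items, and reads A as the worst case, over all queries q, of the fewest blocks whose union contains q divided by ⌈|q|/B⌉. These readings of the informal definitions above, the function names, and the workload (eight points on a line, all 1-D ranges, B = 2) are illustrative assumptions, not the paper's own code. The minimum cover is found by brute force, so the sketch only scales to tiny examples.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil


def redundancy(data, blocks):
    """Average number of copies of each item: total block slots / |data|."""
    return Fraction(sum(len(b) for b in blocks), len(data))


def min_cover(query, blocks):
    """Fewest blocks whose union contains every item of `query`.

    Brute force over subsets of the blocks that touch the query; only
    suitable for toy inputs (minimum set cover is NP-hard in general).
    """
    relevant = [b for b in blocks if b & query]
    for k in range(1, len(relevant) + 1):
        for combo in combinations(relevant, k):
            if query <= frozenset().union(*combo):
                return k
    raise ValueError("query is not covered by the indexing scheme")


def access_overhead(queries, blocks, B):
    """Worst ratio of blocks retrieved to the ideal ceil(|q| / B)."""
    return max(Fraction(min_cover(q, blocks), ceil(len(q) / B)) for q in queries)


# Hypothetical workload: items 0..N-1 on a line, block size B.
N, B = 8, 2
data = frozenset(range(N))
# Every nonempty 1-D range [i, j] over the items.
queries = [frozenset(range(i, j + 1)) for i in range(N) for j in range(i, N)]

# Scheme 1: consecutive blocks {0,1}, {2,3}, {4,5}, {6,7}; each item stored once.
aligned = [frozenset(range(s, s + B)) for s in range(0, N, B)]
# Scheme 2: additionally store blocks shifted by one: {1,2}, {3,4}, {5,6}.
shifted = aligned + [frozenset(range(s, s + B)) for s in range(1, N - 1, B)]

for name, blocks in [("aligned", aligned), ("aligned+shifted", shifted)]:
    print(name, redundancy(data, blocks), access_overhead(queries, blocks, B))
```

Under these assumptions the aligned scheme stores each item once (r = 1), but a two-item range such as {1, 2} straddles two blocks (A = 2). Adding the shifted blocks brings A down to 1 at the cost of r = 7/4, a small instance of the redundancy/overhead trade-off described above.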
Index Terms
- On a model of indexability and its bounds for range queries