Abstract
Indexes are commonly employed to retrieve a portion of a file or to retrieve its records in a particular order. An accurate performance model of indexes is essential to the design, analysis, and tuning of file management and database systems, and particularly to database query optimization. Many previous studies have addressed the problem of estimating the number of disk page fetches when randomly accessing k records out of N given records stored on T disk pages. This paper generalizes these results, relaxing two assumptions that usually do not hold in practice: unlimited buffer and unique records for each key value. Experiments show that the performance of an index scan is very sensitive to buffer size limitations and multiple records per key value. A model for these more practical situations is presented and a formula derived for estimating the performance of an index scan. We also give a closed-form approximation that is easy to compute. The theoretical results are validated using the R* distributed relational database system. Although we use database terminology throughout the paper, the model is more generally applicable whenever random accesses are made using keys.
- 1 ASTRAHAN, M. M., SCHKOLNICK, M., AND KIM, W. Performance of the System R access path selection mechanism. Inf. Process. 80 (1980), 487-491.Google Scholar
- 2 BITTON, D., AND DEWITT, D.J. Duplicate record elimination in large data files. A CM Trans. Database Syst. 8 (June 1983), 255-265. Google ScholarDigital Library
- 3 CARDENAS, A.F. Analysis and performance of inverted data base structures. Commun. A CM 18 (May 1975), 253-263. Google ScholarDigital Library
- 4 CHEUNG, T.-Y. Estimating block accesses and number of records in file management. Commun. ACM 25 (July 1982), 484-487. Google ScholarDigital Library
- 5 CHOU, H.-T., AND DEWITT, D.J. An evaluation of buffer management strategies for relational database systems. In Proceedings of Eleventh International Conference on Very Large Data Bases (Stockholm, Sept. 1985), pp. 127-141.Google Scholar
- 6 CHRISTODOULAKIS, S. Estimating block transfers and join sizes. In Proceedings of the A CM- SIGMOD Conference (San Jose, Calif., May 1983). ACM, New York, 1983, pp. 40-54. Google ScholarDigital Library
- 7 CHRISTODOULAKIS, S. Estimating block selectivities. Inf. Syst. 1 (1984), 69-79.Google ScholarCross Ref
- 8 CHRISTODOULAKIS, S. Implications of certain assumptions in database performance evaluation. ACM Trans. Database Syst. 9 (June 1984), 163-186. Google ScholarDigital Library
- 9 COMER, D. The ubiquitous B-tree. ACM Comput. Surv. 11 (June 1979), 121-137. Google ScholarDigital Library
- 10 EFFELSBERG, W., AND HAERDER, T. Principles of database buffer management. ACM Trans. Database Syst. 9 (Dec. 1984), 560-595. Google ScholarDigital Library
- 11 IJBEMA, A., AND BLANKEN, S. Estimating bucket accesses: A practical approach. In Proceedings of the International Conference on Data Engineering (Los Angeles, Calif., Feb. 1986), pp. 30-37. Google ScholarDigital Library
- 12 LOHMAN, G. M., ET AL. Query processing in R*. In Query Processing in Database Systems, Springer-Verlag, (also available as IBM Res. Rep. RJ4272, April 1984).Google Scholar
- 13 LUK, W.S. On estimating block accesses in database organizations. Commun. ACM 11 (1983), 945-947. Google ScholarDigital Library
- 14 MACKERT, L. F., AND LOHMAN, G.M. Index scans using a finite LRU buffer: A validated I/O Model. IBM Res. Rep. RJ4836, San Jose, Calif., Sept. 1985.Google Scholar
- 15 MACKERT, L. F., AND LOHMAN, G.M. R* optimizer validation and performance evaluation for local queries. In Proceedings of ACM-SIGMOD Conference (Washington, D.C., May 1986), pp. 84-95. Google ScholarDigital Library
- 16 MACKERT, L. F., AND LOHMAN, G.M. R* optimizer validation and performance evaluation for distributed queries. In Proceedings of Twelfth International Conference on Very Large Data Bases (Kyoto, Aug. 1986), pp. 149-159. Google ScholarDigital Library
- 17 SACCO, G. M., AND SCHKOLNICK, M. A mechanism for managing the buffer pool in a relational database system using the hot set model. In Proceedings of the Eighth International Conference on Very Large Data Bases (Mexico City, 1982), pp. 257-262. Google ScholarDigital Library
- 18 SELINGER, P. G., ET AL. Access path selection in a relational database management system, in Proceedings of ACM-SIGMOD Conference (1979). ACM, New York, 1979. Google ScholarDigital Library
- 19 STONEBRAKER, M. Operating system support for database management. Commun. ACM 24 (July 1981), 412-418. Google ScholarDigital Library
- 20 VANDER ZANDEN, B. T., TAYLOR, H. M., AND BITTON, D. Estimating block accesses when attributes are correlated. In Proceedings of the Twelfth International Conference on Very Large Data Bases (Kyoto, Aug. 1986), pp. 119-127. Google ScholarDigital Library
- 21 WATERS, S.J. Hit ratios. Computer J. 19 (1976), 21-24.Google ScholarCross Ref
- 22 WHANG, K.-Y., WIEDERHOLD, G., AND SAGALOWICZ, D. Estimating block accesses in database organizations: A closed noniterative formula. Commun. Au~ zo (l~ov. 1983), 940-944. Google ScholarDigital Library
- 23 YAO, S.B. Approximating block accesses in database organizations. Commun. ACM 20 {April 1977), 260-261. Google ScholarDigital Library
Index Terms
- Index scans using a finite LRU buffer: a validated I/O model
Recommendations
Estimating page fetches for index scans with finite LRU buffers
We describe an algorithm for estimating the number of page fetches for a partial or complete scan of a B-tree index. The algorithm obtains estimates for the number of page fetches for an index scan when given the number of tuples selected and the number ...
Estimating page fetches for index scans with finite LRU buffers
We describe an algorithm for estimating the number of page fetches for a partial or complete scan of a B-tree index. The algorithm obtains estimates for the number of page fetches for an index scan when given the number of tuples selected and the number ...
Estimating page fetches for index scans with finite LRU buffers
SIGMOD '94: Proceedings of the 1994 ACM SIGMOD international conference on Management of dataWe describe an algorithm for estimating the number of page fetches for a partial or complete scan of a B-tree index. The algorithm obtains estimates for the number of page fetches for an index scan when given the number of tuples selected and the number ...
Comments