research-article

Foster B-trees

Published: 06 September 2012

Abstract

Foster B-trees are a new variant of B-trees that combines advantages of prior B-tree variants optimized for many-core processors and modern memory hierarchies with flash storage and nonvolatile memory. Specific goals include: (i) minimal concurrency control requirements for the data structure, (ii) efficient migration of nodes to new storage locations, and (iii) support for continuous and comprehensive self-testing. Like B-link-trees, Foster B-trees optimize latching without imposing restrictions or specific designs on transactional locking, for example, key range locking. Like write-optimized B-trees, and unlike B-link-trees, Foster B-trees enable large writes on RAID and flash devices as well as wear leveling and efficient defragmentation. Finally, they support continuous and inexpensive yet comprehensive verification of all invariants, including all cross-node invariants of the B-tree structure. An implementation and a performance evaluation show that the Foster B-tree supports high concurrency and high update rates without compromising consistency, correctness, or read performance.
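The three goals in the abstract rest on node-level invariants that are easier to see in code. The following is a minimal sketch, assuming a simplified two-level tree of distinct integer keys and following the foster-parent/foster-child idea that gives the structure its name: when a leaf overflows, its upper half moves into a new foster child that is reachable only from the overflowing node (the foster parent), and a later, separate step ("adoption") moves the foster key and pointer into the permanent parent. Every node also carries low and high fence keys, so a verifier can check each parent/child and foster-parent/foster-child edge using only the two nodes involved. All names here (`Leaf`, `Branch`, `split`, `insert`, `adopt`, `verify`, `CAPACITY`) are hypothetical, and the sketch omits latching, logging, and splits of branch nodes.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass, field
from typing import Optional

CAPACITY = 4  # hypothetical: max keys per leaf before it splits


@dataclass
class Leaf:
    # Hypothetical layout: the node owns keys in [low_fence, foster_key) if it
    # has a foster child, else [low_fence, high_fence). high_fence is the high
    # fence of the whole foster chain starting at this node.
    low_fence: int
    high_fence: int
    keys: list[int] = field(default_factory=list)
    foster_key: Optional[int] = None
    foster_child: Optional[Leaf] = None


@dataclass
class Branch:
    # Hypothetical permanent parent: children[i] covers bounds(i).
    low_fence: int
    high_fence: int
    seps: list[int]
    children: list[Leaf]

    def bounds(self, i: int) -> tuple[int, int]:
        lo = self.low_fence if i == 0 else self.seps[i - 1]
        hi = self.seps[i] if i < len(self.seps) else self.high_fence
        return lo, hi


def _check(cond: bool, msg: str) -> None:
    # Raise instead of assert so verification still runs under `python -O`.
    if not cond:
        raise ValueError(msg)


def split(node: Leaf) -> None:
    """Split without touching the parent: the upper half becomes a foster child."""
    mid = len(node.keys) // 2
    sep = node.keys[mid]
    # The new node inherits any existing foster chain and the chain high fence.
    node.foster_child = Leaf(sep, node.high_fence, node.keys[mid:],
                             node.foster_key, node.foster_child)
    node.foster_key = sep
    node.keys = node.keys[:mid]


def insert(root: Branch, key: int) -> None:
    node = root.children[bisect.bisect_right(root.seps, key)]
    # Follow the foster chain until reaching the node that owns this key.
    while node.foster_child is not None and key >= node.foster_key:
        node = node.foster_child
    if key not in node.keys:
        bisect.insort(node.keys, key)
        if len(node.keys) > CAPACITY:
            split(node)


def adopt(root: Branch, i: int) -> None:
    """Move child i's foster key and pointer into the permanent parent."""
    child = root.children[i]
    _check(child.foster_child is not None and child.foster_key is not None,
           "nothing to adopt")
    root.seps.insert(i, child.foster_key)
    root.children.insert(i + 1, child.foster_child)
    child.high_fence = child.foster_key
    child.foster_key, child.foster_child = None, None


def verify(root: Branch) -> None:
    """Check sort order and fence keys on every parent and foster edge."""
    _check(root.seps == sorted(set(root.seps)), "separators out of order")
    _check(len(root.children) == len(root.seps) + 1, "child count mismatch")
    for i, child in enumerate(root.children):
        _check((child.low_fence, child.high_fence) == root.bounds(i),
               f"child {i}: fences disagree with parent")
        node: Optional[Leaf] = child
        while node is not None:
            fc = node.foster_child
            if fc is not None:
                _check(node.foster_key is not None, "foster child without key")
                # Cross-node invariants between foster parent and foster child.
                _check(fc.low_fence == node.foster_key, "foster low fence mismatch")
                _check(fc.high_fence == node.high_fence, "chain high fence mismatch")
            own_hi = node.foster_key if fc is not None else node.high_fence
            _check(node.keys == sorted(node.keys), "keys out of order")
            _check(all(node.low_fence <= k < own_hi for k in node.keys),
                   "key outside fence range")
            node = fc
```

For example, inserting 50, 10, 70, 30, 90, 20 into `Branch(0, 100, [], [Leaf(0, 100)])` overflows the single leaf on the fifth insert: the leaf keeps [10, 30] (later [10, 20, 30]) and its foster child holds [50, 70, 90] until `adopt(tree, 0)` installs separator 50 in the parent; `verify` passes both before and after adoption. In this sketch every node has exactly one incoming pointer at any moment (first from its foster parent, then from the permanent parent), so relocating a node means updating a single pointer, which is the kind of property goal (ii) relies on.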



    Published in

      ACM Transactions on Database Systems, Volume 37, Issue 3
      August 2012
      191 pages
      ISSN: 0362-5915
      EISSN: 1557-4644
      DOI: 10.1145/2338626

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 6 September 2012
      • Accepted: 1 May 2012
      • Received: 1 March 2012
      Published in TODS Volume 37, Issue 3


      Qualifiers

      • research-article
      • Research
      • Refereed
