Abstract
Foster B-trees are a new variant of B-trees that combines advantages of prior B-tree variants optimized for many-core processors and modern memory hierarchies with flash storage and nonvolatile memory. Specific goals include: (i) minimal concurrency control requirements for the data structure, (ii) efficient migration of nodes to new storage locations, and (iii) support for continuous and comprehensive self-testing. Like B-link trees, Foster B-trees optimize latching without imposing restrictions or specific designs on transactional locking, for example, key range locking. Like write-optimized B-trees, and unlike B-link trees, Foster B-trees enable large writes on RAID and flash devices as well as wear leveling and efficient defragmentation. Finally, they support continuous and inexpensive yet comprehensive verification of all invariants, including all cross-node invariants of the B-tree structure. An implementation and a performance evaluation show that the Foster B-tree supports high concurrency and high update rates without compromising consistency, correctness, or read performance.