Abstract
Most database management systems cache pages from storage in a main memory buffer pool. To do this, they rely either on a hash table that translates page identifiers into pointers or on pointer swizzling, which avoids this translation. In this work, we propose vmcache, a buffer manager design that instead uses hardware-supported virtual memory to translate page identifiers to virtual memory addresses. In contrast to existing mmap-based approaches, the DBMS retains control over page faulting and eviction. Our design is portable across modern operating systems, supports arbitrary graph data, enables variable-sized pages, and is easy to implement. One downside of relying on virtual memory is that, with fast storage devices, the existing operating system primitives for manipulating the page table can become a performance bottleneck. As a second contribution, we therefore propose exmap, which implements scalable page table manipulation on Linux. Together, vmcache and exmap provide flexible, efficient, and scalable buffer management on multi-core CPUs and fast storage devices.
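The core idea of translating page identifiers through virtual memory, while the DBMS itself decides when to fault and evict, can be illustrated with a small sketch. This is not the paper's implementation: it assumes a Linux/POSIX environment, an anonymous private mapping, `pread` for explicit page faults, and `madvise(MADV_DONTNEED)` for explicit eviction, none of which the abstract names. The type `vm_pool` and the functions `pool_open`, `pool_addr`, `pool_fault`, and `pool_evict` are hypothetical names chosen for illustration. Residency tracking, latching, and write-back of dirty pages are omitted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical buffer pool: one virtual address range covering every page
 * of the database file. Physical memory is attached only on demand. */
typedef struct {
    uint8_t *base;       /* start of the reserved virtual range */
    uint64_t num_pages;  /* number of pages the range can address */
    size_t page_size;    /* assumption: DBMS page size == OS page size */
    int fd;              /* database file descriptor */
} vm_pool;

/* Reserve virtual address space only; an anonymous mapping means the OS
 * never reads from or writes to the database file on its own. */
static int pool_open(vm_pool *p, int fd, uint64_t num_pages) {
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    void *m = mmap(NULL, num_pages * ps, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED) return -1;
    p->base = m;
    p->num_pages = num_pages;
    p->page_size = ps;
    p->fd = fd;
    return 0;
}

/* Translation is plain arithmetic: no hash table lookup, no swizzling.
 * The hardware page table maps the resulting address to physical memory. */
static uint8_t *pool_addr(const vm_pool *p, uint64_t pid) {
    assert(pid < p->num_pages);
    return p->base + pid * p->page_size;
}

/* Explicit page fault: the DBMS, not the OS, reads the page from storage
 * into its fixed virtual address. */
static int pool_fault(vm_pool *p, uint64_t pid) {
    ssize_t n = pread(p->fd, pool_addr(p, pid), p->page_size,
                      (off_t)(pid * p->page_size));
    return n == (ssize_t)p->page_size ? 0 : -1;
}

/* Explicit eviction: release the physical memory behind the page while
 * keeping its virtual address reserved. A dirty page would have to be
 * written back (for example with pwrite) before this call. */
static int pool_evict(vm_pool *p, uint64_t pid) {
    return madvise(pool_addr(p, pid), p->page_size, MADV_DONTNEED);
}
```

A caller would open the database file, call `pool_open` once, and then use `pool_addr(&pool, pid)` wherever a pointer to page `pid` is needed, invoking `pool_fault` before the first access and `pool_evict` when memory must be reclaimed. Because every page keeps the same address for its entire lifetime, pages can reference each other by identifier in arbitrary graph structures, which is one of the properties the abstract highlights.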
Index Terms
- Virtual-Memory Assisted Buffer Management