ABSTRACT
The virtual-to-physical address translation overhead, a major performance bottleneck for modern workloads, can be effectively alleviated with huge pages. However, because huge pages must be mapped contiguously, OSs have not been able to use them well: memory fragmentation gets in the way, even though hardware support for huge pages has been available for nearly two decades. This paper presents a comprehensive study of the interaction of fragmentation with huge pages in the Linux kernel. We observe that when huge pages are used, problems such as high CPU utilization and latency spikes occur because memory-management subsystems perform unnecessary work (e.g., useless page migration) owing to poor handling of unmovable (i.e., kernel) pages. This behavior is even more harmful in virtualized systems, where unnecessary work may be performed in both the guest and host OSs. We present Illuminator, an efficient memory manager that gives subsystems such as the page allocator the ability to track all unmovable pages. This allows subsystems to make informed decisions and eliminate unnecessary work, which in turn leads to cost-effective huge page allocations. Illuminator reduces the cost of compaction (by up to 99%), improves application performance (by up to 2.3x), and reduces the maximum latency of a MySQL database server (by 30x). Importantly, this work shows the effectiveness of a simple solution for long-standing huge-page-related problems.
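The effect of unmovable pages on compaction can be illustrated with a toy model. The sketch below is not Illuminator's implementation or any Linux kernel code; the page states, the `compaction_work` function, and the 4-page region size are all hypothetical and chosen only for illustration (on x86-64 a 2 MB huge page spans 512 base pages). It assumes a huge-page-sized region becomes usable only if every page in it is free or can be migrated away, and it ignores where migrated pages end up.

```python
# Toy model (hypothetical): each character is one base page.
FREE, MOVABLE, UNMOVABLE = "F", "M", "U"


def regions(pages, region_size):
    """Split the page string into aligned, huge-page-sized regions."""
    last_start = len(pages) - region_size
    return [pages[i:i + region_size] for i in range(0, last_start + 1, region_size)]


def compaction_work(pages, region_size, track_unmovable):
    """Return (migrations, useless_migrations, usable_regions).

    With track_unmovable=False the compactor migrates movable pages out of
    every region, even ones that contain an unmovable page and so can never
    back a huge page. With track_unmovable=True such regions are skipped.
    """
    migrations = useless = usable = 0
    for region in regions(pages, region_size):
        movable = region.count(MOVABLE)
        blocked = UNMOVABLE in region
        if blocked and track_unmovable:
            continue  # informed decision: no huge page possible here
        migrations += movable
        if blocked:
            useless += movable  # work done, but the region stays unusable
        else:
            usable += 1
    return migrations, useless, usable


memory = "MFUF" "MMFF" "FUMM" "FMFM"
print(compaction_work(memory, 4, track_unmovable=False))
print(compaction_work(memory, 4, track_unmovable=True))
```

In this model both runs yield the same number of usable regions, but the run that tracks unmovable pages performs fewer migrations, which is the kind of unnecessary work the abstract refers to.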
Making Huge Pages Actually Useful
ASPLOS '18