Hoard: a scalable memory allocator for multithreaded applications

Published: 12 November 2000

Abstract

Parallel, multithreaded C and C++ programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these applications, the memory allocator is often a bottleneck that severely limits program performance and scalability on multiprocessor systems. Previous allocators suffer from problems that include poor performance and scalability, and heap organizations that introduce false sharing. Worse, many allocators exhibit a dramatic increase in memory consumption when confronted with a producer-consumer pattern of object allocation and freeing. This increase in memory consumption can range from a factor of P (the number of processors) to unbounded memory consumption.

This paper introduces Hoard, a fast, highly scalable allocator that largely avoids false sharing and is memory efficient. Hoard is the first allocator to simultaneously solve the above problems. Hoard combines one global heap and per-processor heaps with a novel discipline that provably bounds memory consumption and has very low synchronization costs in the common case. Our results on eleven programs demonstrate that Hoard yields low average fragmentation and improves overall program performance over the standard Solaris allocator by up to a factor of 60 on 14 processors, and up to a factor of 18 over the next best allocator we tested.
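The producer-consumer blowup mentioned in the abstract can be illustrated with a toy model. The sketch below is not Hoard's algorithm and does not model its global heap or its memory bound; the class `PrivateHeaps`, the flag `return_to_owner`, and the function `producer_consumer` are hypothetical names introduced only for this illustration. It assumes fixed-size blocks and two threads, where thread 0 only allocates and thread 1 only frees, and it counts how many blocks must be requested from the operating system.

```python
class PrivateHeaps:
    """Toy allocator with one free list per thread (illustrative only)."""

    def __init__(self, threads, return_to_owner):
        self.free_lists = [[] for _ in range(threads)]
        # Assumption for this sketch: when True, a freed block goes back to
        # the free list of the thread that allocated it; when False, it
        # stays with the thread that freed it (pure private heaps).
        self.return_to_owner = return_to_owner
        self.blocks_from_os = 0

    def malloc(self, thread):
        free_list = self.free_lists[thread]
        if free_list:
            return free_list.pop()          # reuse a previously freed block
        self.blocks_from_os += 1            # otherwise grow the footprint
        return (thread, self.blocks_from_os)  # block = (owner, id)

    def free(self, thread, block):
        owner = block[0]
        target = owner if self.return_to_owner else thread
        self.free_lists[target].append(block)


def producer_consumer(heap, rounds):
    """Thread 0 allocates one block per round; thread 1 frees it."""
    for _ in range(rounds):
        block = heap.malloc(0)
        heap.free(1, block)
    return heap.blocks_from_os


pure = producer_consumer(PrivateHeaps(2, return_to_owner=False), 1000)
owned = producer_consumer(PrivateHeaps(2, return_to_owner=True), 1000)
print(pure, owned)
```

In this model at most one block is live at any time, yet with pure private heaps the footprint grows by one block per round, because the producer never sees the memory the consumer frees. Returning blocks to their owner keeps the footprint constant in this simple scenario; it says nothing about the factor-of-P case or about how Hoard itself bounds consumption.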

Published in

ACM SIGARCH Computer Architecture News, Volume 28, Issue 5
Special Issue: Proceedings of the ninth international conference on Architectural support for programming languages and operating systems (ASPLOS '00)
Dec. 2000
269 pages
ISSN: 0163-5964
DOI: 10.1145/378995

ASPLOS IX: Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
November 2000
271 pages
ISBN: 1581133170
DOI: 10.1145/378993

Copyright © 2000 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States
