Hoard: a scalable memory allocator for multithreaded applications

Authors:
Emery D. Berger, Department of Computer Sciences, The University of Texas at Austin, Austin, Texas
Kathryn S. McKinley, Department of Computer Science, University of Massachusetts, Amherst, Massachusetts
Robert D. Blumofe, Department of Computer Sciences, The University of Texas at Austin, Austin, Texas
Paul R. Wilson, Department of Computer Sciences, The University of Texas at Austin, Austin, Texas

ACM SIGARCH Computer Architecture News, Volume 28, Issue 5, Dec. 2000, pp. 117–128
https://doi.org/10.1145/378995.379232

Published: 12 November 2000

Abstract

Parallel, multithreaded C and C++ programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these applications, the memory allocator is often a bottleneck that severely limits program performance and scalability on multiprocessor systems. Previous allocators suffer from problems that include poor performance and scalability, and heap organizations that introduce false sharing. Worse, many allocators exhibit a dramatic increase in memory consumption when confronted with a producer-consumer pattern of object allocation and freeing. This increase in memory consumption can range from a factor of P (the number of processors) to unbounded memory consumption.

This paper introduces Hoard, a fast, highly scalable allocator that largely avoids false sharing and is memory efficient. Hoard is the first allocator to simultaneously solve the above problems. Hoard combines one global heap and per-processor heaps with a novel discipline that provably bounds memory consumption and has very low synchronization costs in the common case. Our results on eleven programs demonstrate that Hoard yields low average fragmentation and improves overall program performance over the standard Solaris allocator by up to a factor of 60 on 14 processors, and up to a factor of 18 over the next best allocator we tested.
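The "discipline that provably bounds memory consumption" can be illustrated with a small accounting model. The sketch below assumes the form of the emptiness invariant described in the full paper: for each per-processor heap, u is the number of bytes in use and a the number of bytes the heap holds in superblocks of size S, and when a free leaves u < a - K*S and u < (1 - f)*a, the heap hands a superblock back to the global heap. This is a hypothetical Python model, not Hoard's implementation: it tracks byte counts only, assumes each transferred superblock is entirely free, and uses illustrative values S = 8192, K = 4, f = 1/4. The names `Heap`, `Allocator`, and `invariant_violated` are made up for this example.

```python
from dataclasses import dataclass
from fractions import Fraction

# Illustrative parameters (assumptions for this sketch, not Hoard's settings).
S = 8192            # superblock size in bytes
K = 4               # superblocks' worth of free space a heap may keep
F = Fraction(1, 4)  # emptiness fraction f


@dataclass
class Heap:
    u: int = 0  # bytes currently in use by the program
    a: int = 0  # bytes held by this heap in superblocks


def invariant_violated(h: Heap) -> bool:
    """True when u < a - K*S and u < (1 - f) * a."""
    return h.u < h.a - K * S and h.u < (1 - F) * h.a


class Allocator:
    """Byte-count model of one global heap plus per-processor heaps."""

    def __init__(self) -> None:
        self.global_heap = Heap()
        self.os_bytes = 0  # total memory ever requested from the OS

    def malloc(self, heap: Heap, size: int) -> None:
        # Grow the local heap one superblock at a time when it is full,
        # preferring a superblock from the global heap over fresh OS memory.
        while heap.u + size > heap.a:
            if self.global_heap.a >= S:
                self.global_heap.a -= S
            else:
                self.os_bytes += S
            heap.a += S
        heap.u += size

    def free(self, owner: Heap, size: int) -> None:
        # Freed bytes return to the heap that owns the superblock,
        # even when another thread (the consumer) calls free.
        owner.u -= size
        while invariant_violated(owner):
            owner.a -= S             # hand one (assumed empty) superblock
            self.global_heap.a += S  # back to the global heap


alloc = Allocator()
producer = Heap()
for _ in range(20):
    for _ in range(2048):  # producer allocates 2048 x 512 B = 1 MiB
        alloc.malloc(producer, 512)
    for _ in range(2048):  # consumer frees every object
        alloc.free(producer, 512)

print(alloc.os_bytes, producer.a, alloc.global_heap.a)  # 1048576 32768 1015808
```

Because frees are charged to the producer's heap, each later round refills it from the global heap, so memory requested from the OS stays at the first round's 128 superblocks while the producer's heap keeps only K superblocks once everything is freed. With pure per-thread heaps, freed memory would instead accumulate in the consumer's heap and the producer would keep requesting fresh memory, which is the unbounded growth the abstract describes; this model does not simulate that case.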



Published in
ACM SIGARCH Computer Architecture News, Volume 28, Issue 5
Special Issue: Proceedings of the ninth international conference on Architectural support for programming languages and operating systems (ASPLOS '00)
Dec. 2000, 269 pages
ISSN: 0163-5964
DOI: 10.1145/378995
Editor: Doug DeGroot, Dallas, TX

ASPLOS IX: Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
November 2000, 271 pages
ISBN: 1581133170
DOI: 10.1145/378993
Chairmen: Larry Rudolph (MIT, Cambridge, MA), Anoop Gupta (Microsoft)
Copyright © 2000 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- article

Article Metrics
- 308 total citations
- 2,439 total downloads
- Downloads (last 12 months): 347
- Downloads (last 6 weeks): 43
