research-article

MapReduce optimization using regulated dynamic prioritization

Authors:
Thomas Sandholm, Hewlett-Packard Laboratories, Palo Alto, CA, USA
Kevin Lai, Hewlett-Packard Laboratories, Palo Alto, CA, USA

SIGMETRICS '09: Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems, June 2009, pages 299–310
https://doi.org/10.1145/1555349.1555384

Published: 15 June 2009

ABSTRACT

We present a system for allocating resources in shared data and compute clusters that improves MapReduce job scheduling in three ways. First, the system uses regulated and user-assigned priorities to offer different service levels to jobs and users over time. Second, the system dynamically adjusts resource allocations to fit the requirements of different job stages. Finally, the system automatically detects and eliminates bottlenecks within a job. Using real applications, we show experimentally that these three strategies let users optimize not only the execution time of a job but also its cost-benefit ratio or prioritization efficiency. Our approach relies on a proportional share mechanism that continuously allocates virtual machine resources. Our experimental results show an 11-31% improvement in completion time and a 4-187% improvement in prioritization efficiency for different classes of MapReduce jobs. We further show that delay-intolerant users gain even more from our system.
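The abstract names a proportional share mechanism without describing it further. As a minimal sketch of the generic technique only, assume each user's priority is expressed as a non-negative weight and the resource is a single divisible quantity, such as the CPU capacity of one host. Each user then receives capacity × weight / (sum of all weights), and the split is recomputed every interval so that allocations follow changing priorities. The function names proportional_share and allocate_over_time are hypothetical, and this is not the authors' implementation.

```python
from typing import Dict, List


def proportional_share(capacity: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Divide one divisible resource among users in proportion to their weights.

    Hypothetical helper: `capacity` might be the CPU capacity of one host and
    each weight a user's (possibly regulated) priority. Not the paper's code.
    """
    if capacity < 0:
        raise ValueError("capacity must be non-negative")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        # Nobody holds any priority, so nothing is handed out.
        return {user: 0.0 for user in weights}
    # Each user's share is capacity * w_i / sum_j w_j; the shares sum to capacity.
    return {user: capacity * w / total for user, w in weights.items()}


def allocate_over_time(
    capacity: float, weight_schedule: List[Dict[str, float]]
) -> List[Dict[str, float]]:
    """Recompute the split once per interval so allocations track changing weights."""
    return [proportional_share(capacity, weights) for weights in weight_schedule]


if __name__ == "__main__":
    print(proportional_share(8.0, {"alice": 3.0, "bob": 1.0}))
    print(allocate_over_time(8.0, [{"alice": 3.0, "bob": 1.0},
                                   {"alice": 3.0, "bob": 3.0}]))
```

For example, with 8 units of capacity and weights 3 and 1, the two users receive 6 and 2 units; if the second user's weight rises to 3 in the next interval, each receives 4. Because every share is relative, a user can raise its allocation only at the expense of the others, which is what makes priorities meaningful in a shared cluster.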

Published in
SIGMETRICS '09: Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
June 2009, 336 pages
ISBN: 9781605585116
DOI: 10.1145/1555349
General Chairs: John Douceur (Microsoft Research, USA), Albert Greenberg (Microsoft Research, USA)
Program Chairs: Thomas Bonald (Orange Labs, France), Jason Nieh (Columbia University, USA)

Also published in
ACM SIGMETRICS Performance Evaluation Review, Volume 37, Issue 1 (SIGMETRICS '09)
June 2009, 320 pages
ISSN: 0163-5999
DOI: 10.1145/2492101
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
disc
mapreduce
proportional share
resource allocation
workflow optimization
Acceptance Rates
Overall acceptance rate: 459 of 2,691 submissions, 17%