research-article

Panacea: towards holistic optimization of MapReduce applications

Authors:
Jun Liu, Pennsylvania State University
Nishkam Ravi, NEC Laboratories America, Princeton, NJ
Srimat Chakradhar, NEC Laboratories America, Princeton, NJ
Mahmut Kandemir, Pennsylvania State University

CGO '12: Proceedings of the Tenth International Symposium on Code Generation and Optimization, March 2012, Pages 33–43. https://doi.org/10.1145/2259016.2259022

Published: 31 March 2012

ABSTRACT

MapReduce has emerged as one of the most popular programming models for data-parallel enterprise applications. Despite advances in runtime, the opportunities for optimizing MapReduce applications remain largely unexplored. In this paper, we present a framework for performing holistic compiler optimizations on legacy MapReduce applications. We have identified and implemented two optimizations and evaluated them with a set of Hadoop applications on a cluster of Xeon servers. Our experiments show that performance gains of more than 3X can be achieved without user involvement.

Published in
CGO '12: Proceedings of the Tenth International Symposium on Code Generation and Optimization
March 2012, 285 pages
ISBN: 9781450312066
DOI: 10.1145/2259016
General Chairs: Carol Eidt (Microsoft), Anne Holler (VMware)
Program Chairs: Uma Srinivasan (Intel), Saman Amarasinghe (MIT)
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States
Acceptance Rates
CGO '12 Paper Acceptance Rate: 26 of 90 submissions, 29%
Overall Acceptance Rate: 312 of 1,061 submissions, 29%