DOI: 10.1145/2259016.2259022
research-article

Panacea: towards holistic optimization of MapReduce applications

Published: 31 March 2012

ABSTRACT

MapReduce has emerged as one of the most popular programming models for data-parallel enterprise applications. Despite advances in runtime systems, the opportunities for optimizing MapReduce applications remain largely unexplored. In this paper, we present a framework for performing holistic compiler optimizations on legacy MapReduce applications. We have identified and implemented two optimizations and evaluated them with a set of Hadoop applications on a cluster of Xeon servers. Our experiments show that performance gains of more than 3X can be achieved without user involvement.
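For readers unfamiliar with the programming model the abstract refers to, the following is a minimal, single-process Python sketch of MapReduce using the classic word-count example. It is an illustration under simplifying assumptions (no distribution, no fault tolerance, all data held in memory). The names `run_map_reduce`, `word_count_mapper`, and `word_count_reducer` are hypothetical; they are not part of Hadoop's API or of the Panacea framework, and the code does not implement either of the paper's two optimizations.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical types for this sketch: string records, (word, count) pairs.
Pair = Tuple[str, int]


def run_map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Pair]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Sequentially simulate the map, shuffle, and reduce phases."""
    # Map + shuffle: apply the mapper to every record and group the
    # emitted values by key, as the framework does between the two phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: visit keys in sorted order (Hadoop also presents keys to a
    # reducer in sorted order) and fold each key's values into one result.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


def word_count_mapper(line: str) -> Iterator[Pair]:
    # Emit (word, 1) for every whitespace-separated token, case-folded.
    for word in line.split():
        yield word.lower(), 1


def word_count_reducer(word: str, counts: List[int]) -> int:
    # Sum the partial counts collected for one word.
    return sum(counts)


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "The end"]
    print(run_map_reduce(lines, word_count_mapper, word_count_reducer))
    # prints {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

The user supplies only the mapper and reducer; grouping and scheduling belong to the framework. Because that split hides the whole pipeline behind two user functions, optimizations that look across map, shuffle, and reduce together have to reason about the application as a whole rather than about either function alone.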


  • Published in

    CGO '12: Proceedings of the Tenth International Symposium on Code Generation and Optimization
    March 2012
    285 pages
    ISBN: 9781450312066
    DOI: 10.1145/2259016

    Copyright © 2012 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    Publisher

    Association for Computing Machinery

    New York, NY, United States




    Acceptance Rates

    CGO '12 paper acceptance rate: 26 of 90 submissions (29%). Overall acceptance rate: 312 of 1,061 submissions (29%).
