DOI: 10.1145/1551609.1551638
research-article

An integrated framework for performance-based optimization of scientific workflows

Published: 11 June 2009

ABSTRACT

Data analysis processes in scientific applications can be expressed as coarse-grained workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multi-dimensional parameter space. Some performance parameters, such as the grouping of workflow components and their mapping to machines, do not affect the accuracy of the output; others trade the output quality of individual components (and of the whole workflow) for performance. This paper describes an integrated framework that supports performance optimizations along multiple dimensions of the parameter space. Using two real-world applications in the spatial data analysis domain, we present an experimental evaluation of the proposed framework.
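
To make the "search in a multi-dimensional parameter space" framing concrete, the following is a minimal sketch, not the framework described in the paper. It assumes a hypothetical parameter space with two quality-preserving parameters (`chunk_size`, `num_nodes`) and one quality-trading parameter (`resolution`), plus toy cost and quality models (`estimated_time`, `estimated_quality`) invented purely for illustration. It searches the space exhaustively for the fastest configuration that still meets a minimum quality.

```python
from itertools import product

# Hypothetical parameter space: names and values are illustrative assumptions,
# not taken from the paper.
PARAM_SPACE = {
    "chunk_size": [256, 512, 1024],   # quality-preserving: data partitioning
    "num_nodes": [1, 2, 4],           # quality-preserving: mapping to machines
    "resolution": [0.25, 0.5, 1.0],   # quality-trading: fraction of full resolution
}


def estimated_time(chunk_size, num_nodes, resolution):
    """Toy cost model (an assumption, not measured data)."""
    work = 1000.0 * resolution ** 2         # lower resolution means less work
    overhead = 50.0 * (1024 / chunk_size)   # smaller chunks add scheduling overhead
    return work / num_nodes + overhead + 5.0 * num_nodes


def estimated_quality(resolution):
    """Toy quality model: output quality grows with resolution."""
    return resolution ** 0.5


def best_configuration(min_quality):
    """Return (time, config) for the fastest configuration whose estimated
    quality is at least min_quality, or None if no configuration qualifies."""
    names = list(PARAM_SPACE)
    best = None
    for values in product(*PARAM_SPACE.values()):
        config = dict(zip(names, values))
        if estimated_quality(config["resolution"]) < min_quality:
            continue  # violates the quality constraint
        t = estimated_time(**config)
        if best is None or t < best[0]:
            best = (t, config)
    return best
```

Calling `best_configuration(1.0)` restricts the search to full-resolution configurations, so only the quality-preserving parameters vary; lowering `min_quality` lets the search also trade output quality for time. Exhaustive enumeration is used here only because the toy space is tiny; a real search would likely need a smarter strategy.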

Published in

HPDC '09: Proceedings of the 18th ACM international symposium on High performance distributed computing
June 2009
237 pages
ISBN: 9781605585871
DOI: 10.1145/1551609

Copyright © 2009 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 166 of 966 submissions, 17%
