ABSTRACT
Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multi-dimensional parameter space. While some performance parameters such as grouping of workflow components and their mapping to machines do not affect the accuracy of the output, others may dictate trading the output quality of individual components (and of the whole workflow) for performance. This paper describes an integrated framework which is capable of supporting performance optimizations along multiple dimensions of the parameter space. Using two real-world applications in the spatial data analysis domain, we present an experimental evaluation of the proposed framework.
Index Terms
- An integrated framework for performance-based optimization of scientific workflows