ABSTRACT
Weaver is a high-level framework that enables researchers to integrate distributed computing abstractions into their scientific workflows. Rather than develop a new workflow language, we built Weaver on top of the Python programming language. As such, Weaver takes advantage of users' familiarity with Python, minimizes barriers to adoption, and allows for integration with existing software. In this paper, we introduce Weaver's programming model, which consists of datasets, functions, and abstractions that users combine to organize and specify large-scale scientific workflows. We also explain how these specifications are compiled into a directed acyclic graph used by a workflow manager that dispatches the work to a variety of distributed computing engines. To examine how Weaver is used in scientific research, we present three example applications that demonstrate Weaver's ability to integrate into existing workflows and incorporate optimized distributed computing abstraction tools.
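To make the idea of compiling a specification into a directed acyclic graph concrete, the following is a minimal Python sketch. It does not use Weaver's actual API: the names `Task`, `Workflow`, `map`, `edges`, and `build_example` are hypothetical and chosen only for illustration. It assumes that a dataset is a list of file names, a function is a shell command template, and an abstraction (here, a map) expands into one task per input file. The DAG edges are then derived from the files that tasks produce and consume.

```python
# Illustrative sketch only: every name here is hypothetical, not Weaver's API.
from dataclasses import dataclass, field


@dataclass
class Task:
    """One DAG node: a command that reads input files and writes output files."""
    command: str
    inputs: list
    outputs: list


@dataclass
class Workflow:
    """Collects tasks; file names shared between tasks define the DAG edges."""
    tasks: list = field(default_factory=list)

    def map(self, command_template, dataset, suffix):
        """Map abstraction: apply one command template to every file in a dataset."""
        outputs = []
        for path in dataset:
            out = path + suffix
            command = command_template.format(IN=path, OUT=out)
            self.tasks.append(Task(command, [path], [out]))
            outputs.append(out)
        return outputs  # the outputs can be used as a new dataset

    def edges(self):
        """Derive (producer, consumer) task-index pairs from file dependencies."""
        producer = {out: i for i, t in enumerate(self.tasks) for out in t.outputs}
        return sorted(
            (producer[inp], j)
            for j, t in enumerate(self.tasks)
            for inp in t.inputs
            if inp in producer
        )


def build_example():
    """Build a two-stage workflow: count lines per file, then merge the counts."""
    wf = Workflow()
    raw = ["a.txt", "b.txt"]  # dataset: a list of input files
    counted = wf.map("wc -l {IN} > {OUT}", raw, ".count")  # function applied via map
    merge = "cat " + " ".join(counted) + " > total"
    wf.tasks.append(Task(merge, counted, ["total"]))
    return wf
```

In this sketch, `build_example().edges()` yields the dependency pairs that a workflow manager would need in order to schedule the merge task after both counting tasks. A real system would additionally serialize the graph in the format its workflow manager expects.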
Index Terms
- Weaver: integrating distributed computing abstractions into scientific workflows using Python