
Scientific Workflow Management on Hybrid Clouds with Cloud Bursting and Transparent Data Access

  • Conference paper
Computational Science – ICCS 2021 (ICCS 2021)

Abstract

Cloud bursting is an application deployment model in which additional computing resources are provisioned from public clouds when local resources are insufficient, e.g., during periods of peak demand. We propose and experimentally evaluate a cloud-bursting solution for scientific workflows. The solution is portable because it uses Kubernetes to deploy both the workflow management system and the computing clusters in multiple clouds. We also introduce transparent data access by employing a virtual distributed file system that spans the clouds, so that jobs can use a standard POSIX file system interface while data transfers between clouds remain hidden from them. To balance the load and minimize the volume of communication between clouds, we use graph partitioning, constraining the algorithm to distribute the load equally at each parallel execution stage of a workflow. We evaluate the solution experimentally using the HyperFlow workflow management system integrated with the Onedata data management platform, deployed both in our on-premises cloud at Cyfronet AGH and in Google Cloud.
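The partitioning step described above combines two goals: a small cut (little data crossing between clouds) and an equal split of work at every parallel stage, not just overall. The Python sketch below illustrates that combination with a deliberately simple greedy heuristic. It is not the authors' algorithm: the stage definition (a task's level is its longest path from an entry task), the `shares` target fractions, the `imbalance` tolerance, the example workflow and all function names are hypothetical choices made for illustration. A production version would more likely hand one balance constraint per stage to a multi-constraint graph partitioner; METIS, for instance, accepts multiple weights per vertex.

```python
from collections import defaultdict

# Hypothetical example workflow: a fork-join pattern. Task weights are
# estimated work units; edge weights are data volumes (arbitrary units).
TASKS = {"split": 1, "p1": 4, "p2": 4, "p3": 4, "p4": 4, "merge": 1}
EDGES = {("split", p): 10 for p in ("p1", "p2", "p3", "p4")}
EDGES.update({(p, "merge"): 10 for p in ("p1", "p2", "p3", "p4")})


def stage_levels(tasks, edges):
    """Assumed stage definition: longest path (in edges) from an entry task."""
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
    level = {}

    def lvl(t):
        if t not in level:
            level[t] = 1 + max((lvl(p) for p in preds[t]), default=-1)
        return level[t]

    for t in tasks:
        lvl(t)
    return level


def partition(tasks, edges, shares, imbalance=0.1):
    """Greedy, stage-balanced assignment of tasks to clouds (illustrative only).

    shares: target fraction of each stage's work per cloud, e.g. {"onprem": 0.5, "gcp": 0.5}.
    Within a stage, a task goes to the cloud that already holds most of its
    input data, provided that cloud stays within its (1 + imbalance) quota.
    """
    level = stage_levels(tasks, edges)
    preds = defaultdict(list)
    for (u, v), size in edges.items():
        preds[v].append((u, size))
    stages = defaultdict(list)
    for t, l in level.items():
        stages[l].append(t)

    assignment = {}
    for l in sorted(stages):
        # Heaviest tasks first; the name is a deterministic tie-breaker.
        stage_tasks = sorted(stages[l], key=lambda t: (-tasks[t], t))
        total = sum(tasks[t] for t in stage_tasks)
        cap = {c: s * total * (1 + imbalance) for c, s in shares.items()}
        load = {c: 0.0 for c in shares}
        for t in stage_tasks:
            # Input data already located in each cloud (predecessors sit in
            # earlier stages, so they have been assigned by now).
            affinity = {c: 0.0 for c in shares}
            for p, size in preds[t]:
                affinity[assignment[p]] += size
            fitting = [c for c in shares if load[c] + tasks[t] <= cap[c]]
            if fitting:
                best = max(fitting, key=lambda c: (affinity[c], -load[c] / shares[c]))
            else:
                # Quota cannot be met (e.g. a single-task stage): take the
                # cloud with the lowest relative load after placement.
                best = min(shares, key=lambda c: (load[c] + tasks[t]) / shares[c])
            assignment[t] = best
            load[best] += tasks[t]
    return assignment


def cut_volume(edges, assignment):
    """Total data volume on edges whose endpoints run in different clouds."""
    return sum(s for (u, v), s in edges.items() if assignment[u] != assignment[v])


if __name__ == "__main__":
    a = partition(TASKS, EDGES, {"onprem": 0.5, "gcp": 0.5})
    print(a)
    print("inter-cloud data:", cut_volume(EDGES, a))  # → inter-cloud data: 40
```

For this fork-join example, stage 1 is split 8/8 between the clouds and 40 data units cross the boundary, which is the least possible once the balance constraint forces two of the four parallel tasks into each cloud. Balancing per stage rather than globally matters because a globally balanced partition could still place an entire parallel stage in one cloud, leaving the other idle while that stage runs.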

Author information

Corresponding author

Correspondence to Bartosz Baliś.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Baliś, B., Orzechowski, M., Dutka, Ł., Słota, R.G., Kitowski, J. (2021). Scientific Workflow Management on Hybrid Clouds with Cloud Bursting and Transparent Data Access. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12742. Springer, Cham. https://doi.org/10.1007/978-3-030-77961-0_21

  • DOI: https://doi.org/10.1007/978-3-030-77961-0_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77960-3

  • Online ISBN: 978-3-030-77961-0

  • eBook Packages: Computer Science, Computer Science (R0)
