MOMTH: multi-objective scheduling algorithm of many tasks in Hadoop

Cluster Computing

Abstract

Business solutions today face a real challenge in the context of the large amounts of data generated by complex software applications: using limited resources efficiently to accomplish specific operations and tasks. Depending on the type of application, when a certain service must be delivered within a specific time and on a limited budget, a sequential application may be redesigned so that it becomes scalable and able to run on multiple resources. The many-task computing model brings together loosely coupled applications composed of many dependent or independent tasks that work together toward a common result. When requesting a service, the constraints users most frequently specify are deadline and budget. This paper elaborates on a multi-objective scheduling algorithm of many tasks in Hadoop for big data processing, named MOMTH. We consider objective functions related to users and resources at the same time, together with constraints such as deadline (scheduling in due time) and budget. The algorithm was evaluated in the Scheduling Load Simulator, a tool integrated in Hadoop. MobiWay, a collaboration platform that exposes interoperability between a large number of sensing mobile devices and a wide range of mobility applications, was chosen for the performance analysis of MOMTH. We compared the proposed algorithm with the first-in-first-out (FIFO) and fair schedulers and obtained similar performance for our approach.
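To make the deadline and budget constraints concrete, the following is a minimal illustrative sketch and not the MOMTH algorithm itself, whose details are not given in the abstract. It assumes a simple greedy policy: tasks are considered in earliest-deadline order, and each is placed on the cheapest resource that still meets its deadline without exceeding the overall budget. The names `Task`, `Resource`, and `schedule`, and the cost model (runtime multiplied by a per-time-unit price), are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Task:
    name: str
    work: float      # abstract units of work (hypothetical measure)
    deadline: float  # latest acceptable finish time


@dataclass
class Resource:
    name: str
    speed: float          # work units processed per time unit
    cost_per_time: float  # price charged per time unit
    free_at: float = 0.0  # time at which the resource becomes idle


def schedule(
    tasks: List[Task], resources: List[Resource], budget: float
) -> Tuple[Dict[str, Optional[str]], float]:
    """Greedy sketch: earliest deadline first, cheapest feasible resource.

    Returns a mapping task name -> resource name (None if the task cannot
    be placed within its deadline and the remaining budget) and the total
    cost spent. This is an assumed policy for illustration only.
    """
    plan: Dict[str, Optional[str]] = {}
    spent = 0.0
    for task in sorted(tasks, key=lambda t: t.deadline):
        best: Optional[Resource] = None
        best_cost = 0.0
        for res in resources:
            runtime = task.work / res.speed
            finish = res.free_at + runtime
            cost = runtime * res.cost_per_time
            # Both constraints must hold: deadline and total budget.
            feasible = finish <= task.deadline and spent + cost <= budget
            if feasible and (best is None or cost < best_cost):
                best, best_cost = res, cost
        if best is None:
            plan[task.name] = None  # task rejected under the constraints
        else:
            best.free_at += task.work / best.speed
            spent += best_cost
            plan[task.name] = best.name
    return plan, spent


if __name__ == "__main__":
    demo_tasks = [Task("a", 10, 5), Task("b", 10, 20), Task("c", 10, 4)]
    demo_resources = [Resource("slow", 1, 1), Resource("fast", 5, 10)]
    print(schedule(demo_tasks, demo_resources, budget=50))
```

A real multi-objective scheduler would weigh user-side and resource-side objectives against each other rather than minimizing cost alone; the sketch only shows how the two constraints restrict the set of feasible placements.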


Notes

  1. Thanks to The scheduling zoo: A searchable bibliography on scheduling by Peter Brucker and Sigrid Knust, http://www-desir.lip6.fr/~durrc/query/.


Acknowledgments

The research presented in this paper is supported by Projects: CyberWater Grant of the Romanian National Authority for Scientific Research, CNDI-UEFISCDI, project number 47/2012; MobiWay: Mobility Beyond Individualism: an Integrated Platform for Intelligent Transportation Systems of Tomorrow—PN-II-PT-PCCA-2013-4-0321; clueFarm: Information system based on cloud services accessible through mobile devices, to increase product quality and business development farms—PN-II-PT-PCCA-2013-4-0870. This work was also partially supported by COMMAS Project “Computational Models and Methods for Massive Structured Data” (TIN2013-46181-C2-1-R). We would like to thank the reviewers for their time and expertise, constructive comments and valuable insight.

Author information


Corresponding author

Correspondence to Florin Pop.


Cite this article

Nita, MC., Pop, F., Voicu, C. et al. MOMTH: multi-objective scheduling algorithm of many tasks in Hadoop. Cluster Comput 18, 1011–1024 (2015). https://doi.org/10.1007/s10586-015-0454-8


  • DOI: https://doi.org/10.1007/s10586-015-0454-8
