An Iterative Approach to Trustable Systems Management Automation and Fault Handling

Journal of Network and Systems Management

Abstract

Automated systems management solutions aim to reduce the pressure on administrators of complex, large-scale distributed systems by automating many common management tasks. However, automation introduces a level of abstraction that can act as a barrier between the administrator and the elements being controlled. This can impede the transition to the new management paradigms required by the growth of off-premises resources and hybrid cloud systems. The resulting loss of control over the managed environment can erode trust in automated systems management solutions and limit their broader use. This paper proposes a novel approach in which the administrator controls the automation level on a per-task basis. Administrators define a management task as they would perform it directly and allow the solution to identify the triggers that cause the task to be enacted. The solution also allows administrators to define relevant task output that can be analyzed for fault states, enabling error recovery without manual intervention. This approach reduces management effort for the administrator while retaining controllability, keeping automation costs low, and reducing the incidence of errors.
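
The idea of a per-task automation level combined with fault detection on task output can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `AutomationLevel`, `ManagedTask`, and `run_task`, the three-level scale, and the use of regular expressions to detect fault states are all assumptions made purely for illustration.

```python
import re
import subprocess
import sys
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class AutomationLevel(Enum):
    """Hypothetical three-step scale; the paper may define levels differently."""
    MANUAL = 1     # the administrator runs the task by hand
    CONFIRM = 2    # the tool proposes a run and waits for approval
    AUTOMATIC = 3  # the tool runs the task without asking


@dataclass
class ManagedTask:
    """A task defined the way an administrator would run it directly."""
    name: str
    command: List[str]
    level: AutomationLevel = AutomationLevel.MANUAL
    # Assumption: fault states are recognised by regex matches on task output.
    fault_patterns: List[str] = field(default_factory=list)
    recovery: Optional[List[str]] = None

    def has_fault(self, output: str) -> bool:
        return any(re.search(pattern, output) for pattern in self.fault_patterns)


def run_task(task: ManagedTask,
             approve: Callable[[ManagedTask], bool] = lambda task: False) -> str:
    """Run a task according to its automation level and return a status string."""
    if task.level is AutomationLevel.MANUAL:
        return "skipped"
    if task.level is AutomationLevel.CONFIRM and not approve(task):
        return "declined"

    result = subprocess.run(task.command, capture_output=True, text=True)
    output = result.stdout + result.stderr

    # A non-zero exit code or a matching fault pattern counts as a fault state.
    if result.returncode != 0 or task.has_fault(output):
        if task.recovery is not None:
            subprocess.run(task.recovery, capture_output=True, text=True)
            return "recovery_attempted"
        return "fault"
    return "ok"


if __name__ == "__main__":
    # Stand-in commands that only print text, so the sketch runs anywhere.
    check = ManagedTask(
        name="disk-check",
        command=[sys.executable, "-c", "print('ERROR: disk full')"],
        level=AutomationLevel.AUTOMATIC,
        fault_patterns=[r"ERROR"],
        recovery=[sys.executable, "-c", "print('cleaning up')"],
    )
    print(run_task(check))
```

Keeping the level on each task, rather than on the tool as a whole, means an administrator could raise a single task from `CONFIRM` to `AUTOMATIC` once it has proved reliable, while leaving less trusted tasks under manual control.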

Acknowledgments

This work was carried out with the support of the GEYSERS (FP7-ICT-248657) project funded by the European Commission through the 7th ICT Framework Program. Neither this paper nor any part of its content has been published or accepted for publication elsewhere, nor has it been submitted to any other journal for review.

Author information

Corresponding author

Correspondence to Barry McLarnon.

About this article

Cite this article

McLarnon, B., Robinson, P., Milligan, P. et al. An Iterative Approach to Trustable Systems Management Automation and Fault Handling. J Netw Syst Manage 22, 366–395 (2014). https://doi.org/10.1007/s10922-013-9295-z

  • DOI: https://doi.org/10.1007/s10922-013-9295-z
