Availability analysis of software architecture decomposition alternatives for local recovery

Software Quality Journal

Abstract

We present an efficient and easy-to-use methodology to predict, at design time, the availability of systems that support local recovery. Our analysis techniques work at the architectural level, where the software designer inputs the decomposition of the software modules annotated with failure and repair rates. From this decomposition, we automatically generate an analytical model (a continuous-time Markov chain), from which an availability measure is then computed without manual intervention. A crucial step is the use of intermediate models in the input/output interactive Markov chain formalism, which makes our techniques efficient, mathematically rigorous, and easy to adapt. In particular, we use aggressive minimization techniques to keep the size of the generated state spaces small. We have applied our methodology to a realistic case study, namely the MPlayer open-source software. We have investigated four different decomposition alternatives and compared our analytical results with the measured availability on a running MPlayer. We found that our predicted results closely match the measured ones.
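As a rough illustration of the last step described in the abstract (computing an availability measure from a continuous-time Markov chain), the sketch below solves for the steady-state distribution of a very small CTMC. The two-unit model, the rate values, and the function names `steady_state` and `two_unit_generator` are assumptions made for this example; they are not taken from the paper or its tool chain, and the sketch does not perform any I/O-IMC composition or state-space minimization.

```python
def steady_state(Q):
    """Solve pi * Q = 0 with sum(pi) = 1 for an irreducible CTMC generator Q."""
    n = len(Q)
    # Work on the transposed system Q^T * pi = 0 and replace the last
    # equation by the normalization constraint sum(pi) = 1.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    A[n - 1] = [1.0] * n
    b[n - 1] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    pi = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][c] * pi[c] for c in range(i + 1, n))
        pi[i] = (b[i] - tail) / A[i][i]
    return pi


def two_unit_generator(lam_a, mu_a, lam_b, mu_b):
    """Generator matrix for two independent units (hypothetical model).

    A state is a pair (a_up, b_up). Each unit fails with rate lam_x while
    it is up and is recovered with rate mu_x while it is down.
    """
    states = [(a, b) for a in (True, False) for b in (True, False)]
    index = {s: i for i, s in enumerate(states)}
    Q = [[0.0] * len(states) for _ in states]
    for (a, b), i in index.items():
        moves = [
            ((not a, b), lam_a if a else mu_a),
            ((a, not b), lam_b if b else mu_b),
        ]
        for target, rate in moves:
            Q[i][index[target]] += rate
            Q[i][i] -= rate  # each row of a generator sums to zero
    return states, Q


def availability(states, pi):
    """Probability mass of the states in which both units are up."""
    return sum(p for s, p in zip(states, pi) if all(s))


# Illustrative rates only (per hour); they are not measurements from the paper.
states, Q = two_unit_generator(lam_a=0.01, mu_a=1.0, lam_b=0.02, mu_b=0.5)
pi = steady_state(Q)
system_availability = availability(states, pi)
```

Because the two units in this toy model fail and recover independently, the result can be checked against the closed form, the product of μ/(λ+μ) over the units. Models with a shared recovery manager or failure propagation between units have no such product form, which is where a generated Markov chain becomes useful.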

Notes

  1. An important component used within a software architecture that supports local recovery.

  2. Proactively restarting a software component to mitigate its aging and thus its failure.

  3. Interaction between the RUs is redirected through Inter-Process Communication.

  4. Modeled as part of the RM as mentioned in Sect. 4.

  5. As described later, for each RU, two models are in fact generated.

  6. The recovery time includes the time for restarting failed modules and also the time for error detection, error notification and diagnosis.

  7. An exponential distribution might not be, in some cases, a realistic choice; however, it is also possible to use a phase-type distribution which approximates any distribution arbitrarily closely.

  8. Note that these models are used for availability estimation only and do not necessarily reflect the software implementation. In an actual implementation, a module might not be aware of its own failure and so cannot report it. External error detection mechanisms can be employed for this purpose (Sozer et al. 2009).
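Note 7 can be made concrete with the simplest phase-type family, the Erlang distribution: a chain of k exponential phases. The sketch below is only an illustration under that assumption; the choice of Erlang and the function names `erlang_moments` and `sample_erlang` are hypothetical and do not come from the paper. With the mean held fixed, the variance is mean²/k, so increasing k moves the distribution from exponential (k = 1) toward a nearly deterministic recovery time.

```python
import random


def erlang_moments(mean, k):
    """Mean and variance of an Erlang-k distribution with the given mean."""
    rate = k / mean  # rate of each of the k exponential phases
    return k / rate, k / rate ** 2


def sample_erlang(mean, k, rng):
    """Draw one Erlang-k sample as the sum of k exponential phases."""
    rate = k / mean
    return sum(rng.expovariate(rate) for _ in range(k))


# Illustrative values only: a 2-second mean recovery time.
rng = random.Random(0)
one_phase = erlang_moments(2.0, 1)    # exponential case
four_phases = erlang_moments(2.0, 4)  # same mean, lower variance
recovery_time = sample_erlang(2.0, 4, rng)
```

Each added phase adds one state per modeled delay, so a closer fit to the real distribution comes at the price of a larger state space.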

References

  • Alvarez, G., & Cristian, F. (1997). Centralized failure injection for distributed, fault-tolerant protocol testing. In Proceedings of the 17th international conference on distributed computing systems, pp. 78–85.

  • Avizienis, A., Laprie, J. C., Randell, B., & Landwehr, C. (2004). Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing, 1(1), 11–33.

  • Bernardi, S., Merseguer, J., & Petriu, D. (2011). A dependability profile within MARTE. Software and Systems Modeling, 10(3), 313–336.

  • Bernardi, S., Merseguer, J., & Petriu, D. C. (2012). Dependability modeling and analysis of software systems specified with UML. ACM Computing Surveys, 45(1), 1–48.

  • Boudali, H., Crouzen, P., & Stoelinga, M. (2007a). A compositional semantics for dynamic fault trees in terms of Interactive Markov Chains. In Proceedings of the 5th international symposium on automated technology for verification and analysis, lecture notes in computer science (LNCS), pp. 441–456.

  • Boudali, H., Crouzen, P., & Stoelinga, M. (2007b). Dynamic fault tree analysis using input/output interactive Markov chains. In Proceedings of the 37th annual IEEE/IFIP international conference on dependable systems and networks (DSN), pp. 708–717.

  • Boudali, H., Crouzen, P., Haverkort, B. R., Kuntz, M., & Stoelinga, M. (2008). Architectural dependability evaluation with Arcade. In Proceedings of the 38th IEEE/IFIP international conference on dependable systems and networks (DSN), IEEE, pp. 512–521.

  • Bowles, J., Dobbins, J., & Gregory, J. (2004). Approximate reliability and availability models for high availability and fault-tolerant systems with repair. Quality and Reliability Engineering International, 20(7), 679–697.

  • Bozzano, M., Cimatti, A., Katoen, J. P., Nguyen, V. Y., Noll, T., & Roveri, M. (2011). Safety, dependability and performance analysis of extended AADL models. The Computer Journal, 54(5), 754–775.

  • Brosch, F., Koziolek, H., Buhnova, B., & Reussner, R. (2012). Architecture-based reliability prediction with the Palladio component model. IEEE Transactions on Software Engineering, 38(6), 1319–1339.

  • Candea, G., Kawamoto, S., Fujiki, Y., Friedman, G., & Fox, A. (2004b). Microreboot: A technique for cheap recovery. In Proceedings of the 6th symposium on operating systems design and implementation (OSDI), San Francisco, CA, pp. 31–44.

  • Candea, G., Cutler, J., & Fox, A. (2004a). Improving availability with recursive micro-reboots: A soft-state system case study. Performance Evaluation, 56(1–4), 213–248.

  • Clements, P., Bachmann, F., Bass, L., Garlan, D., Ivers, J., Little, R., et al. (2010). Documenting software architectures: Views and beyond (2nd ed.). Reading, MA: Addison-Wesley.

  • Das, O., & Woodside, C. (1998). The fault-tolerant layered queueing network model for performability of distributed systems. In Proceedings of the international performance and dependability symposium (IPDS) (pp. 132–141). Durham, NC.

  • Dashofy, E., van der Hoek, A., & Taylor, R. (2002). An infrastructure for the rapid development of XML-based architecture description languages. In International conference on software engineering (ICSE) (pp. 266–276). Orlando, Florida: ACM.

  • Devroye, L. (1986). Non-uniform random variate generation. Berlin: Springer.

  • Dugan, J., & Lyu, M. (1995). Dependability modeling for fault-tolerant software and systems. In M. R. Lyu (Ed.), Software fault tolerance, chapter 5 (pp. 109–138). London: Wiley.

  • Durães, J., & Madeira, H. (2006). Emulation of software faults: A field data study and a practical approach. IEEE Transactions on Software Engineering, 32(11), 849–867.

  • ECLIPSE (2015) Eclipse foundation. http://www.eclipse.org/

  • Franco, J., Barbosa, R., & Zenha-Rela, M. (2012). Automated reliability prediction from formal architectural descriptions. In Proceedings of joint working IEEE/IFIP conference on software architecture (WICSA) and European conference on software architecture (ECSA), pp. 302–309.

  • Franco, J., Barbosa, R., & Zenha-Rela, M. (2014). Availability evaluation of software architectures through formal methods. In Proceedings of the 9th international conference on the quality of information and communications technology (QUATIC), pp. 282–287.

  • Garavel, H., & Lang, F. (2001). SVL: A scripting language for compositional verification. In Proceedings of the international conference on formal techniques for networked and distributed systems (FORTE), pp. 377–394.

  • Garavel, H., Lang, F., Mateescu, R., & Serwe, W. (2007). CADP 2006: A toolbox for the construction and analysis of distributed processes. In Computer-aided verification (CAV), Springer, Lecture Notes in Computer Science (LNCS), vol. 4590, pp. 158–163.

  • Garlan, D., Monroe, R., & Wile, D. (1997). Acme: An architecture description interchange language. In Proceedings of conference of the centre for advanced studies on collaborative research (CASCON), pp. 169–183.

  • Garland, S., Lynch, N., Tauber, J., & Vaziri, M. (2004). IOA user guide and reference manual. Tech. rep., MIT CSAI Laboratory, Cambridge, MA.

  • Geist, R., & Trivedi, K. (1990). Reliability estimation of fault-tolerant systems: Tools and techniques. IEEE Computer, 23(7), 52–61.

  • Hermanns, H. (2002). Interactive Markov Chains: The quest for quantified quality, lecture notes in computer science (LNCS), vol. 2428.

  • Hermanns, H., & Katoen, J. P. (2000). Automated compositional Markov chain generation for a plain-old telephone system. Science of Computer Programming, 36(1), 97–127.

  • Hunt, G. C., et al. (2007). Sealing OS processes to improve dependability and safety. ACM SIGOPS Operating Systems Review, 41(3), 341–354.

  • Immonen, A., & Niemelä, E. (2008). Survey of reliability and availability prediction methods from the viewpoint of software architecture. Software and Systems Modeling, 7(1), 49–65.

  • Joyce, J. (2007). Architecting dependable systems with the SAE architecture analysis and description language (AADL). In R. de Lemos, C. Gacek, & A. Romanovsky (Eds.), Architecting dependable systems, IV, lecture notes in computer science, vol. 4615 (pp. 1–13). Berlin: Springer.

  • Kuntz, M., & Haverkort, B. R. (2008). Formal dependability engineering with MIOA. Technical Report TR-CTIT-08-39.

  • Lai, C. D., et al. (2002). A model for availability analysis of distributed software/hardware systems. Information and Software Technology, 44(6), 343–350.

  • Lynch, N., & Tuttle, M. (1989). An introduction to input/output automata. CWI Quarterly, 2(3), 219–246.

  • Maier, M., Emery, D., & Hilliard, R. (2001). Software architecture: Introducing IEEE standard 1471. IEEE Computer, 34(4), 107–109.

  • Majzik, I., & Huszerl, G. (2002). Towards dependability modeling of FT-CORBA architectures. In Proceedings of the 4th European dependable computing conference, lecture notes in computer science (LNCS), pp. 121–139.

  • Monnet, S., & Bertier, M. (2007). Using failure injection mechanisms to experiment and evaluate a grid failure detector. In M. Dayde, J. Palma, A. Coutinho, E. Pacitti, & J. Lopes (Eds.), High performance computing for computational science, vol. 4395 (pp. 610–621). Berlin, Heidelberg: Springer.

  • MPLAYER (2015) MPlayer official website. http://www.mplayerhq.hu/

  • Rugina, A. E., Kanoun, K., & Kaaniche, M. (2007). A system dependability modeling framework using AADL and GSPNs. In R. de Lemos, C. Gacek, & A. Romanovsky (Eds.), Architecting dependable systems, IV, lecture notes in computer science, vol. 4615 (pp. 14–38). Berlin: Springer.

  • Sozer, H. (2009). Architecting fault-tolerant software systems. Ph.D. thesis, University of Twente, Enschede, The Netherlands.

  • Sozer, H., & Tekinerdogan, B. (2008). Introducing recovery style for modeling and analyzing system recovery. In Proceedings of the 7th working IEEE/IFIP conference on software architecture (WICSA) (pp. 167–176). Vancouver, Canada.

  • Sozer, H., Tekinerdogan, B., & Aksit, M. (2009). FLORA: A framework for decomposing software architecture to introduce local recovery. Software: Practice and Experience, 39(10), 869–889.

  • Sozer, H., Tekinerdogan, B., & Aksit, M. (2013). Optimizing decomposition of software architecture for local recovery. Software Quality Journal, 21(2), 203–240.

  • Vaidyanathan, K., & Trivedi, K. S. (2005). A comprehensive model for software rejuvenation. IEEE Transactions on Dependable and Secure Computing, 2(2), 124–137.

Acknowledgments

We thank Pepijn Crouzen for his help using CADP and Boudewijn Haverkort for his comments on an earlier version of this paper. This work has been carried out as a part of the TRADER project under the responsibility of the Embedded Systems Institute. This work is partially supported by the Dutch Ministry of Economic Affairs under the BSIK program; by the Netherlands Organization for Scientific Research (NWO) under FOCUS/BRICKS Grant Number 642.000.505 (MOQS); and by the EU under Grant Numbers IST-004527 (ARTIST2) and FP7-ICT-2007-1 (QUASIMODO).

Author information

Corresponding author

Correspondence to Hasan Sözer.

About this article

Cite this article

Sözer, H., Stoelinga, M., Boudali, H. et al. Availability analysis of software architecture decomposition alternatives for local recovery. Software Qual J 25, 553–579 (2017). https://doi.org/10.1007/s11219-016-9315-9

  • DOI: https://doi.org/10.1007/s11219-016-9315-9
