ACTOR: Accelerating Fault Injection Campaigns Using Timeout Detection Based on Autocorrelation

  • Conference paper
Computer Safety, Reliability, and Security (SAFECOMP 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13414)

Abstract

Fault-injection (FI) campaigns provide an in-depth resilience analysis of safety-critical systems in the presence of transient hardware faults. However, FI campaigns require many independent injection experiments and, taken together, long run times, especially if we aim for high coverage of the fault space. Besides reducing the number of pilot injections in the first place (e.g., with def-use pruning), we can also speed up the overall campaign by speeding up individual experiments. Our experiments show that the timeout failure class is especially important here: although timeouts account for only 8% of the injections (QSort), they consume 32% of the campaign run time.
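
As a rough reading of the QSort figures, assume the 8% refers to the share of experiments and the 32% to the share of total run time, and let \(\bar{t}_\text{timeout}\) and \(\bar{t}_\text{other}\) denote the mean run time of a timeout experiment and of any other experiment (notation introduced here for illustration, not taken from the paper). Under these assumptions, the ratio of the two means is:

\[
\frac{\bar{t}_\text{timeout}}{\bar{t}_\text{other}}
  = \frac{0.32 / 0.08}{0.68 / 0.92}
  = \frac{4}{0.739}
  \approx 5.4
\]

That is, an average timeout experiment would run roughly five times as long as an average non-timeout experiment, which is why aborting such experiments early pays off disproportionately.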

In this paper, we analyze and discuss the nature of timeouts as a failure class and reason about the general design of dynamic timeout detectors. Based on those insights, we propose ACTOR, a method to identify and abort stuck experiments early by performing autocorrelation on the branch-target history. Applied to seven MiBench benchmarks, ACTOR reduces the number of executed post-injection instructions by up to 30%, which translates into an end-to-end saving of 27%. In all cases, the absolute classification error of experiments as timeouts was less than 0.5%.
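
The general idea of autocorrelation on a branch-target history can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `periodicity_score` and `StuckDetector`, the window size, and the threshold are all hypothetical choices. The sketch keeps a sliding window of recent branch targets and, for each lag, measures how often the window matches a shifted copy of itself; a perfect match at some lag indicates that execution is repeating the same branch sequence.

```python
from collections import deque


def periodicity_score(history, max_lag):
    """Return (best_lag, score): the lag in 1..max_lag at which `history`
    most often equals a copy of itself shifted by that lag, together with
    the fraction of matching positions (0.0 to 1.0)."""
    items = list(history)
    best_lag, best_score = 0, 0.0
    for lag in range(1, max_lag + 1):
        pairs = len(items) - lag
        if pairs <= 0:
            break
        matches = sum(1 for i in range(pairs) if items[i] == items[i + lag])
        score = matches / pairs
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


class StuckDetector:
    """Hypothetical detector: flags an experiment once the last `window`
    branch targets are periodic with a match fraction >= `threshold`."""

    def __init__(self, window=64, threshold=1.0):
        self.history = deque(maxlen=window)  # sliding branch-target window
        self.window = window
        self.threshold = threshold

    def observe(self, branch_target):
        """Record one branch target; return True if the window looks stuck."""
        self.history.append(branch_target)
        if len(self.history) < self.window:
            return False  # not enough history yet
        _, score = periodicity_score(self.history, self.window // 2)
        return score >= self.threshold


if __name__ == "__main__":
    detector = StuckDetector(window=12)
    # A three-branch loop body repeated over and over (made-up addresses).
    trace = [0x100, 0x120, 0x140] * 10
    for step, target in enumerate(trace):
        if detector.observe(target):
            print(f"periodic branch pattern detected at step {step}")
            break
```

Note that a correct program with a long-running loop produces a periodic branch history as well, so a practical detector needs further criteria to keep the misclassification rate low; this sketch makes no attempt at that.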

Notes

  1. In hard real-time settings, the situation is somewhat different: Here, the respective task’s deadline would actually define a ground truth for timeout errors and, thus, also the upper bound for \(t_\text{inv}\). However, depending on the tightness of the deadline, this might still prolong the simulation time too much.

  2. https://doi.org/10.5281/zenodo.6534708

Author information

Corresponding authors

Correspondence to Tim-Marek Thomas or Christian Dietrich.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Thomas, TM., Dietrich, C., Pusz, O., Lohmann, D. (2022). ACTOR: Accelerating Fault Injection Campaigns Using Timeout Detection Based on Autocorrelation. In: Trapp, M., Saglietti, F., Spisländer, M., Bitsch, F. (eds) Computer Safety, Reliability, and Security. SAFECOMP 2022. Lecture Notes in Computer Science, vol 13414. Springer, Cham. https://doi.org/10.1007/978-3-031-14835-4_17

  • DOI: https://doi.org/10.1007/978-3-031-14835-4_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-14834-7

  • Online ISBN: 978-3-031-14835-4

  • eBook Packages: Computer Science, Computer Science (R0)
