Abstract
Fault-injection (FI) campaigns provide an in-depth resilience analysis of safety-critical systems in the presence of transient hardware faults. However, FI campaigns consist of many independent injection experiments that, combined, lead to long run times, especially if we aim for high coverage of the fault space. Besides reducing the number of pilot injections in the first place (e.g., with def-use pruning), we can also speed up the overall campaign by speeding up individual experiments. From our experiments, we see that the timeout failure class is especially important here: Although timeouts account for only 8% of the injections (QSort), they require 32% of the campaign run time.
In this paper, we analyze and discuss the nature of timeouts as a failure class, and reason about the general design of dynamic timeout detectors. Based on those insights, we propose ACTOR, a method to identify and abort stuck experiments early by performing autocorrelation on the branch-target history. Applied to seven MiBench benchmarks, ACTOR reduces the number of executed post-injection instructions by up to 30%, which translates into an end-to-end saving of 27%. At the same time, the absolute error in classifying experiments as timeouts was always less than 0.5%.
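To illustrate the general idea of autocorrelation on a branch-target history, the following is a minimal sketch, not the ACTOR implementation described in the paper. It assumes the history is available as a plain sequence of branch-target addresses and uses a simple match-based autocorrelation (the fraction of entries equal to the entry `lag` steps earlier). The function names, the `window`, `max_lag`, and `threshold` parameters, and the example addresses are all hypothetical.

```python
from typing import Optional, Sequence


def match_autocorrelation(history: Sequence[int], lag: int) -> float:
    """Fraction of entries that equal the entry `lag` positions earlier."""
    if lag <= 0 or lag >= len(history):
        return 0.0
    pairs = len(history) - lag
    matches = sum(
        1 for i in range(lag, len(history)) if history[i] == history[i - lag]
    )
    return matches / pairs


def detect_period(
    history: Sequence[int],
    max_lag: int,
    window: int = 64,
    threshold: float = 1.0,
) -> Optional[int]:
    """Return the smallest lag at which the recent history repeats, else None.

    Only the last `window` branch targets are inspected (assumed parameter).
    A lag counts as periodic if its match ratio reaches `threshold`.
    """
    recent = list(history[-window:])
    for lag in range(1, max_lag + 1):
        # Require at least two full repetitions of a candidate period.
        if len(recent) < 2 * lag:
            break
        if match_autocorrelation(recent, lag) >= threshold:
            return lag
    return None


# Hypothetical histories: a three-branch loop versus ever-changing targets.
stuck = [0x100, 0x140, 0x180] * 20
progressing = list(range(0x1000, 0x1000 + 60))

print(detect_period(stuck, max_lag=16))
print(detect_period(progressing, max_lag=16))
```

Note that periodicity alone does not prove that an experiment is stuck: a regular loop in a fault-free run looks periodic as well, so a practical detector would need further criteria, such as how long the pattern persists, before aborting an experiment.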
Notes
- 1. In hard real-time settings, the situation is somewhat different: Here, the respective task’s deadline would actually define a ground truth for timeout errors and, thus, also the upper bound for \(t_\text{inv}\). However, depending on the tightness of the deadline, this might still prolong the simulation time too much.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Thomas, TM., Dietrich, C., Pusz, O., Lohmann, D. (2022). ACTOR: Accelerating Fault Injection Campaigns Using Timeout Detection Based on Autocorrelation. In: Trapp, M., Saglietti, F., Spisländer, M., Bitsch, F. (eds) Computer Safety, Reliability, and Security. SAFECOMP 2022. Lecture Notes in Computer Science, vol 13414. Springer, Cham. https://doi.org/10.1007/978-3-031-14835-4_17
DOI: https://doi.org/10.1007/978-3-031-14835-4_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14834-7
Online ISBN: 978-3-031-14835-4
eBook Packages: Computer Science, Computer Science (R0)