research-article

REPAIR: Control Flow Protection based on Register Pairing Updates for SW-Implemented HW Fault Tolerance

Published: 17 September 2021

Abstract

Safety-critical embedded systems may either use specialized hardware or rely on Software-Implemented Hardware Fault Tolerance (SIHFT) to meet soft error resilience requirements. SIHFT has the advantage that it can be used with low-cost, off-the-shelf components such as standard Micro-Controller Units. To this end, SIHFT methods apply redundancy in software computation and special checker codes to detect transient errors, so-called soft errors, that corrupt either the data flow or the control flow of the software and may lead to Silent Data Corruption (SDC). So far, this has been done by applying separate SIHFT methods for data flow and control flow protection, which leads to large overheads in computation time.

This work, in contrast, presents REPAIR, a method that exploits the checks of the SIHFT data flow protection to also detect control flow errors, thereby yielding higher SDC resilience with less computational overhead. The data flow protection methods duplicate the computation and place subsequent checks strategically throughout the program. These checks ensure that the two redundant computation paths, which work on two different parts of the register file, yield the same result. By updating the pairing between the registers used in the primary computation path and the registers in the duplicated computation path using the REPAIR method, these checks also fail with high coverage when a control flow error that leads to an illegal jump occurs. Extensive RTL fault injection simulations are carried out to accurately quantify soft error resilience while evaluating MiBench programs along with an embedded case study running on an OpenRISC processor. On average, our method performs slightly better in terms of soft error resilience than the best state-of-the-art method while requiring significantly lower overheads. These results show that REPAIR is a valuable addition to the set of known SIHFT methods.
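The pairing idea described above can be illustrated with a toy Python model. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not specify how pairings are encoded or updated, so the rotation-based pairing, the two-block program, and the names `rotate_shadows`, `check`, and `run_with_jump` are all hypothetical choices made for illustration.

```python
def rotate_shadows(shadow, shift):
    """Move shadow values so that primary register i is paired with
    shadow register (i + shift) % n. (Hypothetical pairing update.)"""
    n = len(shadow)
    return [shadow[(j - shift) % n] for j in range(n)]


def check(primary, shadow, shift):
    """Data flow check: every primary register must equal the shadow
    register it is currently paired with."""
    n = len(primary)
    return all(primary[i] == shadow[(i + shift) % n] for i in range(n))


def run_with_jump(illegal_jump):
    """Run a two-block toy program and return the result of the check
    in the second block. If illegal_jump is True, control reaches the
    second block without executing the pairing update on the legal edge."""
    primary = [0, 3, 4, 7]
    shadow = list(primary)  # block A assumes pairing shift 0

    # Block A: the same addition is done on both computation paths.
    primary[0] = primary[1] + primary[2]
    shadow[0] = shadow[1] + shadow[2]

    # Legal edge A -> B: update the pairing (assumed here to be a rotation).
    if not illegal_jump:
        shadow = rotate_shadows(shadow, 1)

    # Block B was "compiled" for pairing shift 1, so its check uses shift 1.
    return check(primary, shadow, 1)
```

In this model, `run_with_jump(False)` passes the check, while `run_with_jump(True)` fails it because block B compares each primary register against a shadow register that still holds the value from the old pairing. A mismatch is not guaranteed in general: if two registers happen to hold equal values, the stale pairing can go unnoticed, which is consistent with the abstract's claim of high rather than complete coverage.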


        • Published in

          ACM Transactions on Embedded Computing Systems  Volume 20, Issue 5s
          Special Issue ESWEEK 2021, CASES 2021, CODES+ISSS 2021 and EMSOFT 2021
          October 2021
          1367 pages
          ISSN:1539-9087
          EISSN:1558-3465
          DOI:10.1145/3481713
          • Editor: Tulika Mitra

          Copyright © 2021 Association for Computing Machinery.

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 17 September 2021
          • Accepted: 1 July 2021
          • Revised: 1 June 2021
          • Received: 1 April 2021

          Qualifiers

          • research-article
          • Refereed
