DOI:10.1145/2656106.2656127
research-article

SDCTune: a model for predicting the SDC proneness of an application for configurable protection

Published: 12 October 2014

ABSTRACT

Silent Data Corruption (SDC) is a serious reliability issue in many domains, including embedded systems. However, current protection techniques are brittle and do not allow programmers to trade off performance for SDC coverage. Further, many of them require tens of thousands of fault injection experiments, which are highly time-intensive. In this paper, we propose SDCTune, an empirical model that predicts the SDC proneness of a program's data. SDCTune is based on static and dynamic features of the program alone, and does not require fault injections to be performed. We then develop an algorithm that uses SDCTune to selectively protect the most SDC-prone data in the program, subject to a given performance overhead bound. Our results show that our technique is highly accurate at predicting the relative SDC rate of an application, and outperforms full duplication by a factor of 0.83x to 1.87x in efficiency of detection (i.e., the ratio of SDC coverage provided to performance overhead).
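
The abstract does not spell out the selection algorithm, so the following is only a minimal sketch of the general idea: choose what to protect under an overhead bound, then compute the detection-efficiency ratio defined above. The `Candidate` type, its `sdc_score` and `overhead` fields, and the greedy score-per-overhead ranking are all assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A protectable program value (hypothetical representation)."""
    name: str
    sdc_score: float  # assumed: predicted share of the program's SDCs, in [0, 1]
    overhead: float   # assumed: estimated slowdown if protected, e.g. 0.05 = 5%


def select_for_protection(candidates, overhead_bound):
    """Greedily pick candidates by predicted SDC score per unit of overhead,
    skipping any candidate that would push total overhead past the bound."""
    ranked = sorted(
        candidates,
        key=lambda c: c.sdc_score / c.overhead if c.overhead > 0 else float("inf"),
        reverse=True,
    )
    chosen, spent = [], 0.0
    for c in ranked:
        if spent + c.overhead <= overhead_bound:
            chosen.append(c)
            spent += c.overhead
    return chosen, spent


def detection_efficiency(sdc_coverage, overhead):
    """Efficiency of detection: SDC coverage provided divided by overhead."""
    if overhead <= 0:
        raise ValueError("overhead must be positive")
    return sdc_coverage / overhead
```

Ranking by score per unit of overhead is a common heuristic for this kind of budgeted selection (it is a knapsack-style problem), but it is not guaranteed to be optimal, and the paper's actual algorithm may differ.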


            Published in

            CASES '14: Proceedings of the 2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems
            October 2014
            241 pages
            ISBN:9781450330503
            DOI:10.1145/2656106

            Copyright © 2014 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            Overall Acceptance Rate: 52 of 230 submissions, 23%
