research-article

SDCTune: a model for predicting the SDC proneness of an application for configurable protection

Authors:
Qining Lu, UBC
Karthik Pattabiraman, UBC
Meeta S. Gupta, Reliability- and Power-Aware Microarchitectures, IBM
Jude A. Rivers, Reliability- and Power-Aware Microarchitectures, IBM

CASES '14: Proceedings of the 2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, October 2014, Article No.: 23, Pages 1–10, https://doi.org/10.1145/2656106.2656127

Published: 12 October 2014

ABSTRACT

Silent Data Corruption (SDC) is a serious reliability issue in many domains, including embedded systems. However, current protection techniques are brittle and do not allow programmers to trade off performance for SDC coverage. Further, many of them require tens of thousands of fault injection experiments, which are highly time-intensive. In this paper, we propose an empirical model, called SDCTune, to predict the SDC proneness of a program's data. SDCTune is based on static and dynamic features of the program alone, and does not require fault injections to be performed. We then develop an algorithm using SDCTune to selectively protect the most SDC-prone data in the program subject to a given performance overhead bound. Our results show that our technique is highly accurate at predicting the relative SDC rate of an application, and outperforms full duplication by a factor of 0.83x to 1.87x in efficiency of detection (i.e., the ratio of SDC coverage provided to performance overhead).
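To make the idea of protection under an overhead bound concrete, the following is a minimal sketch, not the authors' algorithm. It assumes each protectable item already has a predicted SDC share and a protection cost, and uses a simple greedy ranking as a stand-in for the paper's selection procedure. The names `Candidate`, `select_for_protection`, and `detection_efficiency`, and all numbers, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str         # hypothetical label for a protectable data item
    sdc_score: float  # assumed predicted share of SDCs attributed to it
    overhead: float   # assumed fractional slowdown if protected (must be > 0)


def select_for_protection(candidates, overhead_bound):
    """Greedy stand-in: take items by SDC score per unit overhead
    until the overhead budget is used up."""
    ranked = sorted(candidates,
                    key=lambda c: c.sdc_score / c.overhead,
                    reverse=True)
    chosen, spent = [], 0.0
    for c in ranked:
        # Small tolerance guards against floating-point round-off.
        if spent + c.overhead <= overhead_bound + 1e-12:
            chosen.append(c)
            spent += c.overhead
    return chosen


def detection_efficiency(chosen):
    """Ratio of SDC coverage provided to performance overhead."""
    coverage = sum(c.sdc_score for c in chosen)
    overhead = sum(c.overhead for c in chosen)
    return coverage / overhead if overhead > 0 else 0.0


# Made-up inputs, for illustration only.
candidates = [
    Candidate("a", 0.40, 0.10),
    Candidate("b", 0.30, 0.15),
    Candidate("c", 0.20, 0.05),
    Candidate("d", 0.10, 0.20),
]
chosen = select_for_protection(candidates, overhead_bound=0.20)
selective = detection_efficiency(chosen)
protect_everything = detection_efficiency(candidates)
```

With these invented numbers the sketch protects only `a` and `c`, and its coverage-to-overhead ratio is higher than that of protecting every item. This illustrates the metric only; it says nothing about the paper's measured results.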

Index Terms

1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
2. General and reference
  1. Cross-computing tools and techniques
    1. Reliability

Published in
CASES '14: Proceedings of the 2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems
October 2014
241 pages
ISBN: 9781450330503
DOI: 10.1145/2656106
General Chairs: Karam S. Chatha (Qualcomm Research), Rolf Ernst (TU Braunschweig, Germany)
Program Chairs: Anand Raghunathan (Purdue University), Ravishankar Iyer (Intel)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
compiler
modeling
reliability
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 52 of 230 submissions, 23%
Article Metrics
- Total Citations: 39
- Total Downloads: 137
- Downloads (Last 12 months): 11
- Downloads (Last 6 weeks): 1