Abstract
The traditional performance and cost benefits we have enjoyed for decades from technology scaling are challenged by several critical constraints, including reliability. Increases in static and dynamic variations are leading to a higher probability of parametric and wear-out failures and are elevating reliability into a prime design constraint. In particular, the SRAM cells used to build caches, which dominate the processor area, are usually minimum-sized and therefore more prone to failure. It is therefore of paramount importance to develop effective methodologies that facilitate the exploration of reliability techniques for caches.
To this end, we present an analytical model that determines, for a given cache configuration, address trace, and random probability of permanent cell failure, the exact expected miss rate and its standard deviation when blocks with faulty bits are disabled. What distinguishes our model is that it is fully analytical and avoids the use of fault maps, yet it is both exact and simpler than previous approaches. The analytical model is used to produce expected miss-rate trends for future technology nodes for both uncorrelated and clustered faults. Some of the key findings based on the proposed model are: (i) block disabling has a negligible impact on the expected miss rate unless the probability of failure is greater than or equal to 2.6e-4; (ii) the fault map methodology can accurately calculate the expected miss rate as long as 1,000 to 10,000 fault maps are used; and (iii) the expected miss rate for parallel applications increases with the number of threads and, for a given probability of failure, the increase is more pronounced than for sequential execution.
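To make the quantities in the abstract concrete, the following Python sketch computes an expected miss rate under block disabling without generating fault maps. It is an illustration under stated assumptions, not the model of the paper: cell faults are taken to be independent (the uncorrelated case only), replacement is LRU within each set, and the address trace is assumed to be already reduced to per-access LRU stack distances within the accessed set. The function names and the `stack_distances` input format are hypothetical.

```python
from math import comb


def block_fault_probability(p_cell, bits_per_block):
    """Probability that a block has at least one faulty cell,
    assuming independent cell faults (assumption of this sketch)."""
    return 1.0 - (1.0 - p_cell) ** bits_per_block


def working_ways_distribution(p_block, ways):
    """dist[k] = P(exactly k of the `ways` blocks in a set are fault-free),
    a binomial distribution under the independence assumption."""
    return [
        comb(ways, k) * (1.0 - p_block) ** k * p_block ** (ways - k)
        for k in range(ways + 1)
    ]


def expected_miss_rate(stack_distances, p_cell, bits_per_block, ways):
    """Expected miss rate when faulty blocks are disabled.

    stack_distances: one entry per access, giving the 1-based LRU stack
    distance within the accessed set, or None for a first reference.
    With LRU, an access at distance d hits only if the set has at least
    d fault-free ways, so its miss probability is P(working ways < d).
    """
    p_block = block_fault_probability(p_cell, bits_per_block)
    dist = working_ways_distribution(p_block, ways)
    expected_misses = 0.0
    for d in stack_distances:
        if d is None or d > ways:
            expected_misses += 1.0  # misses even in a fault-free cache
        else:
            expected_misses += sum(dist[:d])  # P(working ways <= d - 1)
    return expected_misses / len(stack_distances)
```

For example, with a hypothetical block of 512 data bits (tag bits ignored) and a cell failure probability of 2.6e-4, `block_fault_probability(2.6e-4, 512)` is roughly 0.12, so about one block in eight would be disabled under these assumptions.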
Index Terms
- Modeling the impact of permanent faults in caches