
Soft error benchmarking of L2 caches with PARMA

Published: 07 June 2011

Abstract

The amount of charge stored in an SRAM cell shrinks rapidly with each technology generation, increasingly exposing caches to soft errors. Benchmarking the FIT rate of caches due to soft errors is critical for evaluating the relative merits of the many protection schemes being proposed against soft errors. Benchmarking cache reliability poses a unique challenge compared to internal processor storage structures, such as the load/store queue. In internal processor structures, the time a data bit resides in the structure is so short that it is generally safe to assume that no more than one soft error strike can occur. Thus the reliability of such structures is overwhelmingly dominated by single-bit errors. By contrast, a memory block may reside for millions of cycles in a last-level cache. In this case it is important to consider the impact of the spatial and temporal distribution of multiple errors within the lifetime of a cache block in the presence of error protection.
This paper introduces a unified reliability benchmarking framework called PARMA (Precise Analytical Reliability Model for Architecture). PARMA is a rigorous analytical framework that accurately accounts for the distribution of multiple errors to measure the failure rate under any protection scheme. In a single simulation run, PARMA provides a precise FIT rate (expected number of failures in one billion hours) measurement for storage structures where the effect of multiple errors cannot be neglected. We have implemented the PARMA framework on top of a cycle-accurate out-of-order processor simulator (sim-outorder) to benchmark L2 cache failure rates for a set of CPU 2000 benchmarks. The effectiveness of three protection schemes is compared in terms of L2 cache FIT rate: parity, word-level Single Error Correcting Double Error Detecting (SECDED) code, and block-level SECDED.
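The FIT unit defined above (expected failures per one billion hours) can be made concrete with a small conversion sketch. This is only an illustration of the unit, not part of PARMA; the function names and the 8,760-hour year are assumptions introduced here.

```python
# One FIT corresponds to one expected failure per 10^9 hours of operation.
HOURS_PER_BILLION = 1e9
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year


def fit_from_failures_per_hour(failures_per_hour: float) -> float:
    """Convert a failure rate expressed per hour into FIT."""
    return failures_per_hour * HOURS_PER_BILLION


def mttf_years_from_fit(fit: float) -> float:
    """Mean time to failure in years for a constant failure rate given in FIT."""
    return HOURS_PER_BILLION / fit / HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical input: one failure per million hours.
    fit = fit_from_failures_per_hour(1e-6)
    print(f"FIT = {fit:.1f}, MTTF = {mttf_years_from_fit(fit):.1f} years")
```

The MTTF conversion assumes a constant failure rate, which is the usual convention when FIT rates are quoted.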
Exploiting the accuracy of PARMA, we demonstrate that current techniques to evaluate cache FIT rates in the presence of SECDED, such as accelerated fault injection simulations and first-principle derivations based on Architectural Vulnerability Factor (AVF), can overestimate FIT rates by vast amounts. Based on the insights gained during this research we also introduce a new approximate analytical model that can quickly and more accurately estimate cache FIT rate in the presence of SECDED.
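For orientation, a generic first-order AVF estimate of the kind the abstract contrasts PARMA with can be sketched as follows. This is the textbook formulation (AVF as the fraction of bit-cycles holding state required for architecturally correct execution, multiplied by the raw error rate), not the refined model the paper proposes; all names and numbers below are hypothetical.

```python
def avf(ace_bit_cycles: float, total_bits: int, total_cycles: int) -> float:
    """Fraction of bit-cycles in which a bit holds ACE state
    (state required for architecturally correct execution)."""
    return ace_bit_cycles / (total_bits * total_cycles)


def avf_fit_estimate(raw_fit_per_bit: float, total_bits: int, avf_value: float) -> float:
    """First-order estimate: every raw upset in an ACE bit counts as a failure."""
    return raw_fit_per_bit * total_bits * avf_value


if __name__ == "__main__":
    # Hypothetical structure: 8 Mbit array, raw rate 0.001 FIT/bit, AVF 0.25.
    bits = 8 * 1024 * 1024
    print(avf_fit_estimate(0.001, bits, 0.25))
```

An estimate of this shape treats each upset in isolation, so it carries no information about how many errors accumulate in the same protected word during its residency.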


References

[1]
H. Asadi, V. Sridharan, M.B. Tahoori, and D. Kaeli. Vulnerability analysis of L2 cache elements to single event upsets. In Proceedings of the Design, Automation and Test in Europe, 1276--1281, 2006.
[2]
S. Baeg, S. Wen, R. Wong, SRAM Interleaving Distance Selection With a Soft Error Failure Model, Nuclear Science, IEEE Transactions on, vol.56, no.4, pp.2111--2118, Aug. 2009
[3]
M.A. Bajura, Y. Boulghassoul, R. Naseer, S. DasGupta, A.F. Witulski, J. Sondeen, S.D. Stansberry, J. Draper, L.W. Massengill, J.N. Damoulakis. Models and Algorithmic Limits for an ECC-Based Approach to Hardening Sub-100-nm SRAMs. In IEEE Transactions on Nuclear Science, 54(4), 935--945, 2007.
[4]
A. Biswas, P. Racunas, R. Cheveresan, J. Emer, S. Mukherjee, R Rangan, Computing Architectural Vulnerability Factors for Address-Based Structures, In Proceedings of the 32nd International Symposium on Computer Architecture, 532--543, 2005
[5]
Arijit Biswas, Charles Recchia, Shubhendu S. Mukherjee, Vinod Ambrose, Leo Chan, Aamer Jaleel, Mike Plaster, and Norbert Seifert, Explaining Cache SER Anomaly Using Relative DUE AVF Measurement, In Proceedings of the 16th IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2010
[6]
D. Burger and T. M. Austin. The SimpleScalar Tool Set Version 2.0. Technical Report 1342, CS Department, University of Wisconsin-Madison.
[7]
D. Ernst, N. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler, D. Blaauw, T. Austin, K. Flautner, and T. Mudge. Razor: a low-power pipeline based on circuit-level timing speculation. In Proceedings of the 36th International Symposium on Microarchitecture, pages 7--18, 2003.
[8]
K. Flautner, N.S. Kim, S. Martin, D. Blaauw, and T. Mudge. Drowsy caches: simple techniques for reducing leakage power. In Proceedings of the 29th International Symposium on Computer Architecture, 148--157, 2002.
[9]
E. Ibe, S.S. Chung, S. Wen, H. Yamaguchi, Y. Yahagi, H. Kameyama, S. Yamamoto, T. Akioka, "Spreading Diversity in Multi-cell Neutron-Induced Upsets with Device Scaling," Custom Integrated Circuits Conference, 2006. CICC '06. IEEE, pp.437--444, 10--13 Sept. 2006
[10]
J.W. Kellington, R. McBeth, P. Sanda, and R.N. Kalla. IBM POWER6 Processor Soft Error Tolerance Analysis Using Proton Irradiation. In Proceedings of the 3rd IEEE Workshop on Silicon Errors in Logic System Effects, 2007.
[11]
M. Li, P. Ramachandran, R.U. Karpuzcu, S.K.S Hari, S. Adve. Accurate Microarchitecture-Level Fault Modeling for Studying Hardware Faults. In Proceedings of the International Conference on High Performance Computer Architecture, 105--116, 2009.
[12]
X. Li, S. Adve, P. Bose, and J.A. Rivers. Architecture-Level Soft Error Analysis: Examining the Limits of Common Assumptions. In Proceedings of the International Conference on Dependable Systems and Networks, 266--275, 2007.
[13]
X. Li, S. Adve, P. Bose, and J.A. Rivers. SoftArch: An Architecture Level Tool for Modeling and Analyzing Soft Errors. In Proceedings of the International Conference on Dependable Systems and Networks, 496--505, 2005.
[14]
Liu, L.; Peir, J.K.;, Cache sampling by sets, Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol.1, no.2, pp.98--105, Jun 1993
[15]
M. Manoochehri, M. Annavaram, M. Dubois. CPPC: Correctable Parity Protected Cache, In Proceedings of the 38th International Symposium on Computer Architecture, 2011
[16]
S. S. Mukherjee, C. Weaver, J. Emer, S. K. Reinhardt, and T. Austin. A systematic methodology to calculate the architectural vulnerability factors for a high-performance microprocessor. In Proceedings of the 36th International Symposium on Microarchitecture, pages 29--40, 2003.
[17]
S. S. Mukherjee, J. Emer, T. Fossum, and S. K. Reinhardt. Cache Scrubbing in Microprocessors: Myth or Necessity? In Proceedings of the 10th IEEE Pacific Rim Symposium on Dependable Computing, 37--42, 2004.
[18]
R. Naseer, Y. Boulghassoul, J. Draper, S. DasGupta, A. Witulski. Critical Charge Characterization for Soft Error Rate Modeling in 90nm SRAM. In Proceedings of the IEEE Symposium on Circuits and Systems, 1879--1882, 2007.
[19]
E. Perelman, G. Hamerly, M. Van Biesbrouck, T. Sherwood, and B. Calder. Using SimPoint for Accurate and Efficient Simulation. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 318--319, 2003.
[20]
M. Rebaudengo, M. S. Reorda, and M. Violante. An Accurate Analysis of the Effects of Soft Errors in the Instruction and Data Caches of a Pipelined Microprocessor. In Proceedings of the Design, Automation and Test in Europe, 602--607, 2003.
[21]
A.M. Saleh, J.J. Serrano, and J.H. Patel. Reliability of Scrubbing Recovery Techniques for Memory Systems. In IEEE Transactions on Reliability, 39(1), 114--122, 1990.
[22]
Semiconductor Industries Association, International Technology Roadmap for Semiconductors.
[23]
Sridharan, V., Asadi, H., Tahoori, M. B., and Kaeli, D. 2006. Reducing Data Cache Susceptibility to Soft Errors. IEEE Trans. Dependable Secur. Comput. 3, 4 Oct. 2006, 353--364.
[24]
Tosaka, Y., Satoh, S., Itakura, T., Suzuki, K., Sugii, T., Ehara, H., Woffinden, G.A., Cosmic ray neutron-induced soft errors in sub-half micron CMOS circuits, Electron Device Letters, IEEE, vol.18, no.3, pp.99--101, Mar 1997
[25]
S. Thoziyoor, N. Muralimanohar, J. H. Ahn, and N. P. Jouppi. Cacti 5.1. HP Report HPL-2008-20.
[26]
C. Weaver, J. Emer, S.S. Mukherjee, and S.K. Reinhardt. Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor. In Proceedings of the International Symposium on Computer Architecture, 264--274, 2004.
[27]
Wilkerson, C., Alameldeen, A. R., Chishti, Z., Wu, W., Somasekhar, D., and Lu, S. 2010. Reducing cache power with low-cost, multi-bit error-correcting codes. In Proceedings of the 37th Annual international Symposium on Computer Architecture. ISCA '10., 83--93.
[28]
Wunderlich, R. E., Wenisch, T. F., Falsafi, B., and Hoe, J. C. "SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling." In Proceedings of the 30th Annual international Symposium on Computer Architecture. ISCA '03., 84--97.
[29]
B. Zandian, M. Annavaram. Cross-layer Resilience Using Wearout Aware Design Flow. Dependable Systems and Networks (DSN), 2011.
[30]
B. Zandian, W. Dweik, S. Kang, T. Punihaole, and M. Annavaram. WearMon: Reliability Monitoring Using Adaptive Critical Path Testing. Dependable Systems and Networks (DSN), pages 151--160, 2010.

Cited By

  • (2021) Remove Minimum (RM): An Error-Tolerant Scheme for Cardinality Estimate by HyperLogLog. IEEE Transactions on Dependable and Secure Computing. 10.1109/TDSC.2020.3013746 (1-1). Online publication date: 2021
  • (2020) CASH: correlation-aware scheduling to mitigate soft error impact on heterogeneous multicores. Connection Science. 10.1080/09540091.2020.1758924 (1-23). Online publication date: 18-May-2020
  • (2018) Vulnerability-aware Energy Optimization for Reconfigurable Caches in Multitasking Systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 10.1109/TCAD.2018.2834410 (1-1). Online publication date: 2018

Reviews

Florin Popentiu

The authors state in their introduction that the precise analytical reliability model for architecture (PARMA) "uses a rigorous fault generation model to address temporal multi-bit errors (MBEs), starting with the probability of SEUs [single event upsets] on a single bit and then expounding the probabilities in both temporal and spatial dimensions." They go on to claim: "Using the insights gained from the comparison of the PARMA model with prior approximate analytical models, we have introduced a new approximate analytical model based on a refined AVF [architectural vulnerability factor] methodology to estimate the DUE FIT rate of SECDED [single error correcting double error detecting] protected caches."
The introduction clearly explains why the topic is important, and the authors cite relevant related papers and show their contributions clearly (p. 86). Topics include goals of performance analysis and measurement, performance metrics, means, modes, measurement tools and techniques, perturbations due to measurement, and the design of experiments and simulation. The paper introduces an original and unified framework for measuring the reliability of static random-access memory (SRAM) arrays protected by any possible error protection scheme, and a new and highly accurate approximate analytical model for measuring the FIT rate of caches protected by word-level SECDED codes.
The conclusions follow directly from the body of the paper. They are well structured and introduce no new material. The technical approach is very courageous. However, for better performance on real-world applications, the authors should perhaps try in future work to remove some limitations and sources of confusion. First, the probability distribution of SEUs is assumed to be the same in every cycle. Another weakness, likely to confuse students, is the assumption that there is no correlation between SEUs affecting any two cache bits.
The results show that PARMA simulation is about 25 times slower than the basic sim-outorder simulation for 100 million SimPoint simulations. Those interested in this area of research should also study three other papers [1,2,3].
Online Computing Reviews Service
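The two assumptions singled out in the review (the same SEU probability in every cycle, and no correlation between bits) are what make a simple closed-form calculation possible. The sketch below illustrates that idea with a plain binomial model; it is not the PARMA model itself. The 72-bit word (64 data bits plus 8 check bits), the per-cycle probability, and the function names are assumptions, and a bit struck more than once is simply counted as erroneous.

```python
import math


def prob_bit_hit(p_cycle: float, cycles: int) -> float:
    """P(a given bit is struck at least once in `cycles` cycles),
    i.e. 1 - (1 - p)^n, evaluated stably for very small p."""
    return -math.expm1(cycles * math.log1p(-p_cycle))


def flipped_bit_distribution(p_cycle: float, cycles: int, word_bits: int = 72) -> dict:
    """Probability of 0, 1, 2, or 3+ struck bits in one protected word,
    assuming independent bits with identical per-cycle probability."""
    q = prob_bit_hit(p_cycle, cycles)

    def pmf(k: int) -> float:
        return math.comb(word_bits, k) * q**k * (1.0 - q) ** (word_bits - k)

    p0, p1, p2 = pmf(0), pmf(1), pmf(2)
    return {
        "none": p0,
        "single": p1,      # correctable by SECDED
        "double": p2,      # detectable but not correctable by SECDED
        "three_plus": max(0.0, 1.0 - p0 - p1 - p2),  # beyond SECDED guarantees
    }


if __name__ == "__main__":
    # Hypothetical numbers: per-bit, per-cycle probability 1e-12,
    # block resident for ten million cycles.
    for outcome, prob in flipped_bit_distribution(1e-12, 10_000_000).items():
        print(f"{outcome:>10}: {prob:.3e}")
```

Relaxing either assumption (for example, modeling spatially correlated multi-bit upsets) would break the binomial step, which is the limitation the reviewer is pointing at.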


Information & Contributors

Information

Published In

SIGMETRICS '11: Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
June 2011
376 pages
ISBN:9781450308144
DOI:10.1145/1993744
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. cache
  2. reliability
  3. soft error

Qualifiers

  • Research-article

Conference

SIGMETRICS '11

Acceptance Rates

Overall Acceptance Rate 459 of 2,691 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 11 Dec 2024

Citations

Cited By

  • (2017) Soft error-aware architectural exploration for designing reliability adaptive cache hierarchies in multi-cores. Proceedings of the Conference on Design, Automation & Test in Europe. 10.5555/3130379.3130388 (37-42). Online publication date: 27-Mar-2017
  • (2017) Soft error-aware architectural exploration for designing reliability adaptive cache hierarchies in multi-cores. Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017. 10.23919/DATE.2017.7926955 (37-42). Online publication date: Mar-2017
  • (2017) MeRLiN. ACM SIGARCH Computer Architecture News. 10.1145/3140659.3080225, 45:2 (241-254). Online publication date: 24-Jun-2017
  • (2017) MeRLiN. Proceedings of the 44th Annual International Symposium on Computer Architecture. 10.1145/3079856.3080225 (241-254). Online publication date: 24-Jun-2017
  • (2017) On design of cache with efficient soft error protection. 2017 IEEE 37th International Conference on Electronics and Nanotechnology (ELNANO). 10.1109/ELNANO.2017.7939719 (57-60). Online publication date: Apr-2017
  • (2016) Accurate Model for Application Failure Due to Transient Faults in Caches. IEEE Transactions on Computers. 10.1109/TC.2015.2488642, 65:8 (2397-2410). Online publication date: 1-Aug-2016
  • (2016) Multi-level cache vulnerability estimation: The first step to protect memory. 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC). 10.1109/SMC.2016.7844399 (001165-001170). Online publication date: Oct-2016
