research-article

CLEAR: Cache Lines Error Accumulation Reduction by exploiting invisible accesses

Published: 01 August 2019

Abstract

SRAM caches are the processor component most vulnerable to radiation-induced soft errors. Error-Correcting Codes (ECCs) are the conventional scheme for protecting caches against such errors. A soft error in the form of a Single-Event Upset (SEU), in which a single bit or multiple adjacent bits are affected, can be corrected by ECCs. However, conventional ECCs cannot correct temporal Multiple-Bit Upsets (MBUs), in which two or more SEUs accumulate in a data word over time. This paper proposes the CLEAR (Cache Lines Error Accumulation Reduction) scheme to reduce the occurrence probability of temporal MBUs. By exploiting the externally invisible accesses to all cache ways that are inherently available in a read request, CLEAR performs more frequent ECC checking on each data word. The ECC checking intervals are thereby shortened, which significantly reduces the occurrence probability of temporal MBUs. The evaluations show that CLEAR increases the Mean Time To Failure (MTTF) of the cache by more than 4× compared to the conventional cache architecture, with negligible overheads.
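
The link between shorter ECC checking intervals and fewer temporal MBUs can be illustrated with a small numerical sketch. It is not the paper's evaluation model. It assumes that SEUs hit a single data word as a Poisson process with a constant rate, that every ECC check corrects a single accumulated upset, and that a failure occurs when two or more upsets land in the same word between two consecutive checks. The names `p_accumulate` and `failures_per_unit_time`, and all numeric values, are hypothetical.

```python
import math


def p_accumulate(seu_rate: float, check_interval: float) -> float:
    """Probability of two or more upsets in one word during one check interval.

    Assumes Poisson arrivals: P(k >= 2) = 1 - exp(-x) * (1 + x), x = rate * interval.
    """
    x = seu_rate * check_interval
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x.
    return -math.expm1(-x) - x * math.exp(-x)


def failures_per_unit_time(seu_rate: float, check_interval: float) -> float:
    """Approximate temporal-MBU rate: per-interval probability over interval length."""
    return p_accumulate(seu_rate, check_interval) / check_interval


# Hypothetical numbers, chosen only to make the trend visible.
SEU_RATE = 1e-6          # upsets per word per second (assumption)
BASE_INTERVAL = 1000.0   # seconds between ECC checks of a word (assumption)
SHORT_INTERVAL = BASE_INTERVAL / 2  # the same word is checked twice as often

per_interval_ratio = (p_accumulate(SEU_RATE, BASE_INTERVAL)
                      / p_accumulate(SEU_RATE, SHORT_INTERVAL))
rate_ratio = (failures_per_unit_time(SEU_RATE, BASE_INTERVAL)
              / failures_per_unit_time(SEU_RATE, SHORT_INTERVAL))

print(f"per-interval probability ratio: {per_interval_ratio:.3f}")
print(f"failure-rate ratio:             {rate_ratio:.3f}")
```

Under these assumptions, and while the expected number of upsets per interval is small, the per-interval probability scales roughly with the square of the interval length. Halving the interval therefore cuts it by about a factor of four. Because twice as many intervals then fit into the same time span, the failure rate of the word drops by about a factor of two.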


Cited By

  • (2021) Cache Tag Array Fault Tolerance Method Based on Redundancy and Similarity of Adjacent Cache Line Tag Bits, International Conference on Frontiers of Electronics, Information and Computation Technologies, pp. 1-8, doi:10.1145/3474198.3478212. Online publication date: 21 May 2021


Information

Published In

Microelectronics Journal, Volume 90, Issue C
Aug 2019
353 pages

Publisher

Elsevier Science Publishers B. V.
Netherlands

Author Tags

1. Error correction
2. Multiple-Bit Upset (MBU)
3. Single-Event Upset (SEU)
4. Soft error accumulation
5. Temporal MBU

Qualifiers

• Research-article
