research-article

Addressing network-on-chip router transient errors with inherent information redundancy

Published: 03 July 2013

Abstract

We exploit the inherent information redundancy in the control path of Network-on-Chip (NoC) routers to manage transient errors, preventing packet loss and misrouting. Outputs of the routing arbitration units in NoC routers can be used to determine arbitration failures, because the valid arbitration outputs are a subset of all possible values. This feature is exploited to detect and correct logic and register errors in the router arbitration control path. The proposed method is complementary to other error management methods for NoC routers. An analytical reliability model of our method is provided, including parameters such as logic unit size, different error rates for logic gates and registers, and the location of faulty elements. Compared to triple-modular redundancy (TMR), the proposed method improves the arbiter reliability by two orders of magnitude while reducing the total area and power by 43% and 64%, respectively. In the presented case studies, two traffic traces from the PARSEC benchmark suite are used to evaluate the average latency and energy consumption. Simulations performed on a 4×4 NoC show that our method reduces the average latency by up to 50% and reduces average energy by up to 70% compared to other methods.
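The observation that valid arbitration outputs form a subset of all possible values can be illustrated with a small sketch. It assumes a one-hot grant encoding (at most one output bit set, and only for a port that raised a request); the abstract does not state the encoding or the checking logic the authors use, so the function name `is_valid_grant` and its rules are hypothetical. The sketch covers detection only, not the correction step.

```python
def is_valid_grant(requests: int, grant: int, n_ports: int) -> bool:
    """Check a grant vector against the assumed one-hot arbitration rules.

    `requests` and `grant` are bit vectors with one bit per port.
    This is an illustrative assumption, not the checker from the paper.
    """
    mask = (1 << n_ports) - 1
    if grant & ~mask:
        return False  # a bit is set outside the port range
    if grant & (grant - 1):
        return False  # more than one port granted at once
    if grant & ~requests:
        return False  # grant went to a port that never requested
    return True       # zero or exactly one legitimate grant


if __name__ == "__main__":
    n_ports = 5  # hypothetical 5-port router
    total = 1 << n_ports
    valid = sum(is_valid_grant(mask_all, g, n_ports)
                for mask_all in [(1 << n_ports) - 1]
                for g in range(total))
    print(f"{valid} of {total} grant vectors are valid")
```

Under these assumptions, a 5-port arbiter with all ports requesting has 32 possible grant vectors, of which only the all-zero vector and the five one-hot vectors pass the check. Any bit flip that moves the output outside that set is detectable without replicating the arbiter, which is the kind of redundancy TMR would otherwise provide through triplication.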

          Published In

          ACM Transactions on Embedded Computing Systems, Volume 12, Issue 4
          Special Section on Wireless Health Systems, On-Chip and Off-Chip Network Architectures
          June 2013
          288 pages
          ISSN:1539-9087
          EISSN:1558-3465
          DOI:10.1145/2485984
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Published: 03 July 2013
          Accepted: 01 September 2011
          Revised: 01 July 2011
          Received: 01 March 2011
          Published in TECS Volume 12, Issue 4

          Author Tags

          1. Networks-on-chip
          2. arbiter
          3. error control coding
          4. information redundancy
          5. on-chip interconnect
          6. router
          7. transient error
          8. triple-modular redundancy

          Qualifiers

          • Research-article
          • Research
          • Refereed

          Cited By

          • (2020) TSV-OCT: A Scalable Online Multiple-TSV Defects Localization for Real-Time 3-D-IC Systems. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 28:3 (672-685). DOI: 10.1109/TVLSI.2019.2948878. Online publication date: Mar-2020
          • (2018) Investigating Reliability and Security of Integrated Circuits and Systems. 2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) (106-111). DOI: 10.1109/ISVLSI.2018.00029. Online publication date: Jul-2018
          • (2017) A Comprehensive Reliability Assessment of Fault-Resilient Network-on-Chip Using Analytical Model. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25:11 (3099-3112). DOI: 10.1109/TVLSI.2017.2736004. Online publication date: Nov-2017
          • (2017) A hardened network-on-chip design using runtime hardware Trojan mitigation methods. Integration, the VLSI Journal 56:C (15-31). DOI: 10.1016/j.vlsi.2016.06.008. Online publication date: 1-Jan-2017
          • (2017) A low-overhead soft-hard fault-tolerant architecture, design and management scheme for reliable high-performance many-core 3D-NoC systems. The Journal of Supercomputing 73:6 (2705-2729). DOI: 10.1007/s11227-016-1951-0. Online publication date: 1-Jun-2017
          • (2016) Reliability Assessment and Quantitative Evaluation of Soft-Error Resilient 3D Network-on-Chip Systems. 2016 IEEE 25th Asian Test Symposium (ATS) (161-166). DOI: 10.1109/ATS.2016.37. Online publication date: Nov-2016
