Electro-Photonic NoC Designs for Kilocore Systems

Published: 03 November 2016

Abstract

The increasing core count in manycore systems requires a correspondingly large network-on-chip (NoC) bandwidth to support the overlying applications. However, it is not possible to provide this large bandwidth in an energy-efficient manner using electrical link technology. To overcome this issue, photonic link technology has been proposed as a replacement. This work explores the limits and opportunities of using photonic links to design the NoC architecture for a future Kilocore system. Three different NoC designs are explored: ElecNoC, an electrical concentrated two-dimensional (2D) mesh NoC; HybNoC, an electrical concentrated 2D mesh with a photonic multi-crossbar NoC; and PhotoNoC, a photonic multi-bus NoC. We consider both private and shared cache architectures and, to leverage the large bandwidth density of photonic links, we investigate the use of prefetching and aggressive non-blocking caches. Our analysis using contemporary Big Data workloads shows that non-blocking caches with a shared last-level cache (LLC) can best leverage the large bandwidth of the photonic links in the Kilocore system. Moreover, compared to ElecNoC-based and HybNoC-based Kilocore systems, a PhotoNoC-based Kilocore system achieves up to 2.5× and 1.5× better performance, respectively, and can support up to 2.1× and 1.1× higher bandwidth, respectively, while dissipating comparable power in the overall system.

Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 13, Issue 2
Special Issue on Nanoelectronic Circuit and System Design Methods for the Mobile Computing Era and Regular Papers
April 2017
377 pages
ISSN: 1550-4832
EISSN: 1550-4840
DOI: 10.1145/3014160
Editor: Yuan Xie
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 November 2016
Accepted: 01 June 2016
Revised: 01 April 2016
Received: 01 October 2015
Published in JETC Volume 13, Issue 2

Author Tags

  1. Networks-on-chip
  2. manycore CMP
  3. multi-programmed workloads
  4. silicon-photonic technology

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • DARPA
