Article

Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling

Authors:

Dean M. Tullsen

ISCA '05: Proceedings of the 32nd annual international symposium on Computer Architecture

Pages 408 - 419

https://doi.org/10.1109/ISCA.2005.34

Published: 01 May 2005

Abstract

This paper examines the area, power, performance, and design issues for the on-chip interconnects on a chip multiprocessor, attempting to present a comprehensive view of a class of interconnect architectures. It shows that the design choices for the interconnect have a significant effect on the rest of the chip, potentially consuming a significant fraction of the real estate and power budget. This research shows that designs that treat the interconnect as an entity that can be independently architected and optimized would not arrive at the best multi-core design. Several examples are presented showing the need for careful co-design. For instance, increasing interconnect bandwidth requires area that then constrains the number of cores or cache sizes, and does not necessarily increase performance. Also, shared level-2 caches become significantly less attractive when the overhead of the resulting crossbar is accounted for. A hierarchical bus structure is examined which negates some of the performance costs of the assumed baseline architecture.
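
The co-design point in the abstract (wider interconnect takes area away from cores or cache) can be illustrated with a toy area budget. The sketch below is not the paper's model: the function names, the linear wiring-area assumption, and every number are hypothetical and chosen only to show the direction of the trade-off.

```python
# Toy area-budget model. Every number and the linear wiring model are
# illustrative assumptions, not values or formulas from the paper.

def cache_area_left(die_mm2, n_cores, core_mm2, link_bits, wire_mm2_per_bit):
    """Return die area (mm^2) left for cache after cores and interconnect.

    Interconnect area is assumed to grow linearly with link width and with
    the number of cores attached (a hypothetical simplification).
    """
    interconnect_mm2 = link_bits * n_cores * wire_mm2_per_bit
    left = die_mm2 - n_cores * core_mm2 - interconnect_mm2
    return max(left, 0.0)


def max_cores(die_mm2, core_mm2, cache_mm2, link_bits, wire_mm2_per_bit):
    """Largest core count that fits beside a fixed cache and the interconnect."""
    per_core_mm2 = core_mm2 + link_bits * wire_mm2_per_bit
    return max(int((die_mm2 - cache_mm2) // per_core_mm2), 0)


if __name__ == "__main__":
    # Hypothetical chip: 400 mm^2 die, 8 cores of 25 mm^2 each.
    for bits in (128, 256, 512):
        left = cache_area_left(400.0, 8, 25.0, bits, 0.05)
        print(f"{bits}-bit links: {left:.1f} mm^2 left for cache")
```

Under these made-up parameters, each doubling of link width shrinks the area available for cache, and holding cache size fixed instead lowers the number of cores that fit. Whether the extra bandwidth pays for that loss is the question the paper evaluates.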


Index Terms

Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Interconnection architectures
2. Hardware

Information & Contributors

Information

Published In

ISCA '05: Proceedings of the 32nd annual international symposium on Computer Architecture

June 2005

541 pages

ISBN:076952270X

ACM SIGARCH Computer Architecture News Volume 33, Issue 2
ISCA 2005
May 2005
531 pages
ISSN:0163-5964
DOI:10.1145/1080695

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

IEEE Computer Society

United States

Qualifiers

Article

Conference

ISCA05

Sponsor:

SIGARCH

ISCA05: The 32nd Annual International Symposium on Computer Architecture 2005

June 4 - 8, 2005

Acceptance Rates

ISCA '05 Paper Acceptance Rate 45 of 194 submissions, 23%;

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 206
Total Downloads: 54
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 0

Reflects downloads up to 01 Jan 2025

Citations

Cited By

Reza M (2023) Machine Learning Enabled Solutions for Design and Optimization Challenges in Networks-on-Chip based Multi/Many-Core Architectures. ACM Journal on Emerging Technologies in Computing Systems 19:3 (1-26). Online publication date: 30-Jun-2023
https://dl.acm.org/doi/10.1145/3591470
Li C, Dong D, Liao X (2022) MUA-Router: Maximizing the Utility-of-Allocation for On-chip Pipelining Routers. ACM Transactions on Architecture and Code Optimization 19:3 (1-23). Online publication date: 4-May-2022
https://dl.acm.org/doi/10.1145/3519027
Min D, Chung Y, Byun I, Kim J, Kim J, Falsafi B, Ferdman M, Lu S, Wenisch T (2022) CryoWire: wire-driven microarchitecture designs for cryogenic computing. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (903-917). Online publication date: 28-Feb-2022
https://dl.acm.org/doi/10.1145/3503222.3507749
Zhou W, Ouyang Y, Lu Y, Liang H (2022) A router architecture with dual input and dual output channels for Networks-on-Chip. Microprocessors & Microsystems 90:C. Online publication date: 1-Apr-2022
https://dl.acm.org/doi/10.1016/j.micpro.2022.104464
Jain T, Schneider K, Jain A (2017) An Efficient Self-Routing and Non-Blocking Interconnection Network on Chip. Proceedings of the 10th International Workshop on Network on Chip Architectures (1-6). Online publication date: 14-Oct-2017
https://dl.acm.org/doi/10.1145/3139540.3139546
Hou Y, He H, Yang X, Guo D, Wang X, Fu J, Qiu K (2016) FuMicro. VLSI Design 2016 (2). Online publication date: 1-Dec-2016
https://dl.acm.org/doi/10.1155/2016/8787919
Abellán J, Chen C, Joshi A (2016) Electro-Photonic NoC Designs for Kilocore Systems. ACM Journal on Emerging Technologies in Computing Systems 13:2 (1-25). Online publication date: 3-Nov-2016
https://dl.acm.org/doi/10.1145/2967614
Charousset D, Hiesgen R, Schmidt T (2016) Revisiting actor programming in C++. Computer Languages, Systems and Structures 45:C (105-131). Online publication date: 1-Apr-2016
https://dl.acm.org/doi/10.1016/j.cl.2016.01.002
Xu Y, Zhao B, Zhang Y, Yang J (2015) Simple Virtual Channel Allocation for High-Throughput and High-Frequency On-Chip Routers. ACM Transactions on Parallel Computing 2:1 (1-23). Online publication date: 21-May-2015
https://dl.acm.org/doi/10.1145/2742349
Zhang T, Meng J, Coskun A (2015) Dynamic Cache Pooling in 3D Multicore Processors. ACM Journal on Emerging Technologies in Computing Systems 12:2 (1-21). Online publication date: 2-Sep-2015
https://dl.acm.org/doi/10.1145/2700247
