DOI: 10.1109/MICRO.2006.19

Distributed Microarchitectural Protocols in the TRIPS Prototype Processor

Published: 09 December 2006

Abstract

Growing on-chip wire delays will cause many future microarchitectures to be distributed, in which hardware resources within a single processor become nodes on one or more switched micronetworks. Since large processor cores will require multiple clock cycles to traverse, control must be distributed, not centralized. This paper describes the control protocols in the TRIPS processor, a distributed, tiled microarchitecture that supports dynamic execution. It details each of the five types of reused tiles that compose the processor, the control and data networks that connect them, and the distributed microarchitectural protocols that implement instruction fetch, execution, flush, and commit. We also describe the physical design issues that arose when implementing the microarchitecture in a 170M transistor, 130nm ASIC prototype chip composed of two 16-wide issue distributed processor cores and a distributed 1MB nonuniform (NUCA) on-chip memory system.
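
The claim that a large core takes multiple clock cycles to traverse can be illustrated with a small sketch. It is not taken from the paper: the grid dimensions, the one-cycle-per-hop cost, and the function names `hop_latency` and `worst_case_latency` are assumptions chosen only for illustration, and messages are assumed to travel along rows and columns of a 2D mesh of tiles.

```python
from itertools import product


def hop_latency(src, dst, cycles_per_hop=1):
    """Cycles for a message to travel between two tiles of a 2D mesh.

    Assumes row/column (Manhattan) routing and a fixed per-hop cost;
    both are illustrative assumptions, not figures from the paper.
    """
    (r0, c0), (r1, c1) = src, dst
    return (abs(r0 - r1) + abs(c0 - c1)) * cycles_per_hop


def worst_case_latency(rows, cols, cycles_per_hop=1):
    """Largest tile-to-tile latency in a rows x cols mesh (hypothetical sizes)."""
    tiles = list(product(range(rows), range(cols)))
    return max(hop_latency(a, b, cycles_per_hop) for a in tiles for b in tiles)


if __name__ == "__main__":
    # Corner-to-corner traversal of a hypothetical 4x4 tile grid.
    print(worst_case_latency(4, 4))
```

Under these assumptions, a signal issued from one corner tile needs several cycles to reach the opposite corner, so a single centralized controller could not reach every tile within one cycle; this is the motivation the abstract gives for distributing control.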

Published In

MICRO 39: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
December 2006
493 pages
ISBN: 0769527329

Publisher

IEEE Computer Society, United States

Qualifiers

  • Article

Conference

Micro-39

Acceptance Rates

MICRO 39 paper acceptance rate: 42 of 174 submissions (24%)
Overall acceptance rate: 484 of 2,242 submissions (22%)
