
Continual flow pipelines

Authors: Srikanth T. Srinivasan, Haitham Akkary, Mike Upton

ACM SIGOPS Operating Systems Review, Volume 38, Issue 5

Pages 107 - 119

https://doi.org/10.1145/1037949.1024407

Published: 07 October 2004

Abstract

Increased integration in the form of multiple processor cores on a single die, relatively constant die sizes, shrinking power envelopes, and emerging applications create a new challenge for processor architects: how to build a processor that provides high single-thread performance, allows many such cores to be placed on the same die for high throughput, and adapts dynamically to future applications. Conventional approaches to high single-thread performance rely on large, complex cores that sustain a large instruction window for memory latency tolerance, which makes them unsuitable for multi-core chips. We present Continual Flow Pipelines (CFP), a new non-blocking processor pipeline architecture that achieves the performance of a large instruction window without requiring cycle-critical structures such as the scheduler and register file to be large. We show that achieving the benefits of a large instruction window requires addressing inefficiencies in the management of both the scheduler and the register file, and we propose a unified solution. The non-blocking property of CFP keeps small the key processor structures that affect cycle time and power (scheduler, register file) and die size (second-level cache). The memory-latency-tolerant CFP core allows multiple cores on a single die while outperforming current processor cores on single-thread applications.
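The abstract's core point, that a small scheduler can behave like a large window only if instructions waiting on a long-latency miss stop occupying it, can be illustrated with a toy cycle-level model. This is a minimal sketch under stated assumptions, not the paper's microarchitecture: the trace (`make_program`), single-cycle execution of ready instructions, unlimited issue width, and the unbounded `side_buffer` are all hypothetical, only the scheduler is modeled (the abstract notes the register file must be handled too), and re-inserting the buffered instructions after the miss returns is left out. It compares a blocking scheduler, where miss-dependent instructions hold their entries, with a CFP-style pipeline that drains them so independent instructions keep flowing.

```python
"""Toy model: a blocking scheduler vs. a CFP-style pipeline that drains
miss-dependent instructions out of the scheduler. Illustrative only."""
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Instr:
    srcs: Tuple[str, ...]      # source register names
    dst: Optional[str]         # destination register name (None if no result)
    misses: bool = False       # True for a load that misses the cache


def make_program(n: int) -> List[Instr]:
    """Hypothetical trace: one missing load into r1, followed by n
    instructions alternating between miss-dependent and independent work."""
    prog = [Instr(srcs=("r0",), dst="r1", misses=True)]
    for i in range(n):
        if i % 2 == 0:
            prog.append(Instr(srcs=("r1",), dst="r1"))  # depends on the miss
        else:
            prog.append(Instr(srcs=("r5",), dst="r6"))  # independent of the miss
    return prog


def simulate(prog: List[Instr], sched_size: int, width: int,
             miss_latency: int, drain: bool) -> int:
    """Count miss-independent instructions executed while the miss is
    outstanding. drain=True moves miss-dependent instructions to a side
    buffer (assumed unbounded) instead of letting them occupy the scheduler."""
    poisoned: Set[str] = set()                 # registers waiting on the miss
    scheduler: List[Tuple[Instr, bool]] = []   # (instruction, depends_on_miss)
    side_buffer: List[Instr] = []              # drained miss-dependent slice
    next_idx = 0
    independent_done = 0

    for _cycle in range(miss_latency):
        # Allocate up to `width` instructions in program order; stall when full.
        for _ in range(width):
            if next_idx >= len(prog) or len(scheduler) >= sched_size:
                break
            ins = prog[next_idx]
            next_idx += 1
            if ins.misses:
                # The load issues to memory at once; its result is poisoned.
                if ins.dst is not None:
                    poisoned.add(ins.dst)
                continue
            depends = any(s in poisoned for s in ins.srcs)
            if ins.dst is not None:
                if depends:
                    poisoned.add(ins.dst)      # poison propagates to the result
                else:
                    poisoned.discard(ins.dst)  # a fresh value overwrites it
            scheduler.append((ins, depends))

        # Execute: ready instructions issue (unlimited issue width assumed)
        # and free their entries; dependents wait in place or are drained.
        waiting: List[Tuple[Instr, bool]] = []
        for ins, depends in scheduler:
            if not depends:
                independent_done += 1
            elif drain:
                side_buffer.append(ins)         # release the scheduler entry
            else:
                waiting.append((ins, depends))  # hold the entry until the miss returns
        scheduler = waiting

    return independent_done


if __name__ == "__main__":
    trace = make_program(200)
    for mode in (False, True):
        done = simulate(trace, sched_size=8, width=4, miss_latency=100, drain=mode)
        print(f"drain={mode}: {done} independent instructions executed")
```

With an 8-entry scheduler, 4-wide allocation, and a 100-cycle miss, the blocking run executes only 7 independent instructions before the scheduler fills with dependents and allocation stalls, while the draining run executes all 100 independent instructions in the trace. The numbers are artifacts of the toy trace; the qualitative gap is what the model is meant to show.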



Index Terms

1. Computer systems organization
  1. Architectures




Published In


ACM SIGOPS Operating Systems Review Volume 38, Issue 5

ASPLOS '04

December 2004

283 pages

ISSN:0163-5980

DOI:10.1145/1037949


ASPLOS XI: Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
October 2004
296 pages
ISBN:1581138040
DOI:10.1145/1024393
General Chair: Shubu Mukherjee, Intel Corporation
Program Chair: Kathryn S. McKinley, University of Texas at Austin

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Article Metrics

Total citations: 171

Total downloads: 3,408

Downloads (last 12 months): 80

Downloads (last 6 weeks): 6

Reflects downloads up to 15 Jan 2025


Cited By

Mohammadi M, Aamodt T, Dally W (2017). CG-OoO. ACM Transactions on Architecture and Code Optimization 14:4, pp. 1-26. Online publication date: 5-Dec-2017.
https://dl.acm.org/doi/10.1145/3151034
Lakshminarasimhan K, Naithani A, Feliu J, Eeckhout L (2022). The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture. ACM Transactions on Architecture and Code Optimization 19:2, pp. 1-25. Online publication date: 31-Jan-2022.
https://dl.acm.org/doi/10.1145/3499424
Naithani A, Eeckhout L (2022). Reliability-Aware Runahead. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 772-785. Online publication date: Apr-2022.
https://doi.org/10.1109/HPCA53966.2022.00062
Yu C, Bai Y, Wang R (2021). MIPSGPU: Minimizing Pipeline Stalls for GPUs With Non-Blocking Execution. IEEE Transactions on Computers 70:11, pp. 1804-1816. Online publication date: 1-Nov-2021.
https://doi.org/10.1109/TC.2020.3026043
Oliveira G, Gomez-Luna J, Orosa L, Ghose S, Vijaykumar N, Fernandez I, Sadrosadati M, Mutlu O (2021). DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks. IEEE Access 9, pp. 134457-134502. Online publication date: 2021.
https://doi.org/10.1109/ACCESS.2021.3110993
Mashimo S, Shioya R, Inoue K (2020). Energy Efficient Runahead Execution on a Tightly Coupled Heterogeneous Core. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, pp. 207-216. Online publication date: 15-Jan-2020.
https://dl.acm.org/doi/10.1145/3368474.3368496
Naithani A, Feliu J, Adileh A, Eeckhout L (2020). Precise Runahead Execution. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 397-410. Online publication date: Feb-2020.
https://doi.org/10.1109/HPCA47549.2020.00040
Ham T, Aragón J, Martonosi M (2019). Efficient Data Supply for Parallel Heterogeneous Architectures. ACM Transactions on Architecture and Code Optimization 16:2, pp. 1-23. Online publication date: 26-Apr-2019.
https://dl.acm.org/doi/10.1145/3310332
Kondguli S, Huang M, Bahar I, Herlihy M, Witchel E, Lebeck A (2019). Bootstrapping. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 687-700. Online publication date: 4-Apr-2019.
https://dl.acm.org/doi/10.1145/3297858.3304052
Savas M, Guney I, Tokatli N, Kisinbay B, Kucuk G (2019). iMODE (interactive MOod Detection Engine) Processor. 2019 4th International Conference on Computer Science and Engineering (UBMK), pp. 1-6. Online publication date: Sep-2019.
https://doi.org/10.1109/UBMK.2019.8907005
