DOI: 10.1145/224170.224301
Article

Compiling and optimizing for decoupled architectures

Published: 08 December 1995

Abstract

Decoupled architectures provide a key to the problem of sustained supercomputer performance through their ability to hide large memory latencies. When a program executes in a decoupled mode, the perceived memory latency at the processor is zero; effectively, the entire physical memory has an access time equivalent to that of the processor's register file, and latency is completely hidden. However, the asynchronous functional units within a decoupled architecture must occasionally synchronize, incurring a high penalty. The goal of compiling and optimizing for decoupled architectures is to partition the program between the asynchronous functional units in such a way that latencies are hidden but synchronization events are executed infrequently. This paper describes a model for decoupled compilation, and explains the effectiveness of compilation for decoupled systems. A number of new compiler optimizations are introduced and evaluated quantitatively using the Perfect Club scientific benchmarks. We show that, with a suitable repertoire of optimizations, it is possible to hide large latencies most of the time for most of the programs in the Perfect Club.
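The trade-off the abstract describes, hidden latency versus the cost of synchronization, can be illustrated with a toy timing model. The sketch below is not the paper's compilation model. It assumes a hypothetical machine with one access unit that issues one load per cycle into an unbounded queue and one execute unit that consumes one operand per cycle. A synchronization event is modelled as forcing the access unit to wait until the execute unit has caught up. The function name `simulate` and its parameters are illustrative only.

```python
def simulate(n_iters, latency, sync_every=None):
    """Toy model: cycles needed for the execute unit to finish n_iters operations.

    Assumptions (illustrative, not taken from the paper):
      - the access unit issues one load per cycle and may run arbitrarily far ahead;
      - each load arrives `latency` cycles after it is issued;
      - the execute unit consumes one operand per cycle once it has arrived;
      - every `sync_every` iterations the units synchronize, so the access unit
        cannot issue its next load until the execute unit has finished all
        earlier iterations.
    """
    issue_time = 0  # earliest cycle at which the access unit can issue the next load
    exec_done = 0   # cycle at which the execute unit finished the previous iteration
    for i in range(n_iters):
        if sync_every and i > 0 and i % sync_every == 0:
            # Synchronization: the access unit loses its lead over the execute unit.
            issue_time = max(issue_time, exec_done)
        arrival = issue_time + latency
        issue_time += 1
        # The execute unit waits for the operand, then spends one cycle on it.
        exec_done = max(exec_done, arrival) + 1
    return exec_done


if __name__ == "__main__":
    n, lat = 1000, 100
    print("fully decoupled:     ", simulate(n, lat))       # latency paid once
    print("sync every 100 iters:", simulate(n, lat, 100))  # latency paid per segment
    print("sync every iteration:", simulate(n, lat, 1))    # latency paid every time
```

In this model the memory latency is paid once per run of uninterrupted decoupled execution, so total time grows with the number of synchronization events rather than with the number of memory accesses. That is why the stated goal is a partition in which synchronizations are executed infrequently.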

Information & Contributors

Information

Published In

Supercomputing '95: Proceedings of the 1995 ACM/IEEE conference on Supercomputing
December 1995
875 pages
ISBN: 0897918169
DOI: 10.1145/224170
Chairman: Sid Karin
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Benchmarks
  2. Compiling
  3. Decoupled architecture
  4. Optimization
  5. Performance
  6. Quantitative analysis

Qualifiers

  • Article

Conference

SC '95
Acceptance Rates

Supercomputing '95 Paper Acceptance Rate 69 of 241 submissions, 29%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 9
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 24 Dec 2024

Citations

Cited By

  • (2024) WASP: Exploiting GPU Pipeline Parallelism with Hardware-Accelerated Automatic Warp Specialization. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1-16. DOI: 10.1109/HPCA57654.2024.00086. Online publication date: 2-Mar-2024.
  • (2021) Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 654-667. DOI: 10.1109/HPCA51647.2021.00061. Online publication date: Feb-2021.
  • (2019) Efficient Data Supply for Parallel Heterogeneous Architectures. ACM Transactions on Architecture and Code Optimization 16:2, pp. 1-23. DOI: 10.1145/3310332. Online publication date: 26-Apr-2019.
  • (2017) Decoupling Data Supply from Computation for Latency-Tolerant Communication in Heterogeneous Architectures. ACM Transactions on Architecture and Code Optimization 14:2, pp. 1-27. DOI: 10.1145/3075620. Online publication date: 28-Jun-2017.
  • (2015) DeSC. Proceedings of the 48th International Symposium on Microarchitecture, pp. 191-203. DOI: 10.1145/2830772.2830800. Online publication date: 5-Dec-2015.
  • (2011) OUTRIDER. ACM SIGARCH Computer Architecture News 39:3, pp. 117-128. DOI: 10.1145/2024723.2000079. Online publication date: 4-Jun-2011.
  • (2011) OUTRIDER. Proceedings of the 38th annual international symposium on Computer architecture, pp. 117-128. DOI: 10.1145/2000064.2000079. Online publication date: 4-Jun-2011.
  • (2008) Deriving Efficient Data Movement from Decoupled Access/Execute Specifications. Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers, pp. 168-182. DOI: 10.1007/978-3-540-92990-1_14. Online publication date: 24-Dec-2008.
  • (2005) A limitation study into access decoupling. Euro-Par'97 Parallel Processing, pp. 1102-1111. DOI: 10.1007/BFb0002859. Online publication date: 26-Sep-2005.
  • (2001) Multithreading decoupled architectures for complexity-effective general purpose computing. ACM SIGARCH Computer Architecture News 29:5, pp. 56-61. DOI: 10.1145/563647.563658. Online publication date: 1-Dec-2001.
