research-article

Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers

Author:

Gurindar S. Sohi

IEEE Transactions on Computers, Volume 39, Issue 3

Pages 349 - 359

https://doi.org/10.1109/12.48865

Published: 01 March 1990

Abstract

The problems of data dependency resolution and precise interrupt implementation in pipelined processors are combined. A design for a hardware mechanism that resolves dependencies dynamically and, at the same time, guarantees precise interrupts is presented. Simulation studies show that by resolving dependencies the proposed mechanism is able to obtain a significant speedup over a simple instruction issue mechanism as well as implement precise interrupts.

Reviews

Reviewer: Ned Chapin

The author considers two issues in this paper dealing with pipelined computers: data dependencies and precise interrupts. A companion issue, not focused on, is branch instructions. For all three issues, execution speed is the author's concern. In a multiprocessing pipelined computer, one process may have to wait for data from another process, slowing performance overall. Also, when virtual memory is used, interrupts must be precisely timed for the contents of storage to be as expected by the programmer.

In this paper, the author proposes a hardware solution that improves computer performance by reducing the data dependencies and, at the same time, provides for precise interrupts. To support the proposal, the author reports simulations showing improvement exceeding 150 percent under some conditions.

The author calls the proposal a register update unit (RUU), and places it after the decode and issue unit and before the functional units. The author says his proposal is based on a modification of Tomasulo's algorithm. He covers two cases in some detail: the RUU with bypass logic and the RUU with limited bypass logic. The RUU is managed like a queue. The head of the queue points to the instruction needed for a precise interrupt; the tail points to the next available slot for the use of the decode and issue unit.

The most stimulating part of this paper is the author's willingness to address the interdependency of interrupts and data dependencies, which often are discussed as though they had scant interactions. The author's seemingly close tie to Cray hardware and his limited discussion of hardware alternatives, however, left me unconvinced about the possible value of his proposal for other hardware.
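The queue behavior described in the review can be illustrated with a small model. The sketch below is not the paper's design: it assumes a fixed-size circular buffer, and the names `Entry`, `RegisterUpdateUnit`, `issue`, `complete`, and `commit` are hypothetical. It leaves out operand tags, dependency monitoring, and bypass logic, and shows only why retiring strictly from the head keeps the register state precise even when instructions finish out of order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    """One in-flight instruction (hypothetical fields, not taken from the paper)."""
    dest: str                    # destination register name
    done: bool = False           # set when a functional unit finishes
    value: Optional[int] = None  # result, valid once done
    fault: bool = False          # instruction raised an exception


class RegisterUpdateUnit:
    """Toy circular queue: allocate at the tail, retire in order from the head."""

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[Entry]] = [None] * size
        self.head = 0                   # oldest uncommitted instruction
        self.tail = 0                   # next free slot for decode and issue
        self.count = 0                  # number of occupied slots
        self.regs: Dict[str, int] = {}  # committed (precise) register state

    def issue(self, dest: str) -> Optional[int]:
        """Allocate the tail slot in program order; None means the queue is full."""
        if self.count == len(self.slots):
            return None
        idx = self.tail
        self.slots[idx] = Entry(dest)
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return idx

    def complete(self, idx: int, value: Optional[int] = None,
                 fault: bool = False) -> None:
        """Record a result (or a fault) for a slot; may happen in any order."""
        entry = self.slots[idx]
        assert entry is not None, "slot is not allocated"
        entry.done, entry.value, entry.fault = True, value, fault

    def commit(self) -> Optional[int]:
        """Retire finished entries from the head.

        Returns the slot index of a faulting instruction once it reaches the
        head, otherwise None. Results of younger instructions never reach
        self.regs before that point, so the state is precise.
        """
        while self.count:
            entry = self.slots[self.head]
            if not entry.done:
                return None       # oldest instruction is still executing
            if entry.fault:
                return self.head  # interrupt point: stop before updating state
            self.regs[entry.dest] = entry.value
            self.slots[self.head] = None
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
        return None


# Three instructions issued in program order; the youngest finishes first.
ruu = RegisterUpdateUnit(size=4)
a = ruu.issue("r1")
b = ruu.issue("r2")
c = ruu.issue("r3")
ruu.complete(c, value=30)
early = ruu.commit()       # head (a) is not done, so nothing retires
ruu.complete(a, value=10)
ruu.complete(b, fault=True)
faulting = ruu.commit()    # a retires, then b faults at the head
```

In this model the early result for `r3` sits in its queue entry and is never written to `regs`, because the faulting instruction ahead of it blocks the head. That is the property a precise interrupt needs: everything older than the interrupt point has updated the state, and nothing younger has.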

Information

Published In

IEEE Transactions on Computers, Volume 39, Issue 3

March 1990

133 pages

ISSN: 0018-9340

Editor: Ming T. Liu, Ohio State Univ., Columbus

Publisher

IEEE Computer Society

United States
