research-article

Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers

Published: 01 March 1990

Abstract

The problems of data dependency resolution and precise interrupt implementation in pipelined processors are combined. A design for a hardware mechanism that resolves dependencies dynamically and, at the same time, guarantees precise interrupts is presented. Simulation studies show that by resolving dependencies the proposed mechanism is able to obtain a significant speedup over a simple instruction issue mechanism as well as implement precise interrupts.

References

[1] CDC Cyber 200 Model 205 Computer System Hardware Reference Manual, Control Data Corp., Arden Hills, MN, 1981.
[2] R. D. Acosta, J. Kjelstrup, and H. C. Torng, "An instruction issuing approach to enhancing performance in multiple functional unit processors," IEEE Trans. Comput., vol. C-35, pp. 815-828, Sept. 1986.
[3] D. W. Anderson, F. J. Sparacio, and R. M. Tomasulo, "The IBM System/360 Model 91: Machine philosophy and instruction-handling," IBM J. Res. Develop., pp. 8-24, Jan. 1967.
[4] P. Chow and M. Horowitz, "Architectural tradeoffs in the design of MIPS-X," in Proc. 14th Annu. Symp. Comput. Architecture, Pittsburgh, PA, June 1987, pp. 300-308.
[5] CRAY, CRAY-1 Computer Systems, Hardware Reference Manual. Chippewa Falls, WI: Cray Research, Inc., 1982.
[6] J. Hennessy, N. Jouppi, F. Baskett, T. Gross, and J. Gill, "Hardware/software tradeoffs for increased performance," in Proc. Int. Symp. Architectural Support Programming Languages Oper. Syst., Mar. 1982, pp. 2-11.
[7] P. Y. T. Hsu and E. S. Davidson, "Highly concurrent scalar processing," in Proc. 13th Annu. Symp. Comput. Architecture, June 1986, pp. 386-395.
[8] W. Hwu and Y. N. Patt, "HPSm, A high performance restricted data flow architecture having minimal functionality," in Proc. 13th Annu. Symp. Comput. Architecture, June 1986, pp. 297-307.
[9] W. Hwu and Y. N. Patt, "Design choices for the HPSm microprocessor chip," in Proc. 20th Annu. Hawaii Int. Conf. Syst. Sci., Kona, HI, Jan. 1987.
[10] W. Hwu and Y. N. Patt, "Checkpoint repair for high-performance out-of-order execution machines," IEEE Trans. Comput., vol. C-36, pp. 1496-1514, Dec. 1987.
[11] R. M. Keller, "Look-ahead processors," ACM Comput. Surveys, vol. 7, pp. 66-72, Dec. 1975.
[12] P. M. Kogge, The Architecture of Pipelined Computers. New York: McGraw-Hill, 1981.
[13] J. K. F. Lee and A. J. Smith, "Branch prediction strategies and branch target buffer design," IEEE Comput. Mag., vol. 17, pp. 6-22, Jan. 1984.
[14] S. McFarling and J. Hennessy, "Reducing the cost of branches," in Proc. 13th Annu. Symp. Comput. Architecture, Tokyo, Japan, June 1986, pp. 396-304.
[15] F. H. McMahon, FORTRAN CPU Performance Analysis, Lawrence Livermore Labs., 1972.
[16] N. Pang and J. E. Smith, "CRAY-1 simulation tools," Tech. Rep. ECE-83-11, Univ. of Wisconsin-Madison, Dec. 1983.
[17] Y. N. Patt, W.-M. Hwu, and M. Shebanow, "HPS, A new microarchitecture: Rationale and introduction," in Proc. 18th Annu. Workshop Microprogramming, Pacific Grove, CA, Dec. 1985, pp. 103-108.
[18] Y. N. Patt, S. W. Melvin, W.-M. Hwu, and M. Shebanow, "Critical issues regarding HPS, A high performance microarchitecture," in Proc. 18th Annu. Workshop Microprogramming, Pacific Grove, CA, Dec. 1985, pp. 109-116.
[19] A. Pleszkun, J. Goodman, W. C. Hsu, R. Joersz, G. Bier, P. Woest, and P. Schecter, "WISQ: A restartable architecture using queues," in Proc. 14th Annu. Symp. Comput. Architecture, Pittsburgh, PA, June 1987, pp. 290-299.
[20] A. R. Pleszkun and G. S. Sohi, "The performance potential of multiple functional unit processors," in Proc. 15th Annu. Symp. Comput. Architecture, Honolulu, HI, June 1988, pp. 37-44.
[21] R. M. Russell, "The CRAY-1 computer system," Commun. ACM, vol. 21, pp. 63-72, Jan. 1978.
[22] J. E. Smith, "A study of branch prediction strategies," in Proc. 8th Annu. Symp. Comput. Architecture, May 1981, pp. 135-148.
[23] J. E. Smith, "Characterizing computer performance with a single number," Commun. ACM, vol. 31, pp. 1202-1206, Oct. 1988.
[24] J. E. Smith and A. R. Pleszkun, "Implementing precise interrupts in pipelined processors," IEEE Trans. Comput., vol. 37, pp. 562-573, May 1988.
[25] G. S. Sohi and S. Vajapeyam, "Instruction issue logic for high-performance, interruptible pipelined processors," in Proc. 14th Annu. Symp. Comput. Architecture, Pittsburgh, PA, June 1987, pp. 27-34.
[26] R. M. Tomasulo, "An efficient algorithm for exploiting multiple arithmetic units," IBM J. Res. Develop., pp. 25-33, Jan. 1967.
[27] S. Weiss and J. E. Smith, "Instruction issue logic in pipelined supercomputers," IEEE Trans. Comput., vol. C-33, pp. 1013-1022, Nov. 1984.

Reviews

Ned Chapin

The author considers two issues in this paper dealing with pipelined computers: data dependencies and precise interrupts. A companion issue, not focused on, is branch instructions. For all three issues, execution speed is the author's concern. In a multiprocessing pipelined computer, one process may have to wait for data from another process, slowing performance overall. Also, when virtual memory is used, interrupts must be precisely timed for the contents of storage to be as expected by the programmer. In this paper, the author proposes a hardware solution that improves computer performance by reducing the data dependencies and, at the same time, provides for precise interrupts. To support the proposal, the author reports simulations showing improvement exceeding 150 percent under some conditions. The author calls the proposal a register update unit (RUU), and places it after the decode and issue unit and before the functional units. The author says his proposal is based on a modification of Tomasulo's algorithm. He covers two cases in some detail—the RUU with bypass logic and with limited bypass logic. The RUU works like managing a queue. The head of the queue points to the instruction needed for a precise interrupt; the tail points to the next available slot for the use of the decode and issue unit. The most stimulating part of this paper is the author's willingness to address the interdependency of interrupts and data dependencies, which often are discussed as though they had scant interactions. The author's seemingly close tie to Cray hardware and his limited discussion of hardware alternatives, however, left me unconvinced about the possible value of his proposal for other hardware.
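
The queue behavior described in the review can be illustrated with a minimal sketch. This is not the paper's design: the class name `RegisterUpdateQueue`, its methods, and the single-destination entry format are hypothetical, and the sketch leaves out dependency tags, bypass logic, and the functional units themselves. It shows only the head/tail discipline: the issue stage reserves the slot at the tail, results may return out of order, and registers are updated from the head only, so the register state always corresponds to a point in program order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    """One queue slot: destination register, result value, and a done flag."""
    dest: str
    value: Optional[int] = None
    done: bool = False


class RegisterUpdateQueue:
    """Toy circular queue sitting between issue and the register file (hypothetical)."""

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[Entry]] = [None] * size
        self.head = 0    # oldest instruction not yet committed
        self.tail = 0    # next free slot for the decode and issue unit
        self.count = 0   # number of occupied slots
        self.registers: Dict[str, int] = {}

    def issue(self, dest: str) -> int:
        """Reserve the tail slot and return its index as a tag."""
        if self.count == len(self.slots):
            raise RuntimeError("queue full: issue must stall")
        tag = self.tail
        self.slots[tag] = Entry(dest)
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return tag

    def complete(self, tag: int, value: int) -> None:
        """Record a result from a functional unit, possibly out of program order."""
        entry = self.slots[tag]
        if entry is None:
            raise ValueError("no instruction occupies this slot")
        entry.value = value
        entry.done = True

    def commit(self) -> int:
        """Update registers from the head only, stopping at the first unfinished entry."""
        committed = 0
        while self.count > 0:
            entry = self.slots[self.head]
            if entry is None or not entry.done:
                break
            self.registers[entry.dest] = entry.value
            self.slots[self.head] = None
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
            committed += 1
        return committed


q = RegisterUpdateQueue(2)
first = q.issue("r1")
second = q.issue("r2")
q.complete(second, 20)      # the younger instruction finishes first
print(q.commit())           # 0: the head is still waiting, so nothing commits
q.complete(first, 10)
print(q.commit())           # 2: both entries commit in program order
```

Because nothing behind an unfinished head entry reaches the registers, an interrupt taken at any time sees a register state that matches some point in the sequential program, which is the property the review calls a precise interrupt.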

Published In

IEEE Transactions on Computers, Volume 39, Issue 3
March 1990
133 pages
ISSN: 0018-9340

Publisher

IEEE Computer Society

United States

Author Tags

  1. data dependency resolution
  2. instruction issue mechanism
  3. interrupts
  4. parallel architectures
  5. pipeline processing
  6. pipelined computers
  7. pipelined processors
  8. precise interrupt implementation

Qualifiers

  • Research-article

Cited By

  • (2022) Development of a generalized model for parallel-streaming neural element and structures for scalar product calculation devices. The Journal of Supercomputing 79:5 (4820-4846). DOI: 10.1007/s11227-022-04838-0. Online publication date: 30-Sep-2022
  • (2022) Design of the Processors for Fast Cosine and Sine Fourier Transforms. Circuits, Systems, and Signal Processing 41:9 (4928-4951). DOI: 10.1007/s00034-022-02012-8. Online publication date: 1-Sep-2022
  • (2021) Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores. IEEE Transactions on Computers 70:2 (212-227). DOI: 10.1109/TC.2020.2987314. Online publication date: 1-Feb-2021
  • (2017) Efficient exception handling support for GPUs. Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (109-122). DOI: 10.1145/3123939.3123950. Online publication date: 14-Oct-2017
  • (2016) Efficient resource sharing algorithm for physical register file in simultaneous multi-threading processors. Microprocessors & Microsystems 45:PB (270-282). DOI: 10.1016/j.micpro.2016.06.002. Online publication date: 1-Sep-2016
  • (2011) Comparing FPGA vs. custom cmos and the impact on processor microarchitecture. Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays (5-14). DOI: 10.1145/1950413.1950419. Online publication date: 27-Feb-2011
  • (2011) Computer Architecture, Fifth Edition. Online publication date: 29-Sep-2011
  • (2009) Access region cache with register guided memory reference partitioning. Journal of Systems Architecture: the EUROMICRO Journal 55:10-12 (434-445). DOI: 10.1016/j.sysarc.2009.09.002. Online publication date: 1-Oct-2009
  • (2008) Turbo-ROB. Proceedings of the 3rd international conference on High performance embedded architectures and compilers (258-272). DOI: 10.5555/1786054.1786079. Online publication date: 27-Jan-2008
  • (2008) The performance of pollution control victim cache for embedded systems. Proceedings of the 21st annual symposium on Integrated circuits and system design (46-51). DOI: 10.1145/1404371.1404393. Online publication date: 1-Sep-2008
