A fill-unit approach to multiple instruction issue

Authors:

Manoj Franklin,

Mark Smotherman

MICRO 27: Proceedings of the 27th annual international symposium on Microarchitecture

Pages 162 - 171

https://doi.org/10.1145/192724.192748

Published: 30 November 1994

Abstract

Multiple issue of instructions occurs in superscalar and VLIW machines. This paper investigates a third type of machine design, which combines the advantages of code compatibility as in superscalars and the absence of complex dependency-checking logic from the decoder as in VLIW. In this design, a stream of scalar instructions is executed by the hardware and is simultaneously compacted into VLIW-type instructions, which are then stored in a structure called a shadow cache. When a shadow cache line contains the instructions requested by the fetch unit, the scalar instruction stream is preempted and all operations in the shadow cache line are simultaneously issued and executed. The mechanism that compacts instructions is called a fill unit, and was first proposed for dynamically compacting microoperations into large executable units by Melvin, Shebanow, and Patt in 1988. We have extended their approach to directly handle data dependencies, delayed branches, and speculative execution (using branch prediction). This approach is evaluated using the MIPS architecture, and a six-functional-unit machine is found to be 52 to 108% faster than a single-issue processor for unrecompiled SPECint92 benchmarks.
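
The compaction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' design: the names `Instr`, `FillUnit`, `observe`, `flush`, and `fetch` are hypothetical, and the line-closing policy (end the line when it is full or when the next instruction has a read-after-write or write-after-write conflict with an operation already in it) is a simplifying assumption. The paper's extensions for data dependencies inside a line, delayed branches, and speculative execution are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """One scalar instruction (hypothetical, simplified encoding)."""
    pc: int                       # address of the instruction
    dest: Optional[str]           # register written, or None
    srcs: Tuple[str, ...] = ()    # registers read


@dataclass
class FillUnit:
    """Packs an executed scalar stream into VLIW-like lines (assumed policy)."""
    width: int = 6                # slots per line; assumption echoing six functional units
    shadow_cache: Dict[int, List[Instr]] = field(default_factory=dict)
    line: List[Instr] = field(default_factory=list, init=False)

    def _conflicts(self, ins: Instr) -> bool:
        # Registers written by operations already placed in the open line.
        written = {i.dest for i in self.line if i.dest is not None}
        raw = any(s in written for s in ins.srcs)   # read-after-write
        waw = ins.dest in written                   # write-after-write
        return raw or waw

    def observe(self, ins: Instr) -> None:
        """Feed one executed scalar instruction to the fill unit."""
        if self.line and (len(self.line) == self.width or self._conflicts(ins)):
            self.flush()
        self.line.append(ins)

    def flush(self) -> None:
        """Store the open line in the shadow cache, keyed by its first pc."""
        if self.line:
            self.shadow_cache[self.line[0].pc] = self.line
            self.line = []


def fetch(fill: FillUnit, pc: int) -> Optional[List[Instr]]:
    """Return a compacted line starting at pc, or None to fall back to scalar issue."""
    return fill.shadow_cache.get(pc)


if __name__ == "__main__":
    fu = FillUnit()
    stream = [
        Instr(0, "r1", ("r2", "r3")),
        Instr(4, "r4", ("r5", "r6")),
        Instr(8, "r7", ("r1", "r4")),   # reads r1 and r4, so it starts a new line
        Instr(12, "r8", ("r9",)),
    ]
    for ins in stream:
        fu.observe(ins)
    fu.flush()
    for start_pc, ops in fu.shadow_cache.items():
        print(start_pc, [op.pc for op in ops])
```

In this sketch a later fetch at an address that starts a stored line returns the whole line for simultaneous issue, while any other address returns `None` and would be handled by the ordinary scalar path.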

References

[1]

A. Aiken and A. Nicolau, "A Development Environment for Horizontal Microcode," IEEE Transactions on Software Engineering, vol. 14, no. 5, May 1988, pp. 584-594.

[2]

G. Blanck and S. Krueger, "The SuperSPARC Microprocessor," Proc. 37th COMPCON, San Francisco, February 1992, pp. 136-141.

[3]

K. Diefendorff and M. Allen, "Organization of the Motorola 88110 Superscalar RISC Microprocessor," IEEE Micro, vol. 12, no. 2, April 1992, pp. 40-63.

[4]

K. Ebcioglu, "Some Design Ideas for a VLIW Architecture for Sequential Natured Software," in M. Cosnard, et al. (eds.), Parallel Processing (Proc. IFIP WG 10.3 Working Conference on Parallel Processing, Pisa, Italy), North Holland, 1988, pp. 3-21.

[5]

J. Hennessy and D. Patterson, Computer Architecture: A Quantitative Approach. San Mateo, CA: Morgan Kaufmann, 1990.

[6]

W-M. Hwu and Y. Patt, "Checkpoint Repair for High Performance Out-of-Order Execution Machines," IEEE Transactions on Computers, vol. C-36, no. 12, December 1987, pp. 1496-1514.

[7]

M. Johnson, Superscalar Microprocessor Design. Englewood Cliffs, NJ: Prentice-Hall, 1991.

[8]

G. Kane and J. Heinrich, MIPS RISC Architecture. Englewood Cliffs, NJ: Prentice-Hall, 1992.

[9]

N. Malik, R. Eickemeyer, and S. Vassiliadis, "Interlock Collapsing ALU for Increased Instruction-Level Parallelism," Proc. Micro-25, Portland, December 1992, pp. 149-157.

[10]

S. Melvin, M. Shebanow, and Y. Patt, "Hardware Support for Large Atomic Units in Dynamically Scheduled Machines," Proc. Micro-21, San Diego, December 1988, pp. 60-66.

[11]

V. Popescu, M. Schultz, J. Spracklen, G. Gibson, B. Lightner, and D. Isaman, "The Metaflow Architecture," IEEE Micro, vol. 11, no. 3, June 1991, pp. 10-73.

[12]

G.S. Sohi, "Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers," IEEE Transactions on Computers, vol. 39, no. 3, March 1990, pp. 349-359.

[13]

S. Weiss and J.E. Smith, POWER and PowerPC. San Francisco: Morgan Kaufmann, 1994.

[14]

T-Y. Yeh and Y. Patt, "Alternative Implementations of Two-Level Adaptive Branch Prediction," Proc. ISCA 92, Australia, May 1992, pp. 124-134.

Index Terms

A fill-unit approach to multiple instruction issue
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data
      2. Very long instruction word
    2. Serial architectures
      1. Complex instruction set computing
      2. Reduced instruction set computing
2. Hardware
  1. Electronic design automation
    1. High-level and register-transfer level synthesis
  2. Integrated circuits
    1. Semiconductor memory

Recommendations

An evaluation of speculative instruction execution on simultaneous multithreaded processors

Modern superscalar processors rely heavily on speculative execution for performance. For example, our measurements show that on a 6-issue superscalar, 93% of committed instructions for SPECINT95 are speculative. Without speculation, processor resources ...

Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading

To achieve high performance, contemporary computer systems rely on two forms of parallelism: instruction-level parallelism (ILP) and thread-level parallelism (TLP). Wide-issue super-scalar processors exploit ILP by executing multiple instructions from a ...

Parallelizing nonnumerical code with selective scheduling and software pipelining

Instruction-level parallelism (ILP) in nonnumerical code is regarded as scarce and hard to exploit due to its irregularity. In this article, we introduce a new code-scheduling technique for irregular ILP called “selective scheduling” which can be used as ...

Published In

MICRO 27: Proceedings of the 27th annual international symposium on Microarchitecture

November 1994

233 pages

ISBN: 0897917073

DOI: 10.1145/192724

Chairmen: Hans Mulder (Intel Corp.), Matthew Farrens (Univ. of California, Davis)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

MICRO94

Sponsor:

SIGMICRO
IEEE-CS\TCMM

MICRO94: 27th Annual International Symposium on Microarchitecture

November 30 - December 2, 1994

San Jose, California, USA

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Bibliometrics & Citations

Article Metrics

Total Citations: 32
Total Downloads: 422
Downloads (Last 12 months): 44
Downloads (Last 6 weeks): 6

Reflects downloads up to 10 Dec 2024

Cited By

Liu F, Ahn H, Beard S, Oh T, August D (2015). DynaSpAM. ACM SIGARCH Computer Architecture News 43:3S, pp. 541-553. DOI: 10.1145/2872887.2750414. Online publication date: 13-Jun-2015.
https://dl.acm.org/doi/10.1145/2872887.2750414

Liu F, Ahn H, Beard S, Oh T, August D, Marr D, Albonesi D (2015). DynaSpAM. Proceedings of the 42nd Annual International Symposium on Computer Architecture, pp. 541-553. DOI: 10.1145/2749469.2750414. Online publication date: 13-Jun-2015.
https://dl.acm.org/doi/10.1145/2749469.2750414

Valluri M, John L, McKinley K, Arvind, Rudolph L (2005). Low-power, low-complexity instruction issue using compiler assistance. Proceedings of the 19th annual international conference on Supercomputing, pp. 209-218. DOI: 10.1145/1088149.1088177. Online publication date: 20-Jun-2005.
https://dl.acm.org/doi/10.1145/1088149.1088177

Tang Y, Deng K, Cao H, Zhou X (2005). Trace-Based runtime instruction rescheduling for architecture extension. Proceedings of the Second international conference on Embedded Software and Systems, pp. 4-15. DOI: 10.1007/11599555_4. Online publication date: 16-Dec-2005.
https://dl.acm.org/doi/10.1007/11599555_4

Oberoi P, Sohi G (2003). Parallelism in the front-end. ACM SIGARCH Computer Architecture News 31:2, pp. 230-240. DOI: 10.1145/871656.859645. Online publication date: 1-May-2003.
https://dl.acm.org/doi/10.1145/871656.859645

Valluri M, John L, Hanson H, Verbauwhede I, Roh H (2003). Exploiting compiler-generated schedules for energy savings in high-performance processors. Proceedings of the 2003 international symposium on Low power electronics and design, pp. 414-419. DOI: 10.1145/871506.871608. Online publication date: 25-Aug-2003.
https://dl.acm.org/doi/10.1145/871506.871608

Oberoi P, Sohi G, Gottlieb A, Li K (2003). Parallelism in the front-end. Proceedings of the 30th annual international symposium on Computer architecture, pp. 230-240. DOI: 10.1145/859618.859645. Online publication date: 9-Jun-2003.
https://dl.acm.org/doi/10.1145/859618.859645

Radhakrishnan R, Talla D, John L (2000). Allowing for ILP in an embedded Java processor. ACM SIGARCH Computer Architecture News 28:2, pp. 294-305. DOI: 10.1145/342001.339702. Online publication date: 1-May-2000.
https://dl.acm.org/doi/10.1145/342001.339702

Chou Y, Shen J (2000). Instruction path coprocessors. ACM SIGARCH Computer Architecture News 28:2, pp. 270-281. DOI: 10.1145/342001.339694. Online publication date: 1-May-2000.
https://dl.acm.org/doi/10.1145/342001.339694

Radhakrishnan R, Talla D, John L, Berenbaum A, Emer J (2000). Allowing for ILP in an embedded Java processor. Proceedings of the 27th annual international symposium on Computer architecture, pp. 294-305. DOI: 10.1145/339647.339702. Online publication date: 10-Jun-2000.
https://dl.acm.org/doi/10.1145/339647.339702
