Increasing the instruction fetch rate via multiple branch prediction and a branch address cache

Authors:

Tse-Yu Yeh,

Deborah T. Marr,

Yale N. Patt

ICS '93: Proceedings of the 7th international conference on Supercomputing

Pages 67 - 76

https://doi.org/10.1145/165939.165956

Published: 01 August 1993

References

[1]

J.E. Smith, "A Study of Branch Prediction Strategies", Proceedings of the 8th International Symposium on Computer Architecture, (May 1981), pp.135-148.

[2]

J. Lee and A. J. Smith, "Branch Prediction Strategies and Branch Target Buffer Design", IEEE Computer, (Jan. 1984), pp.6-22.

[3]

R. Colwell, R. Nix, J. O'Donnell, D. Papworth, and P. Rodman, "A VLIW Architecture for a Trace Scheduling Compiler," Proc of the 2nd Intl Conf on Architectural Support for Programming Languages and Operating Systems, (Oct. 1987), pp. 180-192.

[4]

B.R. Rau, D. Yen, W. Yen, and R. Towle, "The Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions, and Trade-offs," IEEE Computer, (Jan. 1989), pp. 12-35.

[5]

M. Butler, T-Y Yeh, Y.N. Patt, M. Alsup, H. Scales, and M. Shebanow, "Instruction Level Parallelism is Greater Than Two", Proceedings of the 18th International Symposium on Computer Architecture, (May 1991), pp. 276-286.

[6]

T-Y Yeh and Y.N. Patt, "Two-Level Adaptive Branch Prediction", The 24th ACM/IEEE Intl. Sym and Wkshop on Microarchitecture, (Nov. 1991), pp. 51-61.

[7]

T-Y Yeh and Y.N. Patt, "Alternative Implementations of Two-Level Adaptive Branch Prediction," Proceedings of the 19th International Symposium on Computer Architecture, (May 1992), pp. 124-134.

[8]

S-T Pan, K. So, and J.T. Rahmeh, "Improving the Accuracy of Dynamic Branch Prediction Using Branch Correlation," Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems, (Oct. 1992), pp. 76-84.

[9]

T-Y Yeh and Y.N. Patt, "A Comprehensive Instruction Fetch Mechanism for a Processor Supporting Speculative Execution," Proc of the 25th International Symposium on Microarchitecture, (Dec. 1992), pp. 129-139.

[10]

T-Y Yeh and Y.N. Patt, "A Comparison of Dynamic Branch Predictors that use Two Levels of Branch History," Proceedings of the 20th International Symposium on Computer Architecture, (May 1993).

[11]

W. Hwu, S. Mahlke, W. Chen, P. Chang, N. Warter, R. Bringmann, R. Ouellette, R. Hank, T. Kiyohara, G. Haab, J. Holm, and D. Lavery, "The superblock: An effective technique for VLIW and superscalar compilation," The Journal of Supercomputing, (Jan. 1993).

Recommendations

Author retrospective for increasing the instruction fetch rate via multiple branch prediction and a branch address cache
ACM International Conference on Supercomputing 25th Anniversary Volume

"Increasing the Instruction Fetch Rate via Multiple Branch Prediction and a Branch Address Cache" was the first paper to propose a highly accurate hardware mechanism for predicting and fetching multiple non-contiguous basic blocks using leading-edge ...
Increasing the instruction fetch rate via block-structured instruction set architectures
MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture

To exploit larger amounts of instruction level parallelism, processors are being built with wider issue widths and larger numbers of functional units. Instruction fetch rate must also be increased in order to effectively exploit the performance ...

Reviews

Reviewer: Mihail Sadeanu

The authors present a viable and effective hardware mechanism that predicts multiple branches (MBs) and fetches multiple nonconsecutive basic blocks (MNC BBs) simultaneously in each clock cycle (CC). The proposed solution makes full use of the fetch and execution bandwidth by avoiding both the execution bandwidth wasted on instructions whose results are discarded and the instruction fetch bandwidth wasted on instructions that will not be executed. It introduces a highly accurate branch prediction algorithm, a branch address cache, and an instruction cache, all of which are hardware intensive but not excessively so for the newest and upcoming generations of MIMD computer designs. The authors describe mechanisms for fetching two and three basic blocks each clock cycle based on an “MB two-level adaptive branch predictor” algorithm, which provides highly accurate predictions of MB paths. Once the MB paths are predicted, the mechanism can fetch MNC BBs each clock cycle. The instruction cache is designed with a large bandwidth in order to supply MNC BBs of instructions in a single CC. To evaluate the performance of a machine front end, the team used a trace-driven simulator and a new performance metric, IPC_f, defined as the number of effective instructions fetched per CC by an instruction fetch mechanism. A comparison of various instruction cache schemes shows that IPC_f is greater at higher levels of set associativity. The proposed design that combines the advanced branch address cache with an instruction cache built from interleaved banks is the ultimate scheme for increasing IPC_f, for both integer and floating-point benchmarks, without compiler optimization or the hardware cost of multiple read ports. These solutions are intended for hardware designs that aim to speed up the extraction of instruction parallelism from sequential program structures.
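
As a rough illustration of the fetch-rate metric defined in this review (effective instructions fetched per clock cycle), the following minimal Python sketch averages per-cycle counts. The function name `ipc_f`, the input format, and the sample numbers are assumptions made for illustration; they are not taken from the paper or its simulator.

```python
def ipc_f(effective_fetched_per_cycle):
    """Average number of effective instructions fetched per clock cycle.

    `effective_fetched_per_cycle` is a hypothetical list with one entry per
    simulated cycle: how many fetched instructions were on the correct path.
    """
    if not effective_fetched_per_cycle:
        raise ValueError("need at least one simulated cycle")
    return sum(effective_fetched_per_cycle) / len(effective_fetched_per_cycle)


# Made-up trace: four cycles, one of which fetched nothing useful
# (for example, a cycle lost to a misprediction).
sample_trace = [8, 5, 0, 7]
print(ipc_f(sample_trace))  # 20 effective instructions over 4 cycles
```

Counting only correct-path instructions is what distinguishes this kind of metric from raw fetch width: a wide fetch that follows a mispredicted path contributes nothing.
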
Simulation results are presented in tables, graphs, and histograms, and are explained in detail in the text. The benchmarks should have been selected from the SPEC92 instead of the SPEC89 benchmark test set, which is already obsolete. The results are of interest for hardware chip set designers and manufacturers. They might also be used to extend new parallel structures and architectures of multiprocessor MIMD computers.
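
The review names the "MB two-level adaptive branch predictor" without detailing it. The sketch below is one plausible, simplified reading: a global history register indexes a table of 2-bit saturating counters, and the second branch of a cycle is predicted by speculatively shifting the first prediction into the history. The class name, history length, table size, and initial counter values are all assumptions; treat it as a toy model rather than the authors' design.

```python
class MultiBranchPredictor:
    """Toy global-history two-level predictor that predicts two branches per cycle.

    Hypothetical illustration only; the parameters are not from the paper.
    """

    def __init__(self, history_bits=4):
        self.mask = (1 << history_bits) - 1
        self.history = 0                      # global branch history register
        self.pht = [2] * (1 << history_bits)  # 2-bit counters, start weakly taken

    def _taken(self, index):
        # Counter values 2 and 3 mean "predict taken".
        return self.pht[index] >= 2

    def predict_two(self):
        """Return (first, second) taken/not-taken predictions for one cycle."""
        first = self._taken(self.history)
        # Assume the first prediction is correct and extend the history with it
        # to look up the prediction for the following branch.
        speculative = ((self.history << 1) | int(first)) & self.mask
        second = self._taken(speculative)
        return first, second

    def update(self, outcomes):
        """Train with resolved branch outcomes, oldest first."""
        for taken in outcomes:
            counter = self.pht[self.history]
            self.pht[self.history] = min(3, counter + 1) if taken else max(0, counter - 1)
            self.history = ((self.history << 1) | int(taken)) & self.mask


predictor = MultiBranchPredictor(history_bits=4)
predictor.update([True, False] * 20)  # alternating taken / not-taken pattern
print(predictor.predict_two())
```

Because the second lookup depends only on the history and the first prediction, both predictions can in principle be produced in the same cycle, which is the property a multiple-block fetch mechanism needs.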

Published In

ICS '93: Proceedings of the 7th international conference on Supercomputing

August 1993

425 pages

ISBN: 089791600X

DOI: 10.1145/165939

Chairman:
Yoichi Muraoka

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ICS93

Sponsor:

SIGARCH

ICS93: International Conference on Supercomputing

July 19 - 23, 1993

Tokyo, Japan

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 116
Total Downloads: 1,223
Downloads (Last 12 months): 239
Downloads (Last 6 weeks): 37

Reflects downloads up to 11 Dec 2024

Citations

Cited By

Choi J, Byun I, Hong J, Min D, Kim J, Cho J, Jeong H, Tanaka M, Inoue K, Kim J (2024). SuperCore: An Ultra-Fast Superconducting Processor for Cryogenic Applications. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1532-1547. Online publication date: 2-Nov-2024.
https://doi.org/10.1109/MICRO61859.2024.00112
Deshmukh A, Cai L, Patt Y (2024). Alternate Path Fetch. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 1217-1229. Online publication date: 29-Jun-2024.
https://doi.org/10.1109/ISCA59077.2024.00091
Perais A, Sheikh R (2023). Branch Target Buffer Organizations. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 240-253. Online publication date: 28-Oct-2023.
https://dl.acm.org/doi/10.1145/3613424.3623774
Srivastava S, Singh P (2022). HCIP: Hybrid Short Long History Table-based Cache Instruction Prefetcher. International Journal of Next-Generation Computing. Online publication date: 31-Oct-2022.
https://doi.org/10.47164/ijngc.v13i3.758
Montasari R, Tait B, Jahankhani H, Carroll F (2022). An Investigation of Microarchitectural Cache-Based Side-Channel Attacks from a Digital Forensic Perspective: Methods of Exploits and Countermeasures. Artificial Intelligence in Cyber Security: Impact and Implications, pp. 281-306. Online publication date: 1-Jan-2022.
https://doi.org/10.1007/978-3-030-88040-8_11
Mohammadi M, Han S, Atoofian E, Baniasadi A, Aamodt T, Dally W (2020). Energy Efficient On-Demand Dynamic Branch Prediction Models. IEEE Transactions on Computers 69:3, pp. 453-465. Online publication date: 1-Mar-2020.
https://doi.org/10.1109/TC.2019.2956710
Jeong I, Lee C, Kim K, Ro W (2019). OverCome: Coarse-Grained Instruction Commit with Handover Register Renaming. IEEE Transactions on Computers 68:12, pp. 1802-1816. Online publication date: 1-Dec-2019.
https://doi.org/10.1109/TC.2019.2936557
Lu Y, Chiu J, Chao S, Ye Y (2019). Design of Instruction Analyzer with Semantic-Based Loop Unrolling Mechanism in the Hyperscalar Architecture. New Trends in Computer Technologies and Applications, pp. 3-19. Online publication date: 11-Jul-2019.
https://doi.org/10.1007/978-981-13-9190-3_1
Mittal S (2018). A survey of techniques for dynamic branch prediction. Concurrency and Computation: Practice and Experience 31:1. Online publication date: 2-Sep-2018.
https://doi.org/10.1002/cpe.4666
Zhang W, Zhang H, Lach J (2015). Reducing dynamic energy of set-associative L1 instruction cache by early tag lookup. 2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), pp. 49-54. Online publication date: Jul-2015.
https://doi.org/10.1109/ISLPED.2015.7273489
