DOI: 10.1145/165939.165956

Increasing the instruction fetch rate via multiple branch prediction and a branch address cache

Published: 01 August 1993
Reviews

Mihail Sadeanu

The authors present a viable and effective hardware mechanism that predicts multiple branches (MBs) and fetches multiple nonconsecutive basic blocks (MNC BBs) simultaneously in each clock cycle (CC). The proposed solution fully utilizes the fetch and execution bandwidth by avoiding two kinds of waste: execution bandwidth spent on instructions whose results are discarded, and instruction fetch bandwidth spent on instructions that will not be executed. It introduces a highly accurate branch prediction algorithm, a branch address cache, and an instruction cache, all of which are hardware intensive but not excessively so for the newest and upcoming generations of MIMD computer designs.
The authors describe mechanisms for fetching two and three basic blocks each clock cycle based on an “MB two-level adaptive branch predictor” algorithm, which provides highly accurate predictions of MB paths. Once the MB paths are predicted, the mechanism can fetch MNC BBs in each clock cycle. The instruction cache is designed with a large bandwidth so that it can supply MNC BBs of instructions in a single CC.
To evaluate the performance of a machine front end, the team used a trace-driven simulator and a new performance metric, IPC_f, defined as the number of effective instructions fetched per CC by an instruction fetch mechanism. A comparison of various instruction cache schemes shows that IPC_f increases with the level of set associativity. The final schemes proposed for increasing IPC_f, for both integer and floating-point benchmarks, are the advanced branch address cache and an instruction cache design with interleaved banks; they require neither compiler optimization nor the hardware cost of multiple read ports. These solutions are intended for hardware designs that speed up the rate of extracting instruction parallelism from sequential program structures.
Simulation results are presented in tables, graphs, and histograms, and are explained in detail in the text. The benchmarks should have been selected from SPEC92 rather than from the SPEC89 benchmark suite, which is already obsolete. The results are of interest to hardware chipset designers and manufacturers. They might also be used to extend new parallel structures and architectures of multiprocessor MIMD computers.
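To make the fetch-rate metric described in the review concrete, the following is a minimal sketch and not the authors' simulator. It assumes a toy front-end model: each cycle fetches up to `blocks_per_cycle` basic blocks, a fetch group ends early at the first block whose terminating branch was mispredicted, and each misprediction costs a fixed number of extra cycles. The function name `ipc_f`, the trace format, and the penalty parameter are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical trace entry: (instructions in the basic block,
# whether the branch ending the block was predicted correctly).
Block = Tuple[int, bool]


def ipc_f(blocks: List[Block], blocks_per_cycle: int,
          mispredict_penalty: int = 0) -> float:
    """Effective instructions fetched per cycle under a toy fetch model.

    Assumptions (not taken from the paper): every block in the trace is on
    the correct path, a fetch group stops after a mispredicted block, and
    each misprediction adds `mispredict_penalty` idle cycles.
    """
    if blocks_per_cycle < 1:
        raise ValueError("blocks_per_cycle must be at least 1")
    cycles = 0
    instructions = 0
    i = 0
    while i < len(blocks):
        cycles += 1                      # start a new fetch cycle
        fetched = 0
        while i < len(blocks) and fetched < blocks_per_cycle:
            size, predicted_correctly = blocks[i]
            instructions += size         # the block itself is effective
            i += 1
            fetched += 1
            if not predicted_correctly:
                # The next block's address is unknown this cycle.
                cycles += mispredict_penalty
                break
    return instructions / cycles if cycles else 0.0


if __name__ == "__main__":
    trace = [(5, True), (4, True), (6, False), (3, True)]
    for width in (1, 2, 3):
        print(width, ipc_f(trace, width))
```

Under these assumptions, widening the fetch group from one to two or three basic blocks per cycle raises the computed value for the sample trace, which is the qualitative effect the review attributes to fetching multiple nonconsecutive basic blocks per cycle. The absolute numbers carry no meaning beyond this toy model.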

Information & Contributors

Information

Published In

ICS '93: Proceedings of the 7th international conference on Supercomputing
August 1993
425 pages
ISBN: 089791600X
DOI: 10.1145/165939
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

ICS93
Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 235
  • Downloads (Last 6 weeks): 20
Reflects downloads up to 15 Jan 2025

Citations

Cited By

  • (2024) SuperCore: An Ultra-Fast Superconducting Processor for Cryogenic Applications. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1532-1547. DOI: 10.1109/MICRO61859.2024.00112. Online publication date: 2-Nov-2024
  • (2024) Alternate Path Fetch. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 1217-1229. DOI: 10.1109/ISCA59077.2024.00091. Online publication date: 29-Jun-2024
  • (2023) Branch Target Buffer Organizations. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 240-253. DOI: 10.1145/3613424.3623774. Online publication date: 28-Oct-2023
  • (2022) HCIP: Hybrid Short Long History Table-based Cache Instruction Prefetcher. International Journal of Next-Generation Computing. DOI: 10.47164/ijngc.v13i3.758. Online publication date: 31-Oct-2022
  • (2022) An Investigation of Microarchitectural Cache-Based Side-Channel Attacks from a Digital Forensic Perspective: Methods of Exploits and Countermeasures. Artificial Intelligence in Cyber Security: Impact and Implications, pp. 281-306. DOI: 10.1007/978-3-030-88040-8_11. Online publication date: 1-Jan-2022
  • (2020) Energy Efficient On-Demand Dynamic Branch Prediction Models. IEEE Transactions on Computers 69:3, pp. 453-465. DOI: 10.1109/TC.2019.2956710. Online publication date: 1-Mar-2020
  • (2019) OverCome: Coarse-Grained Instruction Commit with Handover Register Renaming. IEEE Transactions on Computers 68:12, pp. 1802-1816. DOI: 10.1109/TC.2019.2936557. Online publication date: 1-Dec-2019
  • (2019) Design of Instruction Analyzer with Semantic-Based Loop Unrolling Mechanism in the Hyperscalar Architecture. New Trends in Computer Technologies and Applications, pp. 3-19. DOI: 10.1007/978-981-13-9190-3_1. Online publication date: 11-Jul-2019
  • (2018) A survey of techniques for dynamic branch prediction. Concurrency and Computation: Practice and Experience 31:1. DOI: 10.1002/cpe.4666. Online publication date: 2-Sep-2018
  • (2015) Reducing dynamic energy of set-associative L1 instruction cache by early tag lookup. 2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), pp. 49-54. DOI: 10.1109/ISLPED.2015.7273489. Online publication date: Jul-2015