TinyBERT for branch prediction in modern microprocessors


Abstract

Recent progress has highlighted the importance of branch prediction (BP) for computer performance, in particular for reducing delays by preventing pipeline stalls in modern microprocessors (CPUs). In this paper, we investigate machine learning (ML) models for improving BP accuracy, focusing on transformer models because of their strong predictive and classification performance. Although existing studies have applied various ML methods to BP, the models they select are computationally expensive and impractical for this task. We therefore present an ML-based dynamic BP technique built on tiny bidirectional encoder representations from transformers (TinyBERT), a model notable for its efficiency, simplicity, and low resource usage. The method streamlines the BP process and offers a more effective alternative to conventional strategies. A key aspect of our approach is the use of local post hoc explanations, which give insight into the model's individual predictions. Our empirical results show a 13% reduction in the misprediction rate compared with top predictors such as TAGE-SC-L across various multimedia and integer application benchmarks. These results underscore the potential of compact transformers for efficient and effective BP.
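The headline comparison is a ratio of misprediction rates. A minimal sketch of that arithmetic follows, assuming that the reported 13% is a relative reduction against the baseline predictor's rate (the abstract does not say whether it is relative or absolute) and treating each conditional branch outcome as a binary label (1 = taken, 0 = not taken). The function names and the numbers in the example are hypothetical and illustrative, not taken from the paper.

```python
def misprediction_rate(predictions, outcomes):
    """Fraction of conditional branches whose predicted direction was wrong.

    Both arguments are sequences of 0/1 labels (1 = taken, 0 = not taken),
    an encoding assumed here for illustration.
    """
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    if not outcomes:
        raise ValueError("need at least one branch outcome")
    wrong = sum(p != o for p, o in zip(predictions, outcomes))
    return wrong / len(outcomes)


def relative_reduction(baseline_rate, new_rate):
    """Relative drop in misprediction rate versus a baseline predictor.

    Assumes the comparison is relative: (baseline - new) / baseline.
    """
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return (baseline_rate - new_rate) / baseline_rate


if __name__ == "__main__":
    # Toy trace: 2 of 4 predictions disagree with the actual outcomes.
    print(misprediction_rate([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
    # Hypothetical rates: baseline 10%, new predictor 8.7%.
    print(round(relative_reduction(0.10, 0.087), 2))  # 0.13
```

Under this reading, a predictor whose misprediction rate falls from a hypothetical 10% to 8.7% shows a 13% relative reduction, even though the absolute difference is only 1.3 percentage points; the distinction matters when comparing against figures reported for other predictors.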


Fig. 1


Data availability

The data used to support the findings of this study are available upon request.


Funding

The authors received no specific funding for this study.

Author information

Authors and Affiliations

Computer Engineering Department, College of Engineering and Petroleum, Kuwait University, Shadadiyah, Kuwait
Anwar Alajmi, Bashair AlSarraf, Zainab Abualhassan, Abbas A. Fairouz & Imtiaz Ahmad


Corresponding author

Correspondence to Abbas A. Fairouz.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Alajmi, A., AlSarraf, B., Abualhassan, Z. et al. TinyBERT for branch prediction in modern microprocessors. Neural Comput & Applic 37, 1771–1782 (2025). https://doi.org/10.1007/s00521-024-10535-1


Received: 04 March 2024
Accepted: 03 October 2024
Published: 20 November 2024
Issue Date: February 2025
DOI: https://doi.org/10.1007/s00521-024-10535-1
