Abstract
Recent progress has highlighted the crucial importance of branch prediction (BP) in optimizing computer performance, particularly in reducing computational delays by preventing pipeline stalls in modern microprocessors (CPUs). In this paper, we investigate the use of machine learning (ML) models to improve BP accuracy, focusing on transformer models because of their strong predictive and classification performance. Although existing studies have applied various ML methods to BP, the models they select are computationally expensive and impractical for this task. We therefore present an ML-based dynamic BP technique that uses tiny bidirectional encoder representations from transformers (TinyBERT), a model notable for its efficiency, simplicity, and low resource utilization. This method not only streamlines the BP process but also offers a more effective alternative to conventional strategies. A key aspect of our approach is the application of local post hoc explanations, which provide insight into how the model arrives at individual predictions. Our empirical results show a 13% reduction in the misprediction rate compared with state-of-the-art predictors such as TAGE-SC-L across various multimedia and integer application benchmarks. These results underscore the potential of compact transformers as a basis for efficient and effective BP.
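The abstract reports its result as a reduction in the misprediction rate relative to a conventional predictor. As a minimal sketch of what that metric means, the Python below runs a simple bimodal predictor (a table of 2-bit saturating counters indexed by branch address) over a toy trace, computes its misprediction rate, and computes the relative reduction between two rates. The predictor, the trace, the table size, and the function names `bimodal_mispredictions`, `misprediction_rate`, and `relative_reduction` are hypothetical illustrations; this is not the authors' TinyBERT pipeline, and it is far simpler than TAGE-SC-L. The sketch also assumes the 13% figure is a relative reduction, which the abstract does not state explicitly.

```python
"""Toy illustration of dynamic branch prediction metrics.

Assumptions (not from the paper): a bimodal predictor of 2-bit saturating
counters indexed by low PC bits, a made-up loop trace, and made-up rates
used only to show how a relative reduction is computed.
"""
from typing import Iterable, Tuple


def bimodal_mispredictions(trace: Iterable[Tuple[int, bool]],
                           table_bits: int = 10) -> Tuple[int, int]:
    """Run a 2-bit saturating-counter predictor; return (mispredictions, branches)."""
    size = 1 << table_bits
    # Counter values 0-1 predict not-taken, 2-3 predict taken.
    # Every counter starts at 1 (weakly not-taken).
    counters = [1] * size
    mispredicts = 0
    total = 0
    for pc, taken in trace:
        idx = pc & (size - 1)  # index the table by the low bits of the PC
        predicted_taken = counters[idx] >= 2
        if predicted_taken != taken:
            mispredicts += 1
        # Saturating update toward the actual outcome.
        if taken:
            counters[idx] = min(3, counters[idx] + 1)
        else:
            counters[idx] = max(0, counters[idx] - 1)
        total += 1
    return mispredicts, total


def misprediction_rate(mispredicts: int, total: int) -> float:
    """Fraction of branches mispredicted (0.0 for an empty trace)."""
    return mispredicts / total if total else 0.0


def relative_reduction(baseline_rate: float, new_rate: float) -> float:
    """Relative reduction of new_rate versus baseline_rate (0.13 means 13% fewer)."""
    if baseline_rate == 0:
        raise ValueError("baseline rate must be non-zero")
    return (baseline_rate - new_rate) / baseline_rate


if __name__ == "__main__":
    # Hypothetical loop branch at PC 0x400: taken 9 times, then not taken; 10 iterations.
    trace = [(0x400, i % 10 != 9) for i in range(100)]
    m, n = bimodal_mispredictions(trace)
    print(f"bimodal: {m}/{n} mispredicted ({misprediction_rate(m, n):.2%})")
    # Made-up rates: going from 5.0% to 4.35% is a 13% relative reduction.
    print(f"relative reduction: {relative_reduction(0.050, 0.0435):.0%}")
```

On the toy loop trace, the bimodal predictor mispredicts the first (cold) branch and then once per loop exit, 11 of 100 branches in total. The rates in the second printed line are invented solely to show the arithmetic of a 13% relative reduction; they are not results from the paper.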
Data availability
The data used to support the findings of this study are available upon request.
References
Sambo MK (2023) A comparative study of pipelining, branch prediction, and superscalar architectures for enhanced computer performance. Computer Science
Young C, Gloy N, Smith MD (1995) A comparative analysis of schemes for correlated branch prediction. ACM SIGARCH Computer Arch News 23(2):276–286
Lin C-K, Tarsa SJ (2019) Branch prediction is not a solved problem: Measurements, opportunities, and future directions. arXiv preprint
Sbera M, Vintan LN, Florea A (2001) Static and dynamic branch prediction using neural networks. Computer Science
Choi H, Park S (2021) A survey of machine learning-based system performance optimization techniques. Appl Sci 11(7):3235
Fu JW, Patel JH, Janssens BL (1992) Stride directed prefetching in scalar processors. ACM SIGMICRO Newsletter 23(1–2):102–110
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Joseph R (2021) A survey of deep learning techniques for dynamic branch prediction. arXiv preprint arXiv:2112.14911
Wu N, Xie Y (2021) A survey of machine learning for computer architecture and systems. ACM Computing Surveys (CSUR) 55:1–39
Zhang L, Wu N, Ge F, Zhou F, Yahya MR (2020) A dynamic branch predictor based on parallel structure of SRNN. IEEE Access 8:86230–86237
Mittal S (2019) A survey of techniques for dynamic branch prediction. Concurr Comput: Practice Exp 31(1):4666
Sburlan A-F (2023) Discovering predictive patterns: A study of contextual factors for next generation branch predictors. MEng Individual Project, Imperial College London, London. Supervised by Prof. Paul Kelly and Dr. Giuliano Casale
Jiménez DA, Lin C (2001) Dynamic branch prediction with perceptrons. In: Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture, pp. 197–206. IEEE
McFarling S (1993) Combining branch predictors. Technical report, Citeseer (June 1993)
Tullsen DM, Eggers SJ, Levy HM (1995) Simultaneous multithreading: Maximizing on-chip parallelism. In: Proceedings of the 22nd Annual International Symposium on Computer Architecture, pp. 392–403
Yeh T-Y, Patt YN (1991) Two-level adaptive training branch prediction. In: Proceedings of the 24th Annual International Symposium on Microarchitecture, pp. 51–61
Seznec A, Michaud P (2006) A case for (partially) tagged geometric history length branch prediction. J Instr-Level Parallelism 8:23
Seznec A (2014) TAGE-SC-L branch predictors. In: JILP-Championship Branch Prediction
Rosenblatt F (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65(6):386
Lee C-C, Chen I-C, Mudge TN (1997) The bi-mode branch predictor. In: Proceedings of 30th Annual International Symposium on Microarchitecture, pp. 4–13. IEEE
Akkary H, Srinivasan ST, Koltur R, Patil Y, Refaai W (2004) Perceptron-based branch confidence estimation. In: 10th International Symposium on High Performance Computer Architecture (HPCA’04), pp. 265–265. IEEE
Hida I, Ikebe M, Asai T, Motomura M (2016) A 2-clock-cycle naïve Bayes classifier for dynamic branch prediction in pipelined RISC microprocessors. In: 2016 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), pp. 297–300. https://doi.org/10.1109/APCCAS.2016.7803958
Tarsa SJ, Lin C-K, Keskin G, Chinya G, Wang H (2019) Improving branch prediction by modeling global history with convolutional neural networks. arXiv preprint arXiv:1906.09889
Ozturk C, Sendag R (2010) An analysis of hard to predict branches. In: 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), pp. 213–222
Zangeneh S, Pruett S, Lym S, Patt YN (2020) BranchNet: A convolutional neural network to predict hard-to-predict branches. In: 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 118–130. IEEE
Seznec A (2016) Exploring branch predictability limits with the MTAGE+SC predictor. In: 5th JILP Workshop on Computer Architecture Competitions (JWAC-5): Championship Branch Prediction (CBP-5), p. 4
Zangeneh S, Pruett S, Patt Y (2020) Branch prediction with multilayer neural networks: The value of specialization. In: Machine Learning for Computer Architecture and Systems. National Science Foundation. NSF-PAR ID: 10249272
Mao Y, Zhou H, Gui X (2017) Exp deep neural net branch prediction. NC University, ECE Department
Zouzias A, Kalaitzidis K, Grot B (2021) Branch prediction as a reinforcement learning problem: Why, how and case studies. arXiv preprint arXiv:2106.13429
Villon LA, Susskind Z, Bacellar AT, Miranda ID, Araújo LS, Lima PM, Breternitz M Jr, John LK, França FM, Dutra DL (2023) A conditional branch predictor based on weightless neural networks. Neurocomputing 555:126637
Aleksander I, Thomas W, Bowden P (1984) WISARD: a radical step forward in image recognition. Sens Rev 4(3):120–124
Shkadarevich D (2020) Branch Prediction Dataset. https://www.kaggle.com/datasets/dmitryshkadarevich/branch-prediction
Devlin J, Chang M, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805. https://arxiv.org/abs/1810.04805
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473
Bhargava P, Drozd A, Rogers A (2021) Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics
Turc I, Chang M, Lee K, Toutanova K (2019) Well-read students learn better: The impact of student initialization on knowledge distillation. CoRR abs/1908.08962. https://arxiv.org/abs/1908.08962
Alajmi A. Anwaarma/BP-balanced. Datasets at Hugging Face. https://huggingface.co/datasets/Anwaarma/BP-balanced
Lundberg SM, Lee S-I (2017) A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30
Wang Y, Fan H, Li S, Liang T, Zhang W (2024) A modular branch predictor performance analysis framework for fast design space exploration. In: 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1–6. IEEE
Jamet AV, Vavouliotis G, Jiménez DA, Alvarez L, Casas M (2024) A two level neural approach combining off-chip prediction with adaptive prefetch filtering. In: 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 528–542. IEEE
Funding
The authors received no specific funding for this study.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Alajmi, A., AlSarraf, B., Abualhassan, Z. et al. TinyBERT for branch prediction in modern microprocessors. Neural Comput & Applic 37, 1771–1782 (2025). https://doi.org/10.1007/s00521-024-10535-1
DOI: https://doi.org/10.1007/s00521-024-10535-1