TinyBERT for branch prediction in modern microprocessors

  • Review
Neural Computing and Applications

Abstract

Recent progress has highlighted the importance of branch prediction (BP) to computer performance: accurate prediction reduces the pipeline stalls that delay execution in modern microprocessors (CPUs). In this paper, we investigate the use of machine learning (ML) models to improve BP accuracy, focusing on transformer models because of their strong predictive and classification performance. Although existing studies have employed various ML methods for BP, the models they select are computationally expensive and impractical for such a task. Hence, we present an ML-based dynamic BP technique that uses tiny bidirectional encoder representations from transformers (TinyBERT), notable for its efficiency, simplicity, and low resource utilization. This method streamlines the BP process and offers a more effective alternative to conventional strategies. A key aspect of our approach is the application of local post hoc explanations, which provide insight into the model’s individual predictions. Our empirical findings show that this methodology achieves a 13% reduction in the misprediction rate compared with top predictors such as TAGE-SC-L, across various multimedia and integer application benchmarks. These results underscore the potential of compact transformers for efficient and effective BP.
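To make the abstract's metric concrete, the following is a minimal sketch of how a misprediction rate and a relative reduction between two predictors can be computed. It is not the authors' method: the predictor shown is a classic per-branch 2-bit saturating counter used purely as an illustration, the function names, the trace, and the rates passed to `relative_reduction` are hypothetical, and reading the reported 13% as a relative (rather than absolute) reduction is an assumption the abstract does not settle.

```python
from typing import Iterable, Tuple


def count_mispredictions(trace: Iterable[Tuple[int, bool]]) -> int:
    """Count mispredictions of a per-branch 2-bit saturating counter.

    `trace` yields (branch_address, taken) pairs in program order.
    Counter states 0-1 predict not taken; states 2-3 predict taken.
    """
    counters = {}  # branch address -> counter state in 0..3
    mispredictions = 0
    for address, taken in trace:
        state = counters.get(address, 2)  # assumed start: weakly taken
        if (state >= 2) != taken:
            mispredictions += 1
        # Move toward the observed outcome, saturating at 0 and 3.
        counters[address] = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredictions


def relative_reduction(baseline_rate: float, new_rate: float) -> float:
    """Fraction by which new_rate improves on baseline_rate."""
    return (baseline_rate - new_rate) / baseline_rate


# Hypothetical trace: one loop branch, taken 9 times and then not taken,
# repeated for 3 loop executions (30 dynamic branches in total).
loop_trace = [(0x40, i % 10 != 9) for i in range(30)]
rate = count_mispredictions(loop_trace) / len(loop_trace)

# Made-up rates, chosen only to show the arithmetic of a relative reduction.
example_reduction = relative_reduction(0.10, 0.087)
```

In this toy trace the counter mispredicts only the loop exit of each execution. A learned predictor would be evaluated the same way: run it over the same trace, count wrong directions, and compare the resulting rate against the baseline's.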

Data availability

The data used to support the findings of this study are available upon request.

Funding

The authors received no specific funding for this study.

Author information

Corresponding author

Correspondence to Abbas A. Fairouz.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Alajmi, A., AlSarraf, B., Abualhassan, Z. et al. TinyBERT for branch prediction in modern microprocessors. Neural Comput & Applic 37, 1771–1782 (2025). https://doi.org/10.1007/s00521-024-10535-1

  • DOI: https://doi.org/10.1007/s00521-024-10535-1

Keywords