
Performance and power consumption analysis of Arm Scalable Vector Extension

The Journal of Supercomputing

Abstract

Modern CPUs not only have multiple cores but also support wide single instruction, multiple data (SIMD) operations, and this trend is expected to continue. In this paper, we examine how the vector length and the number of out-of-order resources affect the performance and power consumption of programs with multiple vector lengths using the Arm Scalable Vector Extension. Based on our evaluation, we conclude that using a longer vector length with multicycle vector units yields up to approximately 30% better performance and a 21% decrease in power consumption compared with using a shorter vector length.
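
To make the idea of one program running at multiple vector lengths concrete, the following is a minimal sketch of a vector-length-agnostic loop, modeled in plain Python rather than with SVE instructions. It is not the benchmark or methodology of the paper: the function name `vla_add`, the 64-bit element size, and the array size are assumptions chosen for illustration. The only architectural facts relied on are that SVE permits vector lengths from 128 to 2048 bits and that a predicate (as produced by an instruction such as `whilelt`) masks off lanes past the end of the data.

```python
def vla_add(a, b, vl_bits, elem_bits=64):
    """Add two equal-length lists using a strip-mined, predicated loop.

    vl_bits is the (hypothetical) hardware vector length; the loop body
    never hard-codes it, which is what makes the loop length-agnostic.
    """
    lanes = vl_bits // elem_bits      # elements processed per vector operation
    out = [0.0] * len(a)
    iterations = 0
    i = 0
    while i < len(a):
        # Predicate: only lanes that still fall inside the array are active.
        active = min(lanes, len(a) - i)
        for lane in range(active):
            out[i + lane] = a[i + lane] + b[i + lane]
        i += lanes
        iterations += 1
    return out, iterations


if __name__ == "__main__":
    n = 1000
    a = [float(x) for x in range(n)]
    b = [2.0 * x for x in range(n)]
    for vl in (128, 256, 512, 1024, 2048):
        result, iters = vla_add(a, b, vl)
        print(f"VL={vl:4d} bits -> {iters} vector iterations")
```

The iteration count printed here is only a rough proxy for instruction count. It says nothing about cycles or power, which depend on how the vector units and out-of-order resources are configured, the question the paper evaluates by simulation.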

Acknowledgements

This work is partially funded by MEXT’s program for the Development and Improvement for the Next Generation Ultra High-Speed Computer System, under its Subsidies for Operating the Specific Advanced Large Research Facilities.

Author information

Corresponding author

Correspondence to Tetsuya Odajima.

About this article

Cite this article

Odajima, T., Kodama, Y. & Sato, M. Performance and power consumption analysis of Arm Scalable Vector Extension. J Supercomput 77, 5757–5778 (2021). https://doi.org/10.1007/s11227-020-03495-5

  • DOI: https://doi.org/10.1007/s11227-020-03495-5
