Abstract
Modern CPUs not only have multiple cores but also support wide single-instruction, multiple-data (SIMD) operations, and this trend is expected to continue. In this paper, we examine how the vector length and the number of out-of-order resources affect the performance and power consumption of programs run at multiple vector lengths using the Arm Scalable Vector Extension (SVE). Based on our evaluation, we conclude that using a longer vector length with multicycle vector units yields up to approximately 30% higher performance and 21% lower power consumption than using a shorter vector length.
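To make the notion of one program running at several vector lengths concrete, the following is a minimal Python sketch of a vector-length-agnostic loop in the style the Arm Scalable Vector Extension encourages. It is an illustration only, not the benchmark code or simulator setup used in this paper: the function name `vla_daxpy` and the DAXPY kernel are assumptions chosen for the example, and the sketch models only the loop structure (a per-lane predicate covering the tail), not timing, multicycle execution, or power.

```python
def vla_daxpy(a, x, y, vector_bits):
    """Emulate a vector-length-agnostic DAXPY (y = a*x + y) on 64-bit elements.

    Hypothetical helper for illustration; returns the result and the number
    of vector-loop iterations needed at the given vector length.
    """
    # The architecture allows vector lengths from 128 to 2048 bits
    # in multiples of 128 bits.
    if vector_bits % 128 != 0 or not 128 <= vector_bits <= 2048:
        raise ValueError("vector length must be a multiple of 128 in [128, 2048]")

    lanes = vector_bits // 64  # number of 64-bit elements per vector register
    n = len(x)
    out = list(y)
    iterations = 0
    i = 0
    while i < n:
        # Predicate mask: lane k is active only while i + k < n,
        # which handles the loop tail without a scalar remainder loop.
        active = [i + k < n for k in range(lanes)]
        for k, on in enumerate(active):
            if on:
                out[i + k] = a * x[i + k] + out[i + k]
        i += lanes
        iterations += 1
    return out, iterations


if __name__ == "__main__":
    x = [float(v) for v in range(10)]
    y = [1.0] * 10
    for bits in (128, 512, 2048):
        result, iters = vla_daxpy(2.0, x, y, bits)
        print(bits, iters, result)
```

The same loop body produces identical results at every vector length; only the number of iterations changes, which is the property that lets a single program be evaluated across vector lengths.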
Acknowledgements
This work is partially funded by MEXT’s program for the Development and Improvement for the Next Generation Ultra High-Speed Computer System, under its Subsidies for Operating the Specific Advanced Large Research Facilities.
Cite this article
Odajima, T., Kodama, Y. & Sato, M. Performance and power consumption analysis of Arm Scalable Vector Extension. J Supercomput 77, 5757–5778 (2021). https://doi.org/10.1007/s11227-020-03495-5
DOI: https://doi.org/10.1007/s11227-020-03495-5