Abstract
Virtualization can provide many benefits for managing resources, including higher resource utilization, lower energy cost, faster fault recovery, and more flexible resource provisioning. However, provisioning resources for applications in the cloud environment remains challenging, especially for scientific applications, which have more complex runtime behavior and higher performance demands. In this work, we use real scientific applications and performance benchmarking tools to analyze application performance on our in-house virtualized cluster. We demonstrate that the performance degradation caused by virtualization can be kept below 10% with proper virtual machine configuration and the support of hardware-virtualized InfiniBand. Our study of four real scientific applications also shows that application performance is difficult to model or predict. Therefore, we developed an auto-tuning tool that finds the best resource provisioning setting, in terms of both time and cost, for any given application. We evaluate our design on an in-house KVM-based virtualized cluster with an InfiniBand interconnect. Compared with the optimal result from an exhaustive search, our auto-tuning tool achieves over 90% accuracy relative to the best deployment while using much less tuning time and far fewer execution runs.
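To make "best in terms of both time and cost" concrete, the following minimal sketch filters a set of measured deployments down to those that no other deployment beats on both objectives at once (the Pareto-optimal ones). It is an illustration only, not the auto-tuning tool described in the paper: the `Run` record, the `dominates` and `pareto_front` helpers, and all numbers are hypothetical.

```python
from typing import NamedTuple


class Run(NamedTuple):
    """One measured deployment (all values here are hypothetical)."""
    vms: int           # number of virtual machines
    vcpus_per_vm: int  # virtual CPUs per VM
    time_s: float      # measured execution time in seconds
    cost: float        # cost of the run in arbitrary units


def dominates(a: Run, b: Run) -> bool:
    """True if a is no worse than b in time and cost, and strictly better in one."""
    return (a.time_s <= b.time_s and a.cost <= b.cost
            and (a.time_s < b.time_s or a.cost < b.cost))


def pareto_front(runs: list[Run]) -> list[Run]:
    """Keep only the runs that no other run dominates."""
    return [r for r in runs if not any(dominates(o, r) for o in runs)]


# Made-up measurements for a single application.
runs = [
    Run(1, 8, 400.0, 4.0),
    Run(2, 8, 220.0, 4.4),
    Run(4, 8, 150.0, 6.0),
    Run(4, 4, 260.0, 5.2),   # slower and costlier than the 2 x 8 run
    Run(8, 8, 150.0, 12.0),  # same time as the 4 x 8 run, but more expensive
]

front = pareto_front(runs)
for r in front:
    print(f"{r.vms} VMs x {r.vcpus_per_vm} vCPUs: {r.time_s:.0f} s, cost {r.cost:.1f}")
```

A filter like this needs every candidate to be measured first, which is the exhaustive search that the abstract uses as its baseline; an auto-tuner aims to reach a comparable setting with far fewer runs.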
Notes
MCX353A-FCBT Mellanox FDR InfiniBand card: http://www.mellanox.com/page/infiniband_cards_overview.
About this article
Cite this article
Hsu, KJ., Chou, J. Performance benchmarking and auto-tuning for scientific applications on virtual cluster. J Supercomput 78, 6174–6206 (2022). https://doi.org/10.1007/s11227-021-04103-w
DOI: https://doi.org/10.1007/s11227-021-04103-w