A-DRM: Architecture-aware Distributed Resource Management of Virtualized Clusters

Published: 14 March 2015

Abstract

Virtualization technologies have been widely adopted by large-scale cloud computing platforms. These virtualized systems employ distributed resource management (DRM) to achieve high resource utilization and energy savings by dynamically migrating and consolidating virtual machines. DRM schemes typically rely on operating-system-level metrics, such as CPU utilization, memory capacity demand, and I/O utilization, to detect and balance resource contention. However, they are oblivious to microarchitecture-level resource interference (e.g., memory bandwidth contention between different VMs running on the same host), which is currently not exposed to the operating system.
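Such microarchitecture-level behavior can be observed on a real host through hardware performance counters (one of this article's author tags is, in fact, "performance counters"). The snippet below is a minimal sketch, assuming a Linux host with perf installed and permission to read system-wide counters, of one common rough approximation: last-level-cache load misses multiplied by the 64-byte cache-line size as an estimate of memory read traffic. It is an illustrative stand-in, not the profiler used in the paper, and it undercounts traffic from writebacks and prefetches.

```python
# Illustrative sketch only (not the paper's profiler): approximate a host's
# memory read bandwidth from LLC load misses sampled with Linux `perf stat`.
import subprocess

CACHE_LINE_BYTES = 64  # assumed cache-line size


def sample_llc_miss_bandwidth_gbps(interval_s: float = 1.0) -> float:
    """Estimate system-wide memory read traffic (GB/s) over one interval."""
    result = subprocess.run(
        ["perf", "stat", "-a", "-x", ",", "-e", "LLC-load-misses",
         "sleep", str(interval_s)],
        capture_output=True, text=True, check=True,
    )
    # perf prints its CSV statistics on stderr; the first field is the count.
    for line in result.stderr.splitlines():
        if "LLC-load-misses" in line:
            count_field = line.split(",")[0].strip()
            if not count_field.isdigit():
                raise RuntimeError(f"counter not available: {count_field!r}")
            return int(count_field) * CACHE_LINE_BYTES / interval_s / 1e9
    raise RuntimeError("LLC-load-misses not found in perf output")


if __name__ == "__main__":
    print(f"approx. memory read bandwidth: {sample_llc_miss_bandwidth_gbps():.2f} GB/s")
```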
We observe that this lack of visibility into microarchitecture-level resource interference significantly impacts the performance of virtualized systems. Motivated by this observation, we propose A-DRM, a novel architecture-aware DRM scheme that takes microarchitecture-level resource interference into account when making migration decisions in a virtualized cluster. A-DRM makes use of three core techniques: 1) a profiler that monitors the microarchitecture-level resource usage of each physical host online, 2) a memory bandwidth interference model that assesses the degree of interference among the virtual machines on a host, and 3) a cost-benefit analysis that selects a candidate virtual machine and a target host for migration.
Real-system experiments on thirty randomly selected combinations of applications from the CPU2006, PARSEC, STREAM, and NAS Parallel Benchmark suites in a four-host virtualized cluster show that A-DRM improves performance by up to 26.55%, and by 9.67% on average, compared to traditional DRM schemes that lack visibility into microarchitecture-level resource utilization and contention.
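As a concrete, deliberately simplified illustration of how the memory bandwidth interference model and the cost-benefit analysis summarized in the abstract could fit together, the following Python sketch picks a VM and a destination host when moving that VM is expected to reduce the most contended host's interference. The class names, the 15% benefit threshold, and the linear bandwidth-ratio interference proxy are assumptions made for this example only; they are not the mechanisms or parameters evaluated in the paper, and the per-VM bandwidth figures are assumed to come from a counter-based profiler such as the one sketched above.

```python
# Illustrative sketch: a simplified, architecture-aware migration decision loop.
# All names, thresholds, and the interference proxy are assumptions for this
# example; they are not taken from the A-DRM paper.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class VM:
    name: str
    mem_bw_gbps: float  # measured memory bandwidth demand (assumed profiler input)


@dataclass
class Host:
    name: str
    peak_mem_bw_gbps: float                    # sustainable memory bandwidth of the host
    vms: List[VM] = field(default_factory=list)

    def total_bw(self) -> float:
        return sum(vm.mem_bw_gbps for vm in self.vms)

    def interference(self) -> float:
        # Crude proxy: fraction of peak bandwidth in use; near 1.0 means heavy contention.
        return self.total_bw() / self.peak_mem_bw_gbps


def plan_migration(hosts: List[Host],
                   benefit_threshold: float = 0.15) -> Optional[Tuple[VM, Host, Host]]:
    """Pick (vm, source, destination) if moving one VM lowers the worst host's
    interference by more than benefit_threshold; a stand-in for a fuller
    cost-benefit analysis that would also model migration overhead."""
    src = max(hosts, key=lambda h: h.interference())
    best = None
    for vm in src.vms:
        for dst in hosts:
            if dst is src:
                continue
            # Predicted interference on both hosts if vm were moved.
            new_src = (src.total_bw() - vm.mem_bw_gbps) / src.peak_mem_bw_gbps
            new_dst = (dst.total_bw() + vm.mem_bw_gbps) / dst.peak_mem_bw_gbps
            benefit = src.interference() - max(new_src, new_dst)
            if benefit > benefit_threshold and (best is None or benefit > best[0]):
                best = (benefit, vm, src, dst)
    return None if best is None else (best[1], best[2], best[3])


if __name__ == "__main__":
    hosts = [
        Host("host0", peak_mem_bw_gbps=40.0,
             vms=[VM("stream-like", 30.0), VM("mcf-like", 12.0)]),
        Host("host1", peak_mem_bw_gbps=40.0,
             vms=[VM("blackscholes-like", 4.0)]),
    ]
    plan = plan_migration(hosts)
    if plan is not None:
        vm, src, dst = plan
        print(f"migrate {vm.name}: {src.name} -> {dst.name}")
    else:
        print("no beneficial migration found")
```

A full cost-benefit analysis would additionally weigh the cost of live migration itself (copying dirty memory pages and the temporary slowdown on both hosts) against the expected interference reduction.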





Published In

ACM SIGPLAN Notices, Volume 50, Issue 7 (VEE '15)
July 2015, 221 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2817817
Editor: Andy Gill

VEE '15: Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
March 2015, 238 pages
ISBN: 9781450334501
DOI: 10.1145/2731186

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 March 2015
Published in SIGPLAN Volume 50, Issue 7


Author Tags

  1. live migration
  2. microarchitecture
  3. performance counters
  4. resource management
  5. virtualization

Qualifiers

  • Research-article

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 2
Reflects downloads up to 12 Dec 2024

Cited By

  • (2024) Geo-Distributed Analytical Streaming Architecture for IoT Platforms. 2024 IEEE International Conference on Cluster Computing (CLUSTER), pp. 263-274. DOI: 10.1109/CLUSTER59578.2024.00030. Online publication date: 24-Sep-2024.
  • (2022) Low Latency Execution Guarantee Under Uncertainty in Serverless Platforms. Parallel and Distributed Computing, Applications and Technologies, pp. 324-335. DOI: 10.1007/978-3-030-96772-7_30. Online publication date: 16-Mar-2022.
  • (2021) QSpark: Distributed Execution of Batch & Streaming Analytics in Spark Platform. 2021 IEEE 20th International Symposium on Network Computing and Applications (NCA), pp. 1-8. DOI: 10.1109/NCA53618.2021.9685833. Online publication date: 23-Nov-2021.
  • (2021) Graceful Performance Degradation in Apache Storm. Parallel and Distributed Computing, Applications and Technologies, pp. 389-400. DOI: 10.1007/978-3-030-69244-5_35. Online publication date: 21-Feb-2021.
  • (2020) Key technologies of cloud computing-based IoT data mining. International Journal of Computers and Applications, pp. 1-8. DOI: 10.1080/1206212X.2020.1738665. Online publication date: 18-Mar-2020.
  • (2019) Hotspot Mitigations for the Masses. Proceedings of the ACM Symposium on Cloud Computing, pp. 102-113. DOI: 10.1145/3357223.3362717. Online publication date: 20-Nov-2019.
  • (2019) Dynamic Control of CPU Cap Allocations in Stream Processing and Data-Flow Platforms. 2019 IEEE 18th International Symposium on Network Computing and Applications (NCA), pp. 1-8. DOI: 10.1109/NCA.2019.8935024. Online publication date: Sep-2019.
  • (2019) Interference-aware co-scheduling method based on classification of application characteristics from hardware performance counter using data mining. Cluster Computing. DOI: 10.1007/s10586-019-02949-7. Online publication date: 12-Jun-2019.
  • (2018) A Model Predictive Controller for Managing QoS Enforcements and Microarchitecture-Level Interferences in a Lambda Platform. IEEE Transactions on Parallel and Distributed Systems, 29(7), pp. 1442-1455. DOI: 10.1109/TPDS.2017.2779502. Online publication date: 1-Jul-2018.
  • (2018) Elastic CPU Cap Mechanism for Timely Dataflow Applications. Computational Science – ICCS 2018, pp. 554-568. DOI: 10.1007/978-3-319-93698-7_42. Online publication date: 11-Jun-2018.
