A-DRM: Architecture-aware Distributed Resource Management of Virtualized Clusters

Published: 14 March 2015

Abstract

Virtualization technologies have been widely adopted by large-scale cloud computing platforms. These virtualized systems employ distributed resource management (DRM) to achieve high resource utilization and energy savings by dynamically migrating and consolidating virtual machines. DRM schemes typically rely on operating-system-level metrics, such as CPU utilization, memory capacity demand, and I/O utilization, to detect and balance resource contention. However, they are oblivious to microarchitecture-level resource interference (e.g., memory bandwidth contention between different VMs running on the same host), which is currently not exposed to the operating system.
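Such microarchitecture-level behavior can be observed on a real host through hardware performance counters (one of this article's author tags is, in fact, "performance counters"). The snippet below is a minimal sketch, assuming a Linux host with perf installed and permission to read system-wide counters, of one common rough approximation: last-level-cache load misses multiplied by the 64-byte cache-line size as an estimate of memory read traffic. It is an illustrative stand-in, not the profiler used in the paper, and it undercounts traffic from writebacks and prefetches.

```python
# Illustrative sketch only (not the paper's profiler): approximate a host's
# memory read bandwidth from LLC load misses sampled with Linux `perf stat`.
import subprocess

CACHE_LINE_BYTES = 64  # assumed cache-line size


def sample_llc_miss_bandwidth_gbps(interval_s: float = 1.0) -> float:
    """Estimate system-wide memory read traffic (GB/s) over one interval."""
    result = subprocess.run(
        ["perf", "stat", "-a", "-x", ",", "-e", "LLC-load-misses",
         "sleep", str(interval_s)],
        capture_output=True, text=True, check=True,
    )
    # perf prints its CSV statistics on stderr; the first field is the count.
    for line in result.stderr.splitlines():
        if "LLC-load-misses" in line:
            count_field = line.split(",")[0].strip()
            if not count_field.isdigit():
                raise RuntimeError(f"counter not available: {count_field!r}")
            return int(count_field) * CACHE_LINE_BYTES / interval_s / 1e9
    raise RuntimeError("LLC-load-misses not found in perf output")


if __name__ == "__main__":
    print(f"approx. memory read bandwidth: {sample_llc_miss_bandwidth_gbps():.2f} GB/s")
```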
We observe that this lack of visibility into microarchitecture-level resource interference significantly impacts the performance of virtualized systems. Motivated by this observation, we propose A-DRM, a novel architecture-aware DRM scheme that takes microarchitecture-level resource interference into account when making migration decisions in a virtualized cluster. A-DRM makes use of three core techniques: 1) a profiler that monitors the microarchitecture-level resource usage of each physical host online, 2) a memory bandwidth interference model that assesses the degree of interference among the virtual machines on a host, and 3) a cost-benefit analysis that selects a candidate virtual machine and a target host for migration.
Real-system experiments on thirty randomly selected combinations of applications from the CPU2006, PARSEC, STREAM, and NAS Parallel Benchmark suites in a four-host virtualized cluster show that A-DRM improves performance by up to 26.55%, and by 9.67% on average, compared to traditional DRM schemes that lack visibility into microarchitecture-level resource utilization and contention.
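As a concrete, deliberately simplified illustration of how the memory bandwidth interference model and the cost-benefit analysis summarized in the abstract could fit together, the following Python sketch picks a VM and a destination host when moving that VM is expected to reduce the most contended host's interference. The class names, the 15% benefit threshold, and the linear bandwidth-ratio interference proxy are assumptions made for this example only; they are not the mechanisms or parameters evaluated in the paper, and the per-VM bandwidth figures are assumed to come from a counter-based profiler such as the one sketched above.

```python
# Illustrative sketch: a simplified, architecture-aware migration decision loop.
# All names, thresholds, and the interference proxy are assumptions for this
# example; they are not taken from the A-DRM paper.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class VM:
    name: str
    mem_bw_gbps: float  # measured memory bandwidth demand (assumed profiler input)


@dataclass
class Host:
    name: str
    peak_mem_bw_gbps: float                    # sustainable memory bandwidth of the host
    vms: List[VM] = field(default_factory=list)

    def total_bw(self) -> float:
        return sum(vm.mem_bw_gbps for vm in self.vms)

    def interference(self) -> float:
        # Crude proxy: fraction of peak bandwidth in use; near 1.0 means heavy contention.
        return self.total_bw() / self.peak_mem_bw_gbps


def plan_migration(hosts: List[Host],
                   benefit_threshold: float = 0.15) -> Optional[Tuple[VM, Host, Host]]:
    """Pick (vm, source, destination) if moving one VM lowers the worst host's
    interference by more than benefit_threshold; a stand-in for a fuller
    cost-benefit analysis that would also model migration overhead."""
    src = max(hosts, key=lambda h: h.interference())
    best = None
    for vm in src.vms:
        for dst in hosts:
            if dst is src:
                continue
            # Predicted interference on both hosts if vm were moved.
            new_src = (src.total_bw() - vm.mem_bw_gbps) / src.peak_mem_bw_gbps
            new_dst = (dst.total_bw() + vm.mem_bw_gbps) / dst.peak_mem_bw_gbps
            benefit = src.interference() - max(new_src, new_dst)
            if benefit > benefit_threshold and (best is None or benefit > best[0]):
                best = (benefit, vm, src, dst)
    return None if best is None else (best[1], best[2], best[3])


if __name__ == "__main__":
    hosts = [
        Host("host0", peak_mem_bw_gbps=40.0,
             vms=[VM("stream-like", 30.0), VM("mcf-like", 12.0)]),
        Host("host1", peak_mem_bw_gbps=40.0,
             vms=[VM("blackscholes-like", 4.0)]),
    ]
    plan = plan_migration(hosts)
    if plan is not None:
        vm, src, dst = plan
        print(f"migrate {vm.name}: {src.name} -> {dst.name}")
    else:
        print("no beneficial migration found")
```

A full cost-benefit analysis would additionally weigh the cost of live migration itself (copying dirty memory pages and the temporary slowdown on both hosts) against the expected interference reduction.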





Published In

ACM SIGPLAN Notices, Volume 50, Issue 7 (VEE '15)
July 2015, 221 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2817817
Editor: Andy Gill

VEE '15: Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
March 2015, 238 pages
ISBN: 9781450334501
DOI: 10.1145/2731186

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 March 2015
Published in SIGPLAN Volume 50, Issue 7


Author Tags

  1. live migration
  2. microarchitecture
  3. performance counters
  4. resource management
  5. virtualization

Qualifiers

  • Research-article

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 2
Reflects downloads up to 12 Dec 2024

Cited By

  • (2024) Geo-Distributed Analytical Streaming Architecture for IoT Platforms. 2024 IEEE International Conference on Cluster Computing (CLUSTER), pp. 263-274. DOI: 10.1109/CLUSTER59578.2024.00030. Online publication date: 24-Sep-2024.
  • (2022) Low Latency Execution Guarantee Under Uncertainty in Serverless Platforms. Parallel and Distributed Computing, Applications and Technologies, pp. 324-335. DOI: 10.1007/978-3-030-96772-7_30. Online publication date: 16-Mar-2022.
  • (2021) QSpark: Distributed Execution of Batch & Streaming Analytics in Spark Platform. 2021 IEEE 20th International Symposium on Network Computing and Applications (NCA), pp. 1-8. DOI: 10.1109/NCA53618.2021.9685833. Online publication date: 23-Nov-2021.
  • (2021) Graceful Performance Degradation in Apache Storm. Parallel and Distributed Computing, Applications and Technologies, pp. 389-400. DOI: 10.1007/978-3-030-69244-5_35. Online publication date: 21-Feb-2021.
  • (2020) Key technologies of cloud computing-based IoT data mining. International Journal of Computers and Applications, pp. 1-8. DOI: 10.1080/1206212X.2020.1738665. Online publication date: 18-Mar-2020.
  • (2019) Hotspot Mitigations for the Masses. Proceedings of the ACM Symposium on Cloud Computing, pp. 102-113. DOI: 10.1145/3357223.3362717. Online publication date: 20-Nov-2019.
  • (2019) Dynamic Control of CPU Cap Allocations in Stream Processing and Data-Flow Platforms. 2019 IEEE 18th International Symposium on Network Computing and Applications (NCA), pp. 1-8. DOI: 10.1109/NCA.2019.8935024. Online publication date: Sep-2019.
  • (2019) Interference-aware co-scheduling method based on classification of application characteristics from hardware performance counter using data mining. Cluster Computing. DOI: 10.1007/s10586-019-02949-7. Online publication date: 12-Jun-2019.
  • (2018) A Model Predictive Controller for Managing QoS Enforcements and Microarchitecture-Level Interferences in a Lambda Platform. IEEE Transactions on Parallel and Distributed Systems, 29(7), pp. 1442-1455. DOI: 10.1109/TPDS.2017.2779502. Online publication date: 1-Jul-2018.
  • (2018) Elastic CPU Cap Mechanism for Timely Dataflow Applications. Computational Science – ICCS 2018, pp. 554-568. DOI: 10.1007/978-3-319-93698-7_42. Online publication date: 11-Jun-2018.
