
Virtual cluster optimisation for MapReduce-like applications

Published: 25 May 2019

Abstract

Infrastructure-as-a-service (IaaS) clouds are becoming ubiquitous for provisioning virtual machines on demand, and cloud service providers aim to deliver the best service with the fewest resources. Users frequently request virtual machines to build virtual clusters that run MapReduce-like jobs for big data processing, so providers seek to optimise these virtual clusters to minimise network latency and thereby reduce the cost of data movement. In this paper, we focus on the virtual machine placement problem of provisioning virtual clusters with minimum network latency in clouds. We define distance as the latency between virtual machines and use it to measure the affinity of a virtual cluster; this metric reflects both where the virtual machines are placed and the topology of the physical nodes in the cloud. We then formulate the problem as the classical shortest distance problem and solve it with an integer programming model. A greedy virtual machine placement algorithm is designed to obtain a compact virtual cluster, and an improved heuristic algorithm is presented to achieve global resource optimisation. Simulation results verify our algorithms, and experimental results validate the improvements achieved by our approaches.
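The abstract describes the latency-based distance metric and the greedy placement only at a high level. The following is a minimal, hypothetical sketch of both ideas, not the authors' algorithm or their integer programming model. It assumes a symmetric matrix `latency[i][j]` of latencies between physical nodes (with `latency[i][i]` standing for two VMs co-located on node `i`), a per-node free capacity counted in VM slots, and a cluster distance defined as the sum of host-to-host latencies over every pair of VMs. The function names `cluster_distance` and `greedy_place` and the toy topology are illustrative assumptions.

```python
from itertools import combinations
from typing import List, Optional, Sequence


def cluster_distance(placement: Sequence[int],
                     latency: Sequence[Sequence[float]]) -> float:
    """Sum of host-to-host latencies over every pair of VMs in the cluster.

    placement[v] is the physical node hosting VM v (an assumed representation).
    """
    return sum(latency[a][b] for a, b in combinations(placement, 2))


def greedy_place(k: int, capacity: Sequence[int],
                 latency: Sequence[Sequence[float]]) -> Optional[List[int]]:
    """Hypothetical seed-and-grow greedy placement of k VMs.

    Each node with free capacity is tried as the seed; the cluster is then
    grown one VM at a time on the node that adds the least latency to the
    VMs already placed. The most compact result is returned, or None when
    the total free capacity is smaller than k.
    """
    if k <= 0:
        return []
    if sum(capacity) < k:
        return None
    n = len(capacity)
    best: Optional[List[int]] = None
    best_cost = float("inf")
    for seed in range(n):
        if capacity[seed] == 0:
            continue
        free = list(capacity)  # per-seed copy of the free VM slots
        placement = [seed]
        free[seed] -= 1
        while len(placement) < k:
            candidates = [j for j in range(n) if free[j] > 0]
            # Incremental cost of one more VM on node c: its latency to
            # every VM already in the cluster.
            nxt = min(candidates,
                      key=lambda c: sum(latency[c][h] for h in placement))
            placement.append(nxt)
            free[nxt] -= 1
        cost = cluster_distance(placement, latency)
        if cost < best_cost:  # strict: ties keep the earliest seed
            best, best_cost = placement, cost
    return best


# Toy topology (assumed values, in microseconds): nodes 0-1 share one rack,
# nodes 2-3 share another; 10 = same node, 100 = same rack, 500 = cross rack.
latency = [
    [10, 100, 500, 500],
    [100, 10, 500, 500],
    [500, 500, 10, 100],
    [500, 500, 100, 10],
]
capacity = [2, 1, 2, 2]  # free VM slots per node

print(greedy_place(3, capacity, latency))    # → [0, 0, 1]
print(cluster_distance([0, 0, 1], latency))  # → 210
```

Seeding from every node multiplies the work by the number of nodes but prevents one poorly connected starting node from fixing the shape of the whole cluster. Because each step only considers the VMs already placed, the result is compact but not guaranteed optimal, which is the gap an exact integer programming formulation closes.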

Published In

International Journal of High Performance Computing and Networking, Volume 13, Issue 4
January 2019
121 pages
ISSN: 1740-0562
EISSN: 1740-0570

Publisher

Inderscience Publishers

Geneva 15, Switzerland