Modelling and developing conflict-aware scheduling on large-scale data centres

Published: 01 September 2018

Abstract

Large-scale data centres are a growing trend in modern computing systems. Because a large-scale data centre has to manage a large number of machines and jobs, deploying multiple independent schedulers (termed distributed schedulers in the literature) that make scheduling decisions simultaneously has been shown to be an effective way to speed up the processing of large quantities of submitted jobs and data. The key drawback of distributed schedulers is that, because they schedule different jobs independently, their scheduling decisions may conflict with each other when different decisions refer to the same subset of resources in the data centre. Conflicting scheduling decisions cause additional scheduling attempts and consequently increase the scheduling cost. The more resources each scheduler demands, the higher the scheduling cost that may be incurred and the longer the job response times that users may experience. It is therefore useful to investigate the balance points in resource demand for each independent scheduler, so that all the distributed schedulers can achieve decent job performance without undesired resource competition. To address this issue, we model distributed scheduling and resource conflict using game theory and conduct a quantitative analysis of scheduling cost and job performance. Based on this analysis, we develop conflict-aware scheduling strategies to reduce the scheduling cost and improve job performance. We have conducted simulation experiments with a workload trace as well as real experiments on Amazon Web Services (AWS). The experimental results verify the effectiveness of the proposed modelling approach and scheduling strategies.
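The link between resource demand and conflict can be illustrated with a deliberately simplified calculation. This is only a sketch and not the game-theoretic model of the paper: it assumes just two schedulers, each of which picks the same number of distinct machines uniformly at random from a shared pool, and it counts any shared machine as a conflict. The function name `conflict_probability` and its parameters are hypothetical.

```python
from math import comb


def conflict_probability(num_machines: int, demand: int) -> float:
    """Chance that two independent schedulers overlap on at least one machine.

    Assumption (not from the paper): each scheduler picks `demand` distinct
    machines uniformly at random out of `num_machines`.
    """
    if not 0 <= demand <= num_machines:
        raise ValueError("demand must lie between 0 and num_machines")
    # Fix the first scheduler's choice. The second scheduler avoids it only if
    # all of its picks fall among the num_machines - demand remaining machines.
    # comb(n, k) is 0 when k > n, so overlap is certain once 2 * demand > num_machines.
    disjoint = comb(num_machines - demand, demand) / comb(num_machines, demand)
    return 1.0 - disjoint


if __name__ == "__main__":
    # Print how the conflict probability changes as each scheduler asks for more.
    for demand in range(0, 7):
        print(demand, round(conflict_probability(10, demand), 3))
```

Under these assumptions the conflict probability never decreases as the demand grows, which matches the qualitative argument above that larger demands lead to more conflicting decisions and hence more scheduling attempts.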

Highlights

Propose a method to quantify the relation between scheduling conflicts and resource demands.
Develop a game-theoretical solution for distributed schedulers in large-scale data centres.
Design and conduct both simulation experiments and real experiments on Amazon Web Services.

Published In

Future Generation Computer Systems  Volume 86, Issue C
Sep 2018
1535 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Data centre
  2. Scheduling
  3. Game theory
  4. Resource conflict

Qualifiers

  • Research-article
