[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2806777.2806780acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Scheduling jobs across geo-distributed datacenters

Published: 27 August 2015 Publication History

Abstract

With growing data volumes generated and stored across geo-distributed datacenters, it is becoming increasingly inefficient to aggregate all data required for computation at a single datacenter. Instead, a recent trend is to distribute computation to take advantage of data locality, thus reducing the resource (e.g., bandwidth) costs while improving performance. In this trend, new challenges are emerging in job scheduling, which requires coordination among the datacenters as each job runs across geo-distributed sites. In this paper, we propose novel job scheduling algorithms that coordinate job scheduling across datacenters with low overhead, while achieving near-optimal performance. Our extensive simulation study with realistic job traces shows that the proposed scheduling algorithms result in up to 50% improvement in average job completion time over the Shortest Remaining Processing Time (SRPT) based approaches.

References

[1]
http://aws.amazon.com/about-aws/global-infrastructure/. Amazonn Global Infrastructure.
[2]
https://code.google.com/p/googleclusterdata/. Google Cluster Workload Traces.
[3]
http://hadoop.apache.org/. Hadoop Cluster Computing System.
[4]
http://www.microsoft.com/en-us/server-cloud/cloud-os/global-datacenters.aspx. Microsoft Cloud Platform.
[5]
G. Ananthanarayanan, S. Kandula, A. G. Greenberg, I. Stoica, Y. Lu, B. Saha, and E. Harris. Reining in the outliers in mapreduce clusters using mantri. In USENIX OSDI, 2010.
[6]
G. Ananthanarayanan, A. Ghodsi, A. Wang, D. Borthakur, S. Kandula, S. Shenker, and I. Stoica. Pacman: coordinated memory caching for parallel jobs. In USENIX NSDI, 2012.
[7]
G. Ananthanarayanan, A. Ghodsi, S. Shenker, and I. Stoica. Effective straggler mitigation: Attack of the clones. In USENIX NSDI, 2013.
[8]
G. Ananthanarayanan, C.-C. Hung, X. Ren, I. Stoica, A. Wierman, and M. Yu. Grass: trimming stragglers in approximation analytics. In USENIX NSDI, 2014.
[9]
N. Bansal and M. Harchol-Balter. Analysis of SRPT scheduling: Investigating unfairness. ACM, 2001.
[10]
M. Chowdhury, Y. Zhong, and I. Stoica. Efficient coflow scheduling with varys. In ACM SIGCOMM, 2014.
[11]
J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, et al. Spanner: Googles globally distributed database. ACM Transactions on Computer Systems (TOCS), 2013.
[12]
N. Garg, A. Kumar, and V. Pandit. Order scheduling models: hardness and algorithms. In FSTTCS 2007: Foundations of Software Technology and Theoretical Computer Science. Springer, 2007.
[13]
A. Gupta, F. Yang, J. Govig, A. Kirsch, K. Chan, K. Lai, S. Wu, S. G. Dhoot, A. R. Kumar, A. Agiwal, et al. Mesa: Geo-replicated, near real-time, scalable data warehousing. In Proceedings of the VLDB Endowment, 2014.
[14]
M. Hajjat, D. Maltz, S. Rao, K. Sripanidkulchai, et al. Dealer: application-aware request splitting for interactive cloud applications. In ACM CoNEXT, 2012.
[15]
M. Harchol-Balter, B. Schroeder, N. Bansal, and M. Agrawal. Size-based scheduling to improve web performance. ACM Transactions on Computer Systems (TOCS), 2003.
[16]
J.-H. Hwang, U. Cetintemel, and S. Zdonik. Fast and reliable stream processing over wide area networks. In IEEE ICDE Workshop, 2007.
[17]
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In ACM SIGOPS Operating Systems Review, 2007.
[18]
M. Isard, V. Prabhakaran, J. Currey, U. Wieder, K. Talwar, and A. Goldberg. Quincy: Fair scheduling for distributed computing clusters. In ACM SOSP, 2009.
[19]
S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer, J. Zhou, M. Zhu, et al. B4: Experience with a globally-deployed software defined wan. In ACM SIGCOMM, 2013.
[20]
T. Kraska, G. Pang, M. J. Franklin, S. Madden, and A. Fekete. Mdcc: Multi-data center consistency. In ACM EuroSys, 2013.
[21]
M. Lin, A. Wierman, and B. Zwart. The average response time in a heavy-traffic srpt queue. ACM SIGMETRICS Performance Evaluation Review, 2010.
[22]
M. Lin, A. Wierman, and B. Zwart. Heavy-traffic analysis of mean response time under shortest remaining processing time. Performance Evaluation, 2011.
[23]
M. Mastrolilli, M. Queyranne, A. S. Schulz, O. Svensson, and N. A. Uhan. Minimizing the sum of weighted completion times in a concurrent open shop. Operation Research Letter, 2010.
[24]
S. Muralidhar, W. Lloyd, S. Roy, C. Hill, E. Lin, W. Liu, S. Pan, S. Shankar, V. Sivakumar, L. Tang, et al. f4: Facebook warm blob storage system. In USENIX OSDI, 2014.
[25]
P. Pietzuch, J. Ledlie, J. Shneidman, M. Roussopoulos, M. Welsh, and M. Seltzer. Network-aware operator placement for stream-processing systems. In IEEE ICDE, 2006.
[26]
Q. Pu, G. Ananthanarayanan, P. Bodik, S. Kandula, A. Akella, P. Bahl, and I. Stoica. Low latency geo-distributed data analytics. In ACM SIGCOMM, 2015.
[27]
A. Rabkin, M. Arye, S. Sen, V. S. Pai, and M. J. Freedman. Aggregation and degradation in jetstream: Streaming analytics in the wide area. In USENIX NSDI, 2014.
[28]
C. Reiss, A. Tumanov, G. R. Ganger, R. H. Katz, and M. A. Kozuch. Heterogeneity and dynamicity of clouds at scale: Google trace analysis. In Proceedings of the Third ACM Symposium on Cloud Computing, 2012.
[29]
T. A. Roemer. A note on the complexity of the concurrent open shop problem. Springer, 2006.
[30]
L. Schrage. A proof of the optimality of the shortest remaining processing time discipline. Operations Research, 1968.
[31]
L. E. Schrage and L. W. Miller. The queue m/g/1 with the shortest remaining processing time discipline. Operations Research, 1966.
[32]
B. Schroeder and M. Harchol-Balter. Web servers under overload: How scheduling can help. ACM Transactions on Internet Technology (TOIT), 2006.
[33]
B. Sharma, V. Chudnovsky, J. L. Hellerstein, R. Rifaat, and C. R. Das. Modeling and synthesizing task placement constraints in google compute clusters. In Proceedings of the 2nd ACM Symposium on Cloud Computing, 2011.
[34]
J. Tan, X. Meng, and L. Zhang. Delay tails in mapreduce scheduling. ACM SIGMETRICS Performance Evaluation Review, 2012.
[35]
A. Vulimiri, C. Curino, B. Godfrey, T. Jungblut, J. Padhye, and G. Varghese. Global analytics in the face of bandwidth and regulatory constraints. In To Appear in NSDI, 2015.
[36]
A. Vulimiri, C. Curino, B. Godfrey, K. Karanasos, and G. Varghese. Wanalytics: Analytics for a geo-distributed data-intensive world. In To Appear in CIDR, 2015.
[37]
A. Wierman and M. Harchol-Balter. Classifying scheduling policies with respect to unfairness in an m/gi/1. In ACM SIGMETRICS Performance Evaluation Review, 2003.
[38]
M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling. In ACM EuroSys, 2010.
[39]
M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In USENIX NSDI, 2012.

Cited By

View all
  • (2024)A Communication-Contention-Aware Privacy-Preserving Workflow Scheduling Method for Geo-Distributed DatacentersIEEE Transactions on Services Computing10.1109/TSC.2024.340759517:5(1887-1898)Online publication date: Sep-2024
  • (2024)Adaptive QoS-Aware Microservice Deployment With Excessive Loads via Intra- and Inter-Datacenter SchedulingIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.342593135:9(1565-1582)Online publication date: Sep-2024
  • (2024)Minimizing Buffer Utilization for Lossless Inter-DC LinksIEEE/ACM Transactions on Networking10.1109/TNET.2024.344360032:6(4960-4975)Online publication date: Dec-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SoCC '15: Proceedings of the Sixth ACM Symposium on Cloud Computing
August 2015
446 pages
ISBN:9781450336512
DOI:10.1145/2806777
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 August 2015

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article

Funding Sources

  • NSF

Conference

SoCC '15
Sponsor:
SoCC '15: ACM Symposium on Cloud Computing
August 27 - 29, 2015
Hawaii, Kohala Coast

Acceptance Rates

SoCC '15 Paper Acceptance Rate 34 of 157 submissions, 22%;
Overall Acceptance Rate 169 of 722 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)36
  • Downloads (Last 6 weeks)11
Reflects downloads up to 20 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)A Communication-Contention-Aware Privacy-Preserving Workflow Scheduling Method for Geo-Distributed DatacentersIEEE Transactions on Services Computing10.1109/TSC.2024.340759517:5(1887-1898)Online publication date: Sep-2024
  • (2024)Adaptive QoS-Aware Microservice Deployment With Excessive Loads via Intra- and Inter-Datacenter SchedulingIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.342593135:9(1565-1582)Online publication date: Sep-2024
  • (2024)Minimizing Buffer Utilization for Lossless Inter-DC LinksIEEE/ACM Transactions on Networking10.1109/TNET.2024.344360032:6(4960-4975)Online publication date: Dec-2024
  • (2024)A Survey on Scheduling Techniques in Computing and Network ConvergenceIEEE Communications Surveys & Tutorials10.1109/COMST.2023.332902726:1(160-195)Online publication date: Sep-2025
  • (2023)PlexusProceedings of the 2023 ACM Symposium on Cloud Computing10.1145/3620678.3624643(1-16)Online publication date: 30-Oct-2023
  • (2023)DAG-Aware Optimization for Geo-Distributed Data AnalyticsProceedings of the 52nd International Conference on Parallel Processing10.1145/3605573.3605575(472-481)Online publication date: 7-Aug-2023
  • (2023)Drone-Hosted Computation for Emergency ResponseIEEE Internet of Things Journal10.1109/JIOT.2023.328404510:23(20408-20414)Online publication date: 1-Dec-2023
  • (2023)Bifrost: Extending RoCE for Long Distance Inter-DC Links2023 IEEE 31st International Conference on Network Protocols (ICNP)10.1109/ICNP59255.2023.10355634(1-12)Online publication date: 10-Oct-2023
  • (2023)ExplSched: Maximizing Deep Learning Cluster Efficiency for Exploratory Jobs2023 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/CLUSTER52292.2023.00022(173-184)Online publication date: 31-Oct-2023
  • (2022)Performance Evaluation of the Weighted Least Connection Scheduling for Datacenters with BigHouse Simulator2022 IEEE International Conference on Electro Information Technology (eIT)10.1109/eIT53891.2022.9813846(001-004)Online publication date: 19-May-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media