[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1109/INFOCOM.2018.8486340guideproceedingsArticle/Chapter ViewAbstractPublication PagesConference Proceedingsacm-pubtype
research-article

Scheduling Cofiows of Multi-stage Jobs to Minimize the Total Weighted Job Completion Time

Published: 16 April 2018 Publication History

Abstract

Datacenter networks are critical to Cloud computing. The coflow abstraction is a major leap forward of application-aware network scheduling. In the context of multistage jobs, there are dependencies among coflows. As a result, there is a large divergence between coflow-completion-time (CCT) and job-completion-time (JCT). To our best knowledge, this is the first work that systematically studies: how to schedule dependent coflows of multi-stage jobs, so that the total weighted job completion time can be minimized. We present a formal mathematical formulation. We also prove that this problem is strongly NP-hard. Inspired by the optimal solution of the relaxed linear programming, we design an algorithm that runs in polynomial time to solve this problem with an approximation ratio of (2M &#x002B; 1), where <tex>$M$</tex> is the number of machines. Evaluation results demonstrate that, the largest gap between our algorithm and the lower bound is only 9.14%. We reduce the average JCT by up to 33.48 % compared with Aalo, a heuristic multi-stage coflow scheduler. We reduce the total weighted JCT by up to 83.31 % compared with LP-OV-LS, the state-of-the-art approximation algorithm of coflow scheduling.

References

[1]
J. Dean and S. Ghemawat, “Mapreduce: simplified data processing on large clusters”, Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[3]
M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica, “Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing”, in Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. USENIX Association, 2012, pp. 2–2.
[4]
M. Chowdhury, M. Zaharia, J. Ma, M. I. Jordan, and I. Stoica, “Managing data transfers in computer clusters with orchestra”, in ACM SIGCOMM, vol. 41, no. 4. ACM, 2011, pp. 98–109.
[5]
M. Chowdhury and I. Stoica, “Coflow: a networking abstraction for cluster applications”, in ACM Hotnets. ACM, 2012, pp. 31–36.
[6]
M. Chowdhury, Y. Zhong, and I. Stoica, “Efficient coflow scheduling with varys”, in ACM SIGCOMM. ACM, 2014, pp. 443–454.
[7]
F. R. Dogar, T. Karagiannis, H. Ballani, and A. Rowstron, “Decentralized task-aware scheduling for data center networks”, in ACM SIGCOMM Computer Communication Review, vol. 44, no. 4. ACM, 2014, pp. 431–442.
[8]
M. Chowdhury and I. Stoica, “Efficient coflow scheduling without prior knowledge”, in ACM SIGCOMM Computer Communication Review, vol. 45, no. 4. ACM, 2015, pp. 393–406.
[10]
H. Susanto, H. Jin, and K. Chen, “Stream: Decentralized opportunistic inter-coflow scheduling for datacenter networks”, in Network Protocols (ICNP), 2016 IEEE 24th International Conference on. IEEE, 2016, pp. 1–10.
[11]
Z. Qiu, C. Stein, and Y. Zhong, “Minimizing the total weighted completion time of coflows in datacenter networks”, in Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures. ACM, 2015, pp. 294–303.
[12]
S. Khuller and M. Purohit, “Brief announcement: Improved approximation algorithms for scheduling co-flows”, in Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures. ACM, 2016, pp. 239–240.
[13]
S. Luo, H. Yu, Y. Zhao, S. Wang, S. Yu, and L. Li, “Towards practical and near-optimal coflow scheduling for data center networks”, IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 11, pp. 3366–3380, 2016.
[14]
M. Shafiee and J. Ghaderi, “An improved bound for minimizing the total weighted completion time of coflows in datacenters”, arXiv preprint arXiv:, 2017.
[15]
A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta, “Brief announcement: A new improved bound for coflow scheduling”, in Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures. ACM, 2017.
[16]
A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta, “V12: a scalable and flexible data center network”, in ACM SIGCOMM 2009.
[17]
A. Singh, J. Ong, A. Agarwal, G. Anderson, A. Armistead, R. Bannon, S. Boving, G. Desai, B. Felderman, P. Germano et al., “Jupiter rising: A decade of clos topologies and centralized control in google's datacenter network”, in Proc. ACM SIGDC 2015. ACM, 2015, pp. 183–197.
[18]
T. A. Roemer, “A note on the complexity of the concurrent open shop problem”, Journal of scheduling, vol. 9, no. 4, pp. 389–396, 2006.
[19]
R. L. Graham, E. L. Lawler, J. K. Lenstra, and A. R. Kan, “Optimization and approximation in deterministic sequencing and scheduling: a survey”, Annals of discrete mathematics, vol. 5, pp. 287–326, 1979.
[20]
J. Y.-T. Leung, H. Li, and M. Pinedo, “Order scheduling in an environment with dedicated resources in parallel”, Journal of Scheduling, vol. 8, no. 5, pp. 355–386, 2005.
[21]
A. Agnetis, H. Kellerer, G. Nicosia, and A. Pacifici, “Parallel dedicated machines scheduling with chain precedence constraints”, European Journal of Operational Research, vol. 221, no. 2, pp. 296–305, 2012.
[22]
M. Marcus and R. Ree, “Diagonals of doubly stochastic matrices”, The Quarterly Journal of Mathematics, vol. 10, no. 1, pp. 296–302, 1959.
[23]
F. Dufossé and B. Uçar, “Notes on birkhoff-von neumann decomposition of doubly stochastic matrices”, Linear Algebra and its Applications, vol. 497, pp. 108–115, 2016.
[24]
L. G. Khachiyan, “A polynomial algorithm in linear programming”, Ussr Computational Mathematics & Mathematical Physics, vol. 20, no. 80, pp. 1–3, 1979.
[25]
N. Karmarkar, “A new polynomial-time algorithm for linear programming”, in Proceedings of the sixteenth annual ACM symposium on Theory of computing ACM, 1984, pp. 302–311.
[26]
Y. Li, S. H.-C. Jiang, H. Tan, C. Zhang, G. Chen, J. Zhou, and F. Lau, “Efficient online coflow routing and scheduling”, in Proceedings of the 17th ACM International Symposium on Mobile Ad Hoc Networking and Computing. ACM, 2016, pp. 161–170.

Cited By

View all
  • (2022)Efficient flow scheduling in distributed deep learning training with echelon formationProceedings of the 21st ACM Workshop on Hot Topics in Networks10.1145/3563766.3564096(93-100)Online publication date: 14-Nov-2022
  • (2020)Towards High-Efficiency Data Centers via Job-Aware Network SchedulingProceedings of the 49th International Conference on Parallel Processing10.1145/3404397.3404474(1-10)Online publication date: 17-Aug-2020

Index Terms

  1. Scheduling Cofiows of Multi-stage Jobs to Minimize the Total Weighted Job Completion Time
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

        Information & Contributors

        Information

        Published In

        cover image Guide Proceedings
        IEEE INFOCOM 2018 - IEEE Conference on Computer Communications
        Apr 2018
        2776 pages

        Publisher

        IEEE Press

        Publication History

        Published: 16 April 2018

        Qualifiers

        • Research-article

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)0
        • Downloads (Last 6 weeks)0
        Reflects downloads up to 21 Dec 2024

        Other Metrics

        Citations

        Cited By

        View all
        • (2022)Efficient flow scheduling in distributed deep learning training with echelon formationProceedings of the 21st ACM Workshop on Hot Topics in Networks10.1145/3563766.3564096(93-100)Online publication date: 14-Nov-2022
        • (2020)Towards High-Efficiency Data Centers via Job-Aware Network SchedulingProceedings of the 49th International Conference on Parallel Processing10.1145/3404397.3404474(1-10)Online publication date: 17-Aug-2020

        View Options

        View options

        Media

        Figures

        Other

        Tables

        Share

        Share

        Share this Publication link

        Share on social media