Tutorial
DOI: 10.1145/3293883.3302260

High performance distributed deep learning: a beginner's guide

Published: 16 February 2019

Abstract

The current wave of advances in Deep Learning (DL) has led to many exciting challenges and opportunities for Computer Science and Artificial Intelligence researchers alike. Modern DL frameworks like Caffe2, TensorFlow, Cognitive Toolkit (CNTK), PyTorch, and several others have emerged that offer ease of use and flexibility to describe, train, and deploy various types of Deep Neural Networks (DNNs). In this tutorial, we will provide an overview of interesting trends in DNN design and how cutting-edge hardware architectures are playing a key role in moving the field forward. We will also present an overview of different DNN architectures and DL frameworks. Most DL frameworks started with a single-node/single-GPU design; however, approaches to parallelize DNN training are being actively explored, and the DL community has pursued several distributed training designs that exploit communication runtimes like gRPC, MPI, and NCCL. In this context, we will highlight new challenges and opportunities for communication runtimes to efficiently support distributed DNN training. We will also present some of our co-design efforts to utilize CUDA-Aware MPI for large-scale DNN training on modern GPU clusters. Finally, the tutorial includes hands-on exercises that give attendees first-hand experience running distributed DNN training experiments on a modern GPU cluster.
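To make the data-parallel, multi-GPU training design mentioned above concrete, the following is a minimal sketch (not part of the tutorial material) that wraps a toy model in PyTorch's DistributedDataParallel over the NCCL backend. The model, the synthetic batches, and the torchrun launch command are illustrative assumptions.

    # Minimal data-parallel training sketch using PyTorch DDP with the NCCL backend.
    # Assumptions: PyTorch built with CUDA/NCCL support, launched via
    #   torchrun --nproc_per_node=<num_gpus> train_ddp.py
    # The model and synthetic data below are hypothetical placeholders.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        dist.init_process_group(backend="nccl")        # torchrun sets RANK/WORLD_SIZE
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)
        device = f"cuda:{local_rank}"

        model = torch.nn.Linear(1024, 10).to(device)   # each rank holds a full replica
        ddp_model = DDP(model, device_ids=[local_rank])
        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
        loss_fn = torch.nn.CrossEntropyLoss()

        for step in range(100):
            inputs = torch.randn(32, 1024, device=device)        # synthetic batch
            labels = torch.randint(0, 10, (32,), device=device)
            optimizer.zero_grad()
            loss = loss_fn(ddp_model(inputs), labels)
            loss.backward()       # gradients are all-reduced across ranks via NCCL
            optimizer.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

In a real run, a DistributedSampler (or equivalent sharding) would ensure each rank trains on a distinct partition of the dataset.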
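The MPI-based designs can be sketched even more compactly: each rank computes a local gradient and the ranks average it with an Allreduce. The snippet below uses mpi4py with a hypothetical flattened gradient vector; with a CUDA-aware MPI library, the same call pattern can be issued on GPU-resident buffers, which is the idea behind the co-design efforts mentioned above.

    # Manual gradient averaging across ranks with MPI Allreduce (mpi4py).
    # Assumptions: mpi4py and an MPI library are installed; run with, e.g.,
    #   mpirun -np 4 python allreduce_grads.py
    # The random vector stands in for a flattened local gradient.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    local_grad = np.random.rand(1 << 20).astype(np.float32)  # placeholder gradient

    comm.Allreduce(MPI.IN_PLACE, local_grad, op=MPI.SUM)      # sum across all ranks
    local_grad /= size                                        # average

    if rank == 0:
        print("averaged gradient norm:", float(np.linalg.norm(local_grad)))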

    Published In

    PPoPP '19: Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming
    February 2019
    472 pages
    ISBN: 9781450362252
    DOI: 10.1145/3293883

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. DNN training
    2. HPC
    3. MPI
    4. high-performance deep learning
    5. machine learning

    Qualifiers

    • Tutorial

    Conference

    PPoPP '19

    Acceptance Rates

    PPoPP '19 Paper Acceptance Rate: 29 of 152 submissions, 19%
    Overall Acceptance Rate: 230 of 1,014 submissions, 23%
