- Article, March 2024
Performance Comparison of Distributed DNN Training on Optical Versus Electrical Interconnect Systems
Algorithms and Architectures for Parallel Processing, Pages 401–418, https://doi.org/10.1007/978-981-97-0834-5_23
Abstract: Parallel and distributed Deep Neural Network (DNN) training have become integral in data centers, significantly reducing DNN training time. The interconnection type among nodes and the chosen all-reduce algorithm critically impact this speed-up. ...
- research-article, August 2023
DistSim: A performance model of large-scale hybrid distributed DNN training
Guandong Lu, Runzhe Chen, Yakai Wang, Yangjie Zhou, Rui Zhang, Zheng Hu, Yanming Miao, Zhifang Cai, Li Li, Jingwen Leng, Minyi Guo
CF '23: Proceedings of the 20th ACM International Conference on Computing Frontiers, Pages 112–122, https://doi.org/10.1145/3587135.3592200
With the ever-increasing computational demand of DNN training workloads, distributed training has been widely adopted. A combination of data, model and pipeline parallelism strategy, called hybrid parallelism distributed training, is imported to tackle ...