DOI: 10.1145/3426745.3431334
research-article

Huffman Coding Based Encoding Techniques for Fast Distributed Deep Learning

Published: 01 December 2020

Abstract

Distributed stochastic algorithms, equipped with gradient compression techniques, such as codebook quantization, are becoming increasingly popular and considered state-of-the-art in training large deep neural network (DNN) models. However, communicating the quantized gradients in a network requires efficient encoding techniques. For this, practitioners generally use Elias encoding-based techniques without considering their computational overhead or data-volume. In this paper, based on Huffman coding, we propose several lossless encoding techniques that exploit different characteristics of the quantized gradients during distributed DNN training. Then, we show their effectiveness on five different DNN models across three different data-sets, and compare them with classic state-of-the-art Elias-based encoding techniques. Our results show that the proposed Huffman-based encoders (i.e., RLH, SH, and SHS) can reduce the encoded data-volume by up to 5.1×, 4.32×, and 3.8×, respectively, compared to the Elias-based encoders.
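
As a rough illustration of the idea (not the paper's RLH, SH, or SHS encoders, whose details are in the full text), the sketch below builds a Huffman code over the symbols of a codebook-quantized gradient and compares the encoded size with a simple Elias-gamma baseline. The toy symbol stream, its zero-dominated distribution, and the mapping of symbols to positive integers for the gamma baseline are assumptions made purely for illustration.

    import heapq
    from collections import Counter
    from itertools import count

    def huffman_code(freqs):
        """Return {symbol: bitstring} for the given symbol frequencies."""
        tie = count()  # tie-breaker so the heap never has to compare dicts
        heap = [(f, next(tie), {s: ""}) for s, f in freqs.items()]
        heapq.heapify(heap)
        if len(heap) == 1:  # degenerate case: only one distinct symbol
            return {s: "0" for s in heap[0][2]}
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)
            f2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + b for s, b in c1.items()}
            merged.update({s: "1" + b for s, b in c2.items()})
            heapq.heappush(heap, (f1 + f2, next(tie), merged))
        return heap[0][2]

    def elias_gamma_bits(n):
        """Bit length of the Elias-gamma codeword for a positive integer n."""
        return 2 * n.bit_length() - 1

    # Toy codebook-quantized gradient: the zero level dominates, which is the
    # kind of skew a frequency-aware code can exploit.
    symbols = [0] * 900 + [1] * 60 + [2] * 30 + [3] * 10

    code = huffman_code(Counter(symbols))
    huffman_bits = sum(len(code[s]) for s in symbols)
    elias_bits = sum(elias_gamma_bits(s + 1) for s in symbols)  # gamma needs n >= 1

    print("Huffman code:", code)
    print("Huffman:", huffman_bits, "bits vs Elias-gamma:", elias_bits, "bits")

On this toy stream the Huffman code spends roughly 1.14 bits per symbol versus roughly 1.22 for the gamma baseline; the reductions reported in the paper come from the actual gradient statistics and the specific encoder designs it proposes.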

Supplementary Material

MP4 File (3426745.3431334.mp4)
Communicating the quantized gradients in a network requires efficient encoding techniques. For this, practitioners generally use Elias encoding-based techniques without considering their computational overhead or data-volume. In our work, based on Huffman coding, we propose several lossless encoding techniques that exploit different characteristics of the quantized gradients during distributed DNN training. We also show their effectiveness on five different DNN models across three different data-sets and compare them with classic state-of-the-art Elias-based encoding techniques. Our results show that the proposed Huffman-based encoders (i.e., RLH, SH, and SHS) can reduce the encoded data-volume by up to 5.1×, 4.32×, and 3.8×, respectively, compared to the Elias-based encoders. In this presentation, we provide an overview of our work.
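
The description above mentions exploiting characteristics of the quantized gradients, and the author tags below list run-length encoding; as a generic, hypothetical illustration of that kind of preprocessing (again not the paper's RLH encoder), the sketch below collapses the long zero runs typical of sparsified, quantized gradients into (symbol, run-length) pairs before entropy coding.

    from itertools import groupby

    def run_length_encode(symbols):
        """Collapse a symbol stream into (symbol, run_length) pairs."""
        return [(s, len(list(g))) for s, g in groupby(symbols)]

    # Toy quantized gradient: mostly zero-level symbols with a few non-zero spikes.
    symbols = [0] * 500 + [2] + [0] * 300 + [1] + [0] * 198

    pairs = run_length_encode(symbols)
    print(pairs)  # [(0, 500), (2, 1), (0, 300), (1, 1), (0, 198)]
    print(len(symbols), "symbols ->", len(pairs), "run-length pairs")

The resulting pairs are far fewer than the original symbols and could then be entropy coded, for example with a Huffman routine like the one sketched after the abstract.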

Published In

DistributedML'20: Proceedings of the 1st Workshop on Distributed Machine Learning
December 2020
46 pages
ISBN: 9781450381826
DOI: 10.1145/3426745

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 December 2020

Author Tags

  1. Distributed training
  2. Elias and Run-length Encoding
  3. Gradient compression
  4. Huffman coding
  5. Quantization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CoNEXT '20

Acceptance Rates

Overall Acceptance Rate: 5 of 10 submissions, 50%

Article Metrics

  • Downloads (Last 12 months): 100
  • Downloads (Last 6 weeks): 13
Reflects downloads up to 11 Dec 2024

Cited By

  • (2024) Flexible Quantization for Efficient Convolutional Neural Networks. Electronics 13:10 (1923). DOI: 10.3390/electronics13101923. Online publication date: 14-May-2024.
  • (2024) On the Method of Lossless Data Compression Using Spans of Varied Bit Widths. 2024 47th International Conference on Telecommunications and Signal Processing (TSP), 115-118. DOI: 10.1109/TSP63128.2024.10605976. Online publication date: 10-Jul-2024.
  • (2024) Joint Online Optimization of Model Training and Analog Aggregation for Wireless Edge Learning. IEEE/ACM Transactions on Networking 32:2 (1212-1228). DOI: 10.1109/TNET.2023.3318474. Online publication date: Apr-2024.
  • (2024) Low-Complexity Vector Source Coding for Discrete Long Sequences with Unknown Distributions. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 8991-8995. DOI: 10.1109/ICASSP48485.2024.10447294. Online publication date: 14-Apr-2024.
  • (2022) Learned Gradient Compression for Distributed Deep Learning. IEEE Transactions on Neural Networks and Learning Systems 33:12 (7330-7344). DOI: 10.1109/TNNLS.2021.3084806. Online publication date: Dec-2022.
  • (2022) Inter-Operability of Compression Techniques for Efficient Deployment of CNNs on Microcontrollers. Advances in System-Integrated Intelligence, 543-552. DOI: 10.1007/978-3-031-16281-7_51. Online publication date: 4-Sep-2022.
  • (2022) Adaptive synchronous strategy for distributed machine learning. International Journal of Intelligent Systems 37:12 (11713-11741). DOI: 10.1002/int.23060. Online publication date: 20-Sep-2022.
