Research article | Open access
DOI: 10.1145/3508352.3549435

Seprox: Sequence-Based Approximations for Compressing Ultra-Low Precision Deep Neural Networks

Published: 22 December 2022

Abstract

Compression techniques such as quantization and pruning are indispensable for deploying state-of-the-art Deep Neural Networks (DNNs) on resource-constrained edge devices. Quantization is widely used in practice: many commercial platforms already support 8-bit precision, with recent trends towards ultra-low precision (4 bits and below). Pruning, which increases network sparsity (the incidence of zero-valued weights), enables compression by storing only the nonzero weights and their indices. Unfortunately, the compression benefits of pruning deteriorate or even vanish in ultra-low precision DNNs. This is due to (i) the unfavorable tradeoff between the number of bits needed to store a weight (which reduces with lower precision) and the number of bits needed to encode an index (which remains unchanged), and (ii) the lower sparsity levels that are achievable at lower precisions.
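To make the index-overhead tradeoff concrete, here is an illustrative back-of-the-envelope calculation (not taken from the paper). It assumes a hypothetical 8-bit index per nonzero weight; the exact index width depends on the sparse format used, but it does not shrink as weight precision is reduced.

```python
# Illustrative comparison (not from the paper): dense storage vs.
# index-based sparse storage of pruned weights. Assumes an 8-bit index
# per nonzero weight; the index width does not shrink with precision.

INDEX_BITS = 8

def dense_bits(num_weights, weight_bits):
    # Every weight is stored, zero or not.
    return num_weights * weight_bits

def sparse_bits(num_weights, weight_bits, sparsity):
    # Only nonzero weights are stored, each with an index.
    nonzeros = round(num_weights * (1.0 - sparsity))
    return nonzeros * (weight_bits + INDEX_BITS)

for wb in (8, 4, 2):
    # Break-even sparsity s solves (1 - s) * (wb + INDEX_BITS) = wb.
    break_even = INDEX_BITS / (wb + INDEX_BITS)
    saving = 1 - sparse_bits(1_000_000, wb, 0.70) / dense_bits(1_000_000, wb)
    print(f"{wb}-bit weights: break-even sparsity {break_even:.0%}, "
          f"storage saving at 70% sparsity {saving:+.0%}")
```

Under these assumptions, 70% sparsity saves about 40% of storage at 8-bit precision, only about 10% at 4-bit, and actually increases storage at 2-bit, which is the effect the abstract describes.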
We propose Seprox, a new compression scheme that overcomes these challenges by exploiting two key observations about ultra-low precision DNNs. First, with lower precision, fewer weight values are possible, leading to an increased incidence of frequently-occurring weights and weight sequences. Second, some weight values occur rarely and can be eliminated by replacing them with similar values. Leveraging these insights, Seprox encodes frequently-occurring weight sequences (as opposed to individual weights), using the eliminated weight values as codes for these sequences, thereby avoiding indexing overheads and achieving higher compression. Additionally, Seprox uses approximation techniques to increase the frequencies of the encoded sequences. Across six ultra-low precision DNNs trained on the Cifar10 and ImageNet datasets, Seprox achieves model compression, energy improvements and speed-ups of up to 35.2%, 14.8% and 18.2%, respectively.
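The general idea can be illustrated with a minimal sketch, which is not the paper's actual algorithm or its approximation procedure: rare weight values are approximated by similar values, and the code points freed this way are reused as single-symbol codes for frequent weight pairs, so no per-weight indices are stored. The names `encode`, `decode`, `rare_to_similar`, and `pair_codes` are invented here for illustration.

```python
# Minimal sketch of sequence-based encoding (illustrative only). Assumes
# 4-bit weights where some of the 16 code points occur rarely; rare values
# are remapped to similar values, and the freed code points stand for
# frequent weight pairs.

def encode(weights, rare_to_similar, pair_codes):
    """Replace rare values, then greedily substitute frequent pairs."""
    approx = [rare_to_similar.get(w, w) for w in weights]  # approximation step
    out, i = [], 0
    while i < len(approx):
        pair = tuple(approx[i:i + 2])
        if pair in pair_codes:          # frequent pair -> one freed code point
            out.append(pair_codes[pair])
            i += 2
        else:                           # ordinary weight, stored as-is
            out.append(approx[i])
            i += 1
    return out

def decode(codes, pair_codes):
    """Expand freed code points back into their weight pairs."""
    inv = {c: p for p, c in pair_codes.items()}
    out = []
    for c in codes:
        out.extend(inv.get(c, (c,)))
    return out

# Example: value 7 is rare and approximated by 6; its code point now
# stands for the frequent pair (0, 0).
w = [0, 0, 3, 7, 0, 0, 6]
enc = encode(w, rare_to_similar={7: 6}, pair_codes={(0, 0): 7})
print(enc, decode(enc, {(0, 0): 7}))  # [7, 3, 6, 7, 6] -> [0, 0, 3, 6, 0, 0, 6]
```

In this sketch the encoded stream is shorter than the original and needs no index array; the decoded weights differ from the originals only where rare values were approximated.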


Published In

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design
October 2022, 1467 pages
ISBN: 9781450392174
DOI: 10.1145/3508352
This work is licensed under a Creative Commons Attribution 4.0 International License.

In-Cooperation

• IEEE-EDS: Electron Devices Society
• IEEE CAS
• IEEE CEDA

Publisher

Association for Computing Machinery, New York, NY, United States


Conference

ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design
October 30 - November 3, 2022
San Diego, California

Acceptance Rates

Overall Acceptance Rate: 457 of 1,762 submissions, 26%
