
Zero and Narrow-Width Value-Aware Compression for Quantized Convolutional Neural Networks

Published: 01 January 2024

Abstract

Convolutional neural networks (CNNs) are typically deployed on systems with dedicated neural processing units that handle CNN computations. To achieve high performance with low hardware overhead, CNN datatypes are quantized. As a further optimization to reduce DRAM accesses, compression algorithms have been applied to CNN data. However, conventional zero value-aware compression algorithms lose compression ratio on the latest quantized CNNs because these networks contain few zero values. Moreover, the appropriate zero run-length code width varies across CNNs, layers, and quantization datatypes. The latest quantized CNNs instead contain many narrow-width values, which offer another compressible data pattern for raising the compression ratio. Because low-precision quantization reduces the data bit width, CNN data cluster around a few discrete values and form a biased distribution; these discrete values become narrow-width values and account for a large share of that distribution. In this article, we propose ENCORE, an efficient compression algorithm for quantized CNNs that employs variable zero run-length encoding and compresses narrow-width values. On the latest quantized CNNs, ENCORE achieves compression ratios 93.55% and 50.85% higher on MobileNet v1 and Tiny YOLO v3, respectively, than conventional zero value-aware CNN data compression algorithms.
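The abstract combines two mechanisms: zero run-length encoding with a tunable code width and compact storage of narrow-width values. The Python sketch below illustrates how such a scheme might look for int8 data; the tag layout, field widths, and function names are illustrative assumptions and do not reproduce the actual ENCORE format.

# Minimal sketch of zero run-length plus narrow-width value encoding for
# quantized (e.g., int8) CNN data. The bit-field layout below is an assumption.

def encode_block(values, run_bits=3, narrow_bits=4):
    """Encode a list of int8 values into (tag, payload) symbols.

    tag 'Z': a run of zeros, payload = run length (capped at 2**run_bits - 1)
    tag 'N': a narrow-width value that fits in `narrow_bits` (signed)
    tag 'F': a full-width 8-bit value, stored uncompressed
    """
    narrow_min = -(1 << (narrow_bits - 1))
    narrow_max = (1 << (narrow_bits - 1)) - 1
    max_run = (1 << run_bits) - 1
    symbols = []
    i = 0
    while i < len(values):
        if values[i] == 0:
            # Collapse consecutive zeros into one run symbol.
            run = 0
            while i < len(values) and values[i] == 0 and run < max_run:
                run += 1
                i += 1
            symbols.append(('Z', run))
        elif narrow_min <= values[i] <= narrow_max:
            # Narrow-width value: store only narrow_bits of payload.
            symbols.append(('N', values[i]))
            i += 1
        else:
            # Full-width value: store all 8 bits.
            symbols.append(('F', values[i]))
            i += 1
    return symbols

def compressed_bits(symbols, run_bits=3, narrow_bits=4):
    # 2-bit tag per symbol plus the payload width for each symbol class.
    cost = {'Z': run_bits, 'N': narrow_bits, 'F': 8}
    return sum(2 + cost[tag] for tag, _ in symbols)

# Example: a biased, mostly-narrow int8 activation block.
block = [0, 0, 0, 3, -2, 1, 0, 0, 117, 2, 0, 0, 0, 0, -1, 5]
syms = encode_block(block)
print(syms)
print(compressed_bits(syms), "bits vs", 8 * len(block), "bits uncompressed")

In this sketch, run_bits and narrow_bits are parameters that could be chosen per network, per layer, or per quantization datatype, which is one way to mimic the variable zero run-length code width the abstract motivates.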


Published In

IEEE Transactions on Computers, Volume 73, Issue 1, January 2024, 300 pages

Publisher

IEEE Computer Society, United States
