
Zero and Narrow-Width Value-Aware Compression for Quantized Convolutional Neural Networks

Published: 01 January 2024

Abstract

Convolutional neural networks (CNNs) are typically deployed on systems with dedicated neural processing units that handle CNN computations. To achieve high performance with low hardware overhead, CNN datatypes are quantized. As a further optimization to reduce DRAM accesses, compression algorithms have been applied to CNN data. However, conventional zero value-aware compression algorithms lose compression ratio on the latest quantized CNNs because these networks contain few zero values. Moreover, the appropriate zero run-length code width varies across CNNs, layers, and quantization datatypes. The latest quantized CNNs instead contain many narrow-width values, which offer another compressible data pattern for raising the compression ratio. Because low-precision quantization reduces the data bit width, CNN data cluster around a few discrete values and form a biased distribution; these discrete values become narrow-width values and account for a large share of that distribution. In this article, we propose ENCORE, an efficient compression algorithm for quantized CNNs that employs variable zero run-length encoding and compresses narrow-width values. On the latest quantized CNNs, ENCORE achieves compression ratios 93.55% and 50.85% higher on MobileNet v1 and Tiny YOLO v3, respectively, than conventional zero value-aware CNN data compression algorithms.
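The abstract combines two mechanisms: zero run-length encoding with a tunable code width and compact storage of narrow-width values. The Python sketch below illustrates how such a scheme might look for int8 data; the tag layout, field widths, and function names are illustrative assumptions and do not reproduce the actual ENCORE format.

# Minimal sketch of zero run-length plus narrow-width value encoding for
# quantized (e.g., int8) CNN data. The bit-field layout below is an assumption.

def encode_block(values, run_bits=3, narrow_bits=4):
    """Encode a list of int8 values into (tag, payload) symbols.

    tag 'Z': a run of zeros, payload = run length (capped at 2**run_bits - 1)
    tag 'N': a narrow-width value that fits in `narrow_bits` (signed)
    tag 'F': a full-width 8-bit value, stored uncompressed
    """
    narrow_min = -(1 << (narrow_bits - 1))
    narrow_max = (1 << (narrow_bits - 1)) - 1
    max_run = (1 << run_bits) - 1
    symbols = []
    i = 0
    while i < len(values):
        if values[i] == 0:
            # Collapse consecutive zeros into one run symbol.
            run = 0
            while i < len(values) and values[i] == 0 and run < max_run:
                run += 1
                i += 1
            symbols.append(('Z', run))
        elif narrow_min <= values[i] <= narrow_max:
            # Narrow-width value: store only narrow_bits of payload.
            symbols.append(('N', values[i]))
            i += 1
        else:
            # Full-width value: store all 8 bits.
            symbols.append(('F', values[i]))
            i += 1
    return symbols

def compressed_bits(symbols, run_bits=3, narrow_bits=4):
    # 2-bit tag per symbol plus the payload width for each symbol class.
    cost = {'Z': run_bits, 'N': narrow_bits, 'F': 8}
    return sum(2 + cost[tag] for tag, _ in symbols)

# Example: a biased, mostly-narrow int8 activation block.
block = [0, 0, 0, 3, -2, 1, 0, 0, 117, 2, 0, 0, 0, 0, -1, 5]
syms = encode_block(block)
print(syms)
print(compressed_bits(syms), "bits vs", 8 * len(block), "bits uncompressed")

In this sketch, run_bits and narrow_bits are parameters that could be chosen per network, per layer, or per quantization datatype, which is one way to mimic the variable zero run-length code width the abstract motivates.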


Published In

IEEE Transactions on Computers, Volume 73, Issue 1, January 2024, 300 pages

Publisher

IEEE Computer Society, United States
