Abstract
Convolutional neural networks (CNNs) have achieved great success in many computer vision tasks. However, it is difficult to deploy CNN models on low-cost devices with limited power budgets, because most existing CNN models are computationally expensive. Therefore, CNN model compression and acceleration have become a hot research topic in the deep learning area. Typical schemes for speeding up the feed-forward process with only a slight accuracy loss include parameter pruning and sharing, low-rank factorization, compact convolutional filters, and knowledge distillation. In this study, we propose a general acceleration scheme that replaces floating-point multiplication with integer addition. The motivation is that every floating-point number can be expressed as the sum of an exponential series (signed powers of two), so the multiplication of two floating-point numbers can be converted into additions of their exponents. In the experiments, we apply the proposed scheme directly to AlexNet, VGG, and ResNet for image classification, and to Faster R-CNN for object detection. The results on ImageNet and PASCAL VOC show that the proposed quantization scheme performs well even when only a single exponential term is kept. Moreover, we analyze the efficiency of our method on mainstream FPGAs. The experimental results show that the proposed quantization scheme achieves acceleration on FPGAs with only a slight accuracy loss.
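To make the idea concrete, below is a minimal Python sketch of the underlying arithmetic (not the authors' implementation; the helper names power_of_two_terms and mul_by_exponent_add are illustrative). A weight is approximated by a short signed sum of powers of two, so multiplying an activation by that weight reduces to adding small integers to the activation's exponent; in hardware these exponent updates can be done with integer adders instead of floating-point multipliers.

```python
import math

def power_of_two_terms(w, num_terms=1):
    """Greedily decompose a weight w into a signed sum of powers of two,
    w ~= sum_i s_i * 2**k_i.  num_terms=1 is the single-exponential-term
    case mentioned in the abstract."""
    terms, residual = [], float(w)
    for _ in range(num_terms):
        if residual == 0.0:
            break
        sign = 1.0 if residual > 0.0 else -1.0
        k = round(math.log2(abs(residual)))      # nearest power-of-two exponent
        terms.append((sign, k))
        residual -= sign * 2.0 ** k
    return terms

def mul_by_exponent_add(x, terms):
    """Approximate w * x: multiplying x by 2**k only shifts x's exponent,
    i.e. an integer addition on the exponent field, not a float multiply."""
    mantissa, e_x = math.frexp(x)                # x = mantissa * 2**e_x
    return sum(s * math.ldexp(mantissa, e_x + k) for s, k in terms)

# Example: approximate 0.30 * 1.7 with one, then two, exponential terms.
w, x = 0.30, 1.7
print(mul_by_exponent_add(x, power_of_two_terms(w, 1)), w * x)  # ~0.425 vs 0.51
print(mul_by_exponent_add(x, power_of_two_terms(w, 2)))         # ~0.531, closer
```

Keeping more terms tightens the approximation at the cost of more additions; the one-term setting above corresponds to the "only one exponential term" configuration evaluated in the abstract.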
Additional information
This work was supported by the National Natural Science Foundation of China (Grant Nos. 41971424, 61701191), the Key Technical Project of Xiamen Ocean Bureau (Grant No. 18CZB033HJ11), the Natural Science Foundation of Fujian Province (Grant Nos. 2019J01712, 2020J01701), the Key Technical Project of Xiamen Science and Technology Bureau (Grant Nos. 3502Z20191018, 3502Z20201007, 3502Z20191022, 3502Z20203057), and the Science and Technology Project of Education Department of Fujian Province (Grant Nos. JAT190321, JAT190318, JAT190315).
About this article
Cite this article
Cai, G., Yang, S., Du, J. et al. Convolution without multiplication: A general speed up strategy for CNNs. Sci. China Technol. Sci. 64, 2627–2639 (2021). https://doi.org/10.1007/s11431-021-1936-2