Abstract
Deep learning has transformed the embedded computing environment, and the convolutional neural network (CNN) has proven to be a reliable fit for many emerging problems. The next step is to strengthen the role of CNNs on embedded devices, in terms of both implementation detail and performance. Storage and computational resources on such devices are limited and constrained, raising key issues that must be addressed. Compressing (i.e., quantizing) the CNN is a valuable solution. In this paper, our main goals are memory compression and complexity reduction (in both operations and cycles) of CNNs, using methods, including quantization and pruning, that do not require retraining and can therefore be deployed directly on mobile systems or robots. We also explore further quantization techniques for additional complexity reduction. To achieve these goals, we compress the layers of a CNN model (i.e., parameters and outputs) into suitable precision formats using several quantization methodologies. First, we describe a pruning approach that reduces the required storage and computation cycles on embedded devices; this can drastically cut power consumption and resource usage. Second, we present a hybrid quantization approach with automatic tuning for network compression. Third, we present a K-means quantization approach. With only minor degradation relative to floating-point performance, the presented pruning and quantization methods produce fixed-point reduced networks with stable performance. Precise fixed-point calculations for coefficients, input/output signals, and accumulators are considered in the quantization process.
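To make the three methodologies named above concrete, the following is a minimal sketch (not the paper's exact procedure) of post-training compression of a single CNN weight tensor without retraining: magnitude pruning, uniform fixed-point (Qm.n) quantization, and K-means codebook quantization. All function names, thresholds, bit-widths, and cluster counts are illustrative assumptions, not values reported in the paper.

```python
# Illustrative post-training compression of one CNN weight tensor (no retraining).
# Thresholds, bit-widths, and cluster counts are arbitrary example values.
import numpy as np

def prune_by_magnitude(w, threshold=0.01):
    """Zero out weights whose magnitude falls below the threshold."""
    return np.where(np.abs(w) < threshold, 0.0, w)

def quantize_fixed_point(w, int_bits=2, frac_bits=6):
    """Round weights to a signed fixed-point grid in a Q(int_bits).(frac_bits) format."""
    scale = 2.0 ** frac_bits
    lo = -(2.0 ** int_bits)
    hi = 2.0 ** int_bits - 1.0 / scale
    return np.clip(np.round(w * scale) / scale, lo, hi)

def quantize_kmeans(w, n_clusters=16, n_iter=20):
    """Replace each weight by the nearest of n_clusters shared centroid values."""
    flat = w.ravel()
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)  # simple linear init
    for _ in range(n_iter):
        labels = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            members = flat[labels == k]
            if members.size:
                centroids[k] = members.mean()
    return centroids[labels].reshape(w.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(0.0, 0.1, size=(64, 3, 3, 3))  # toy conv-layer weights
    pruned = prune_by_magnitude(weights)
    print("fixed-point error:", np.abs(quantize_fixed_point(pruned) - pruned).mean())
    print("k-means error:   ", np.abs(quantize_kmeans(pruned) - pruned).mean())
```

In practice, per-layer bit-widths (for weights, input/output signals, and accumulators) and cluster counts would be tuned per network to keep accuracy close to the floating-point baseline, which is the role of the automatic tuning step described in the paper.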