Abstract
The intense convolution computation and large memory requirements of CNNs constrain their wider deployment and application. Although both the weights and the feature maps in CNNs can be sparse, directly mapping sparse convolution to the spGEMM routines of the HPC domain fails to improve actual performance. Moreover, existing sparse formats such as CSR are not suitable for encoding sparse feature maps, because convolution operates across rows.
In this work, we propose a new format and a novel sparse convolution algorithm to optimize sparse CNNs on GPUs. First, we design the Compressed Feature Map (CFM) format to store sparse feature maps. Second, we propose an efficient sparse convolution algorithm, called SPART, that exploits both sparse weights and sparse feature maps. Finally, we optimize this algorithm on GPUs. Our experiments show that SPART performs well: compared with dense convolution, its speedup is up to \(\mathbf {2.62 \times } \) (\(\mathbf {1.77 \times }\) on average) on V100 and up to \(\mathbf {1.84 \times }\) (\(\mathbf {1.24 \times }\) on average) on Titan X.
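The abstract does not detail the CFM layout or the SPART kernel, so the following is only an illustrative sketch, with hypothetical names such as CSR and conv3x3 chosen for this example, of the access pattern behind the claim that row-compressed formats fit convolution poorly: every output pixel of a 3x3 convolution reads non-zeros from three consecutive input rows, so a CSR-encoded feature map must be traversed one row segment at a time for every output row.

// Illustrative sketch only (not the paper's CFM format or SPART kernel):
// a sparse feature map stored in CSR and a direct 3x3 convolution over it.
// Each output row touches non-zeros from three consecutive input rows,
// which is the cross-row access that row-compressed formats handle poorly.
#include <cstdio>
#include <vector>

struct CSR {                       // sparse H x W feature map
    int H, W;
    std::vector<int>   row_ptr;    // size H + 1
    std::vector<int>   col;        // column index of each non-zero
    std::vector<float> val;        // value of each non-zero
};

// Dense 3x3 kernel, valid padding, stride 1.
std::vector<float> conv3x3(const CSR& fm, const float k[3][3]) {
    int OH = fm.H - 2, OW = fm.W - 2;
    std::vector<float> out(OH * OW, 0.0f);
    for (int oy = 0; oy < OH; ++oy)
        for (int ky = 0; ky < 3; ++ky) {           // three input rows per output row
            int iy = oy + ky;
            for (int p = fm.row_ptr[iy]; p < fm.row_ptr[iy + 1]; ++p) {
                int ix = fm.col[p];
                for (int kx = 0; kx < 3; ++kx) {   // scatter non-zero to affected outputs
                    int ox = ix - kx;
                    if (ox >= 0 && ox < OW)
                        out[oy * OW + ox] += fm.val[p] * k[ky][kx];
                }
            }
        }
    return out;
}

int main() {
    // 4x4 feature map with two non-zeros: (0,1) = 2 and (2,2) = 3.
    CSR fm{4, 4, {0, 1, 1, 2, 2}, {1, 2}, {2.0f, 3.0f}};
    float k[3][3] = {{1,0,0},{0,1,0},{0,0,1}};     // diagonal kernel for the demo
    std::vector<float> out = conv3x3(fm, k);
    for (int y = 0; y < 2; ++y, printf("\n"))
        for (int x = 0; x < 2; ++x) printf("%5.1f ", out[y * 2 + x]);
    return 0;
}

Scattering each non-zero to the outputs it affects, as above, avoids per-pixel searches within row segments, but the three-row working set per output row is exactly the cross-row coupling that motivates a convolution-aware format such as the CFM format proposed here.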
Acknowledgement
This work was supported by the National Natural Science Foundation of China (No. 61672048).