Abstract
The intense convolution computation and large memory requirements of CNNs constrain their wider deployment and application. Although both the weights and the feature maps in CNNs can be sparse, directly mapping sparse convolution to the spGEMM routines of the HPC domain fails to improve actual performance. Moreover, existing sparse formats such as CSR are not suitable for encoding sparse feature maps, because convolution operates across rows.
In this work, we propose a new format and a novel sparse convolution algorithm to optimize sparse CNNs on GPUs. First, we design the Compressed Feature Map (CFM) format to store sparse feature maps. Second, we propose an efficient sparse convolution algorithm, called SPART, that exploits both sparse weights and sparse feature maps. Finally, we optimize this algorithm on GPUs. Our experiments show that SPART performs well: compared with dense convolution, its speedup is up to \(\mathbf {2.62 \times } \) (\(\mathbf {1.77 \times }\) on average) on V100 and up to \(\mathbf {1.84 \times }\) (\(\mathbf {1.24 \times }\) on average) on Titan X.
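The abstract does not detail the CFM layout or the SPART kernel, so the following is only an illustrative sketch, with hypothetical names such as CSR and conv3x3 chosen for this example, of the access pattern behind the claim that row-compressed formats fit convolution poorly: every output pixel of a 3x3 convolution reads non-zeros from three consecutive input rows, so a CSR-encoded feature map must be traversed one row segment at a time for every output row.

// Illustrative sketch only (not the paper's CFM format or SPART kernel):
// a sparse feature map stored in CSR and a direct 3x3 convolution over it.
// Each output row touches non-zeros from three consecutive input rows,
// which is the cross-row access that row-compressed formats handle poorly.
#include <cstdio>
#include <vector>

struct CSR {                       // sparse H x W feature map
    int H, W;
    std::vector<int>   row_ptr;    // size H + 1
    std::vector<int>   col;        // column index of each non-zero
    std::vector<float> val;        // value of each non-zero
};

// Dense 3x3 kernel, valid padding, stride 1.
std::vector<float> conv3x3(const CSR& fm, const float k[3][3]) {
    int OH = fm.H - 2, OW = fm.W - 2;
    std::vector<float> out(OH * OW, 0.0f);
    for (int oy = 0; oy < OH; ++oy)
        for (int ky = 0; ky < 3; ++ky) {           // three input rows per output row
            int iy = oy + ky;
            for (int p = fm.row_ptr[iy]; p < fm.row_ptr[iy + 1]; ++p) {
                int ix = fm.col[p];
                for (int kx = 0; kx < 3; ++kx) {   // scatter non-zero to affected outputs
                    int ox = ix - kx;
                    if (ox >= 0 && ox < OW)
                        out[oy * OW + ox] += fm.val[p] * k[ky][kx];
                }
            }
        }
    return out;
}

int main() {
    // 4x4 feature map with two non-zeros: (0,1) = 2 and (2,2) = 3.
    CSR fm{4, 4, {0, 1, 1, 2, 2}, {1, 2}, {2.0f, 3.0f}};
    float k[3][3] = {{1,0,0},{0,1,0},{0,0,1}};     // diagonal kernel for the demo
    std::vector<float> out = conv3x3(fm, k);
    for (int y = 0; y < 2; ++y, printf("\n"))
        for (int x = 0; x < 2; ++x) printf("%5.1f ", out[y * 2 + x]);
    return 0;
}

Scattering each non-zero to the outputs it affects, as above, avoids per-pixel searches within row segments, but the three-row working set per output row is exactly the cross-row coupling that motivates a convolution-aware format such as the CFM format proposed here.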
Acknowledgement
This work was supported by the National Natural Science Foundation of China (No. 61672048).