
SPART: Optimizing CNNs by Utilizing Both Sparsity of Weights and Feature Maps

  • Conference paper
  • First Online:
Advanced Parallel Processing Technologies (APPT 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11719)

Included in the following conference series:

  • APPT: Advanced Parallel Processing Technologies

  • 1008 Accesses

Abstract

Intense convolution computation and large memory requirements in CNNs constrain their wider deployment and application. Although both the weights and the feature maps in CNNs can be sparse, directly mapping sparse convolution to spGEMM as done in the HPC domain fails to improve actual performance. Moreover, existing sparse formats such as CSR are not suitable for encoding sparse feature maps, because convolution operates across rows.
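To make the CSR limitation concrete, here is a minimal sketch (our own illustration, not taken from the paper) of the access pattern: gathering the non-zeros of one K×K convolution window from a CSR-encoded feature map needs a separate indptr lookup and a column filter for each of the K rows the window spans. The helper name window_nonzeros is hypothetical.

```python
# Illustration only: why row-wise CSR is awkward for convolution windows.
import numpy as np
from scipy.sparse import csr_matrix

fmap = np.array([[0, 2, 0, 0],
                 [1, 0, 0, 3],
                 [0, 0, 4, 0],
                 [0, 5, 0, 0]], dtype=np.float32)
csr = csr_matrix(fmap)

def window_nonzeros(m, r0, c0, k=3):
    """Collect (row, col, value) of non-zeros inside the k x k window at (r0, c0)."""
    out = []
    for r in range(r0, r0 + k):                 # one indptr lookup per row of the window
        start, end = m.indptr[r], m.indptr[r + 1]
        for idx in range(start, end):
            c = m.indices[idx]
            if c0 <= c < c0 + k:                # per-row column filtering
                out.append((r, int(c), float(m.data[idx])))
    return out

print(window_nonzeros(csr, 0, 0))  # [(0, 1, 2.0), (1, 0, 1.0), (2, 2, 4.0)]
```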

In this work, we propose a new format and a novel sparse convolution algorithm to optimize sparse CNNs on GPUs. First, we design the Compressed Feature Map (CFM) format to store sparse feature maps. Second, we propose an efficient sparse convolution algorithm, SPART, that exploits both sparse weights and sparse feature maps. Finally, we optimize this algorithm on GPUs. Our experiments show that SPART delivers substantial speedups over dense convolution: up to \(\mathbf {2.62 \times } \) (\(\mathbf {1.77 \times }\) on average) on V100 and up to \(\mathbf {1.84 \times }\) (\(\mathbf {1.24 \times }\) on average) on Titan X.
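As a rough sketch of the general idea behind convolution that exploits both sparse weights and sparse activations (our own NumPy illustration of the concept; it does not reproduce the paper's CFM format or the SPART GPU kernel), the hypothetical function sparse_conv2d below enumerates only non-zero activation/weight pairs and scatters their products into the output:

```python
# Concept sketch: direct convolution that skips zero weights and zero activations.
import numpy as np

def sparse_conv2d(fmap, kernel):
    """Valid 2D convolution (CNN-style cross-correlation) over non-zeros only."""
    H, W = fmap.shape
    K, _ = kernel.shape
    out = np.zeros((H - K + 1, W - K + 1), dtype=fmap.dtype)
    w_nz = list(zip(*np.nonzero(kernel)))   # coordinates of non-zero weights
    f_nz = list(zip(*np.nonzero(fmap)))     # coordinates of non-zero activations
    for (r, c) in f_nz:
        for (kr, kc) in w_nz:
            orow, ocol = r - kr, c - kc     # output position this pair contributes to
            if 0 <= orow < out.shape[0] and 0 <= ocol < out.shape[1]:
                out[orow, ocol] += fmap[r, c] * kernel[kr, kc]
    return out

fmap = np.array([[0, 2, 0, 0],
                 [1, 0, 0, 3],
                 [0, 0, 4, 0],
                 [0, 5, 0, 0]], dtype=np.float32)
kernel = np.array([[1, 0, 0],
                   [0, 0, 0],
                   [0, 0, 2]], dtype=np.float32)

# Cross-check against a straightforward dense computation.
dense = np.array([[np.sum(fmap[i:i + 3, j:j + 3] * kernel) for j in range(2)]
                  for i in range(2)])
assert np.allclose(sparse_conv2d(fmap, kernel), dense)
```

On a GPU, the payoff of such a scheme depends on keeping the irregular accesses coalesced and the load balanced across threads, which is the kind of optimization the paper targets; the sketch above only shows the arithmetic that is skipped.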



Notes

  1. https://github.com/BVLC/caffe/wiki/Model-Zoo.


Acknowledgement

This work was supported by the National Natural Science Foundation of China (No. 61672048).

Author information

Corresponding author

Correspondence to Jiaming Xie.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Xie, J., Liang, Y. (2019). SPART: Optimizing CNNs by Utilizing Both Sparsity of Weights and Feature Maps. In: Yew, P.C., Stenström, P., Wu, J., Gong, X., Li, T. (eds) Advanced Parallel Processing Technologies. APPT 2019. Lecture Notes in Computer Science, vol 11719. Springer, Cham. https://doi.org/10.1007/978-3-030-29611-7_6


  • DOI: https://doi.org/10.1007/978-3-030-29611-7_6

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29610-0

  • Online ISBN: 978-3-030-29611-7

  • eBook Packages: Computer Science, Computer Science (R0)
