
A Scalable 3D Array Architecture for Accelerating Convolutional Neural Networks

  • Conference paper
  • First Online:
Cognitive Systems and Information Processing (ICCSIP 2021)

Abstract

Convolutional neural networks (CNNs) are widely used in computer vision and image recognition, and their structures have grown increasingly complex. This complexity poses performance and storage-capacity challenges for hardware implementation. To address these challenges, this paper proposes a novel 3D array architecture for accelerating CNNs. The proposed architecture has several benefits. First, a multilevel cache strategy improves data reuse, thereby reducing the access frequency to external memory. Second, performance and throughput are balanced across the 3D array nodes by novel workload- and weight-partitioning schemes. Third, computation and transmission are performed simultaneously, resulting in higher parallelism and lower hardware storage requirements. Finally, an efficient data-mapping strategy is proposed for better scalability of the entire system. Experimental results show that the proposed 3D array architecture effectively improves the overall computing performance of the system.
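The workload-partitioning idea mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual scheme: it assumes that a convolutional layer's output volume is split by output channels and spatial tiles across a 3D grid of nodes so that each node receives a near-equal share of the work. All names, the grid shape, and the partitioning granularity are assumptions for illustration only.

```python
# Hypothetical sketch: distribute a conv layer's output volume across a
# 3D grid of compute nodes by splitting channels (z-axis), output rows
# (y-axis), and output columns (x-axis) into contiguous chunks.
from itertools import product


def _chunks(n, parts):
    """Split range [0, n) into `parts` contiguous spans of near-equal size."""
    base, rem = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        size = base + (1 if i < rem else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def partition_layer(out_channels, out_h, out_w, grid=(2, 2, 2)):
    """Map each grid node (z, y, x) to its channel block and spatial tile."""
    gz, gy, gx = grid
    c_blocks = _chunks(out_channels, gz)
    r_tiles = _chunks(out_h, gy)
    c_tiles = _chunks(out_w, gx)
    return {
        (z, y, x): {"channels": cb, "rows": rt, "cols": ct}
        for (z, cb), (y, rt), (x, ct) in product(
            enumerate(c_blocks), enumerate(r_tiles), enumerate(c_tiles))
    }


# Example: a 64-channel 56x56 output split across a 2x2x2 node grid.
mapping = partition_layer(out_channels=64, out_h=56, out_w=56)
node_work = {
    node: (s["channels"][1] - s["channels"][0])
    * (s["rows"][1] - s["rows"][0])
    * (s["cols"][1] - s["cols"][0])
    for node, s in mapping.items()
}
# Here every node gets an equal share: 32 channels x 28 rows x 28 cols.
```

Because the spans are contiguous, neighbouring nodes share tile borders, which is the kind of locality a multilevel cache hierarchy can exploit to reduce external-memory traffic.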



Acknowledgement

This work was supported by the Hundred Talents Program of Chinese Academy of Sciences under grant No. Y9BEJ11001. This research was primarily conducted at Suzhou Institute of Nano-Tech and Nano-Bionics (SINANO).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xin Liu.

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2022 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Ji, Y. et al. (2022). A Scalable 3D Array Architecture for Accelerating Convolutional Neural Networks. In: Sun, F., Hu, D., Wermter, S., Yang, L., Liu, H., Fang, B. (eds) Cognitive Systems and Information Processing. ICCSIP 2021. Communications in Computer and Information Science, vol 1515. Springer, Singapore. https://doi.org/10.1007/978-981-16-9247-5_7


  • DOI: https://doi.org/10.1007/978-981-16-9247-5_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-9246-8

  • Online ISBN: 978-981-16-9247-5

  • eBook Packages: Computer Science, Computer Science (R0)
