Abstract
Convolutional neural networks (CNNs) are widely used in computer vision and image recognition, and their structures are becoming increasingly complex. This complexity poses performance and storage-capacity challenges for hardware implementations. To address these challenges, we propose a novel 3D array architecture for accelerating CNNs. The proposed architecture has several benefits. First, a multilevel cache strategy improves data reuse and thus reduces the frequency of external memory accesses. Second, novel workload and weight partitioning schemes balance performance and throughput among the 3D array nodes. Third, computation and data transmission are performed simultaneously, resulting in higher parallelism and lower hardware storage requirements. Finally, an efficient data mapping strategy improves the scalability of the entire system. Experimental results show that the proposed 3D array architecture effectively improves the overall computing performance of the system.
Acknowledgement
This work was supported by the Hundred Talents Program of Chinese Academy of Sciences under grant No. Y9BEJ11001. This research was primarily conducted at Suzhou Institute of Nano-Tech and Nano-Bionics (SINANO).
Copyright information
© 2022 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ji, Y. et al. (2022). A Scalable 3D Array Architecture for Accelerating Convolutional Neural Networks. In: Sun, F., Hu, D., Wermter, S., Yang, L., Liu, H., Fang, B. (eds) Cognitive Systems and Information Processing. ICCSIP 2021. Communications in Computer and Information Science, vol 1515. Springer, Singapore. https://doi.org/10.1007/978-981-16-9247-5_7
DOI: https://doi.org/10.1007/978-981-16-9247-5_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-9246-8
Online ISBN: 978-981-16-9247-5
eBook Packages: Computer Science (R0)