Abstract
The high accuracy of deep learning has led to its use in various domains such as image and voice classification. However, the massive computational demands of deep neural networks (DNNs) make traditional processors inefficient, which has driven the emergence of hardware accelerators. DNN accelerators increase performance by exploiting opportunities such as data reuse and sparsity. In these accelerators the dataflow is an essential factor, so some of them provide a reconfigurable architecture that supports different mappings and dataflows. However, accelerators explicitly designed to exploit data sparsity are usually non-reconfigurable and have a fixed dataflow. This paper presents a new dataflow called Channel Dimension Stationary (CDS) for MAERI, a reconfigurable neural network accelerator. It targets convolutional layers with sparse input feature maps (ifmaps). In the proposed dataflow, computations follow the Cartesian product method, but multiplications that would produce useless results are avoided. To analyze mappings based on the CDS dataflow, we extended the mRNA tool (mapper for Reconfigurable Neural Accelerators), which includes an energy and performance analyzer for mapping strategies on MAERI. Our evaluation shows that, for ifmaps with 50%, 70%, and 90% sparsity, the proposed mapping increases energy efficiency on average by 3x, 6x, and 13x, respectively, without a noticeable reduction in utilization.
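The Cartesian-product style of computation described above can be illustrated with a minimal sketch. This is not the paper's CDS dataflow or the MAERI mapping itself; it is a simplified single-channel software analogue, assuming a 2D ifmap and a single 2D filter, that shows the core idea: compress the ifmap to its nonzero activations, take the Cartesian product of those activations with the filter weights, and scatter the products to output coordinates, so that multiplications whose input operand is zero never happen.

```python
import numpy as np

def cartesian_sparse_conv2d(ifmap, weights):
    """Sketch of Cartesian-product (outer-product) convolution.

    Every nonzero input activation is multiplied by every filter weight
    and the product is scattered to the output coordinate it contributes
    to. Zero activations are skipped entirely, avoiding the 'useless'
    multiplications that a sparse ifmap would otherwise incur.
    """
    H, W = ifmap.shape
    R, S = weights.shape
    out = np.zeros((H - R + 1, W - S + 1))
    # Step 1: compress the ifmap -- keep only nonzero activations
    # together with their coordinates.
    nz_inputs = [(h, w, ifmap[h, w])
                 for h in range(H) for w in range(W)
                 if ifmap[h, w] != 0]
    # Step 2: Cartesian product of nonzero activations with all weights.
    for (h, w, a) in nz_inputs:
        for r in range(R):
            for s in range(S):
                oh, ow = h - r, w - s  # output position hit by a*w[r,s]
                if 0 <= oh < out.shape[0] and 0 <= ow < out.shape[1]:
                    out[oh, ow] += a * weights[r, s]
    return out
```

With 90% of the ifmap zero, the inner product loop runs roughly 10% as many multiplications as a dense sliding-window convolution, which mirrors (in software) the energy savings the dataflow aims at in hardware.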
Data availability
The original data and analyzer used and analyzed during the current study are available in the GitHub repository, https://github.com/georgia-tech-synergy-lab/mRNA. In addition, part of the data generated during the present study is included in this manuscript, and the generated code is available from the corresponding author on request.
Notes
Input data (e.g., images) is transformed into a feature map by passing through CNN layers.
ScratchPad Memory (Prefetch Buffer).
Funding
This work has received no funding.
Author information
Contributions
Midia Reshadi introduced the subject and tools. Babak NarimanJahan then proposed the idea of the work and, with the approval of Midia Reshadi and Ahmad KhademZadeh, carried out the analysis and experiments and modified mRNA based on the proposed idea. Under the guidance of Midia Reshadi, with the advice of Akram Reza and the supervision of Ahmad KhademZadeh, Babak NarimanJahan analyzed the experimental results and wrote the manuscript. All authors read and approved the final version.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Narimanjahan, B., Reshadi, M., Khademzadeh, A. et al. MCPS: a mapping method for MAERI accelerator based on Cartesian Product based Convolution for DNN layers with sparse input feature map. Cluster Comput 25, 3213–3230 (2022). https://doi.org/10.1007/s10586-021-03527-6