Abstract
This paper describes a hardware-oriented two-stage algorithm that can be deployed on a resource-limited field-programmable gate array (FPGA) for fast object detection and recognition without external memory. The first stage proposes bounding boxes using a conventional object detection method, and the second performs convolutional neural network (CNN)-based classification to improve accuracy. Frequent access to external memory significantly degrades the execution efficiency of object classification. Unfortunately, existing CNN models with large numbers of parameters are difficult to deploy on FPGAs with limited on-chip memory resources. In this study, we designed a compact CNN model and performed hardware-oriented quantization of the parameters and intermediate results. As a result, CNN-based ultra-fast object classification was realized with all parameters and intermediate results stored on chip. Several evaluations were performed to demonstrate the performance of the proposed algorithm. The object classification module consumes only 163.67 Kbits of on-chip memory for ten regions of interest (ROIs), which makes it suitable for low-end FPGA devices. In terms of accuracy, our method achieves a correctness rate of 98.01% on the open-source MNIST data set and over 96.5% on three other self-built data sets, which is distinctly better than conventional ultra-high-speed object detection algorithms.
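The abstract's specific quantization scheme and bit widths are not given here; as a generic illustration of the kind of hardware-oriented fixed-point quantization that lets CNN parameters and intermediate results fit in on-chip memory, the sketch below maps floating-point weights to signed Q-format integers. The function name, bit widths, and example values are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def quantize_fixed_point(x, total_bits=8, frac_bits=6):
    """Quantize float values to signed fixed-point with `frac_bits`
    fractional bits, saturating at the representable range.

    Returns the integer codes (what would be stored on chip) and
    their dequantized float values (for accuracy evaluation).
    """
    scale = 2 ** frac_bits
    qmin = -(2 ** (total_bits - 1))       # e.g. -128 for 8 bits
    qmax = 2 ** (total_bits - 1) - 1      # e.g. +127 for 8 bits
    codes = np.clip(np.round(x * scale), qmin, qmax).astype(np.int32)
    return codes, codes / scale

# Illustrative weights: in-range values round to the nearest step,
# out-of-range values saturate.
weights = np.array([0.5, 0.731, -0.052, -2.5])
codes, dequant = quantize_fixed_point(weights)
```

With 8-bit codes each weight costs 8 bits of block RAM instead of 32, a 4x reduction; saturation (here, -2.5 clamping to -2.0) trades a small accuracy loss for a fixed, hardware-friendly storage budget.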
Acknowledgements
This work was partly supported by the National Natural Science Foundation of China (Grant No. 61673376).
Cite this article
Li, J., Long, X., Hu, S. et al. A novel hardware-oriented ultra-high-speed object detection algorithm based on convolutional neural network. J Real-Time Image Proc 17, 1703–1714 (2020). https://doi.org/10.1007/s11554-019-00931-5