DOI: 10.1145/3240765.3240775
Research article

Algorithm-Hardware Co-Design of Single Shot Detector for Fast Object Detection on FPGAs

Published: 05 November 2018

Abstract

The rapid improvement in computation capability has made convolutional neural networks (CNNs) highly successful on image classification tasks in recent years, which has in turn driven the development of object detection algorithms with significantly improved accuracy. However, during deployment, many applications demand low-latency processing of a single image under strict power constraints. These requirements reduce the efficiency of GPUs and other general-purpose platforms and create opportunities for dedicated acceleration hardware, e.g., FPGAs, whose digital circuits can be customized for the inference algorithm. This work therefore proposes customizing the detection algorithm, e.g., SSD, so that its hardware implementation benefits from low data precision at the cost of marginal accuracy degradation. The proposed FPGA-based deep learning inference accelerator is demonstrated on two Intel FPGAs running the SSD algorithm, achieving up to 2.18 TOPS throughput and up to 3.3× better energy efficiency than a GPU.

Published In

2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
Nov 2018
939 pages

Publisher

IEEE Press

Cited By

  • (2024) Statues: Energy-Efficient Video Object Detection on Edge Security Devices with Computational Skipping. Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design (1-6). DOI: 10.1145/3665314.3670822. Online publication date: 5-Aug-2024
  • (2023) FPGA-Based CNN for Eye Detection in an Iris Recognition at a Distance System. Electronics 12:22 (4713). DOI: 10.3390/electronics12224713. Online publication date: 20-Nov-2023
  • (2023) Puppis: Hardware Accelerator of Single-Shot Multibox Detectors for Edge-Based Applications. Electronics 12:22 (4557). DOI: 10.3390/electronics12224557. Online publication date: 7-Nov-2023
  • (2023) SSDLiteX: Enhancing SSDLite for Small Object Detection. Applied Sciences 13:21 (12001). DOI: 10.3390/app132112001. Online publication date: 3-Nov-2023
  • (2023) Algorithm-hardware Co-optimization for Energy-efficient Drone Detection on Resource-constrained FPGA. ACM Transactions on Reconfigurable Technology and Systems 16:2 (1-25). DOI: 10.1145/3583074. Online publication date: 10-May-2023
  • (2023) High-Performance Acceleration of 2-D and 3-D CNNs on FPGAs Using Static Block Floating Point. IEEE Transactions on Neural Networks and Learning Systems 34:8 (4473-4487). DOI: 10.1109/TNNLS.2021.3116302. Online publication date: Aug-2023
  • (2023) A fine-grained mixed precision DNN accelerator using a two-stage big–little core RISC-V MCU. Integration 88 (241-248). DOI: 10.1016/j.vlsi.2022.10.006. Online publication date: Jan-2023
  • (2023) A Low-Latency Hardware Accelerator for YOLO Object Detection Algorithms. Advanced Parallel Processing Technologies (265-278). DOI: 10.1007/978-981-99-7872-4_15. Online publication date: 8-Nov-2023
  • (2023) Object Detection in Autonomous Cyber-Physical Vehicle Platforms: Status and Open Challenges. Machine Learning and Optimization Techniques for Automotive Cyber-Physical Systems (509-523). DOI: 10.1007/978-3-031-28016-0_17. Online publication date: 2-Sep-2023
  • (2022) Resource- and Power-Efficient High-Performance Object Detection Inference Acceleration Using FPGA. Electronics 11:12 (1827). DOI: 10.3390/electronics11121827. Online publication date: 8-Jun-2022
