DOI: 10.1145/3560905.3568520
research-article

BlastNet: Exploiting Duo-Blocks for Cross-Processor Real-Time DNN Inference

Published: 24 January 2023

Abstract

In recent years, Deep Neural Networks (DNNs) have been increasingly adopted by a wide range of time-critical applications running on edge platforms with heterogeneous multiprocessors. To meet the stringent timing requirements of these applications, heterogeneous CPU and GPU resources must be utilized efficiently for the inference of multiple DNN models. Such a cross-processor real-time DNN inference paradigm poses major challenges, owing to the inherent performance imbalance among different processors and the lack of real-time support for cross-processor inference in existing deep learning frameworks. In this work, we propose BlastNet, a new system that exploits the duo-block, a new model inference abstraction, to support highly efficient cross-processor real-time DNN inference. Each duo-block has a dual model structure that enables efficient, fine-grained inference alternating across different processors. BlastNet generates duo-blocks with a novel block-level Neural Architecture Search (NAS) technique that accounts for both computing characteristics and communication overhead. The duo-blocks are optimized at design time and then dynamically scheduled at runtime to achieve high utilization of the heterogeneous CPU and GPU. BlastNet is implemented on an indoor autonomous driving platform and three popular edge platforms. Extensive results show that BlastNet achieves a 35.07% lower deadline miss rate with a mere 1.63% loss in model accuracy.
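The abstract does not specify how duo-blocks are scheduled at runtime. What follows is one illustrative reading of the idea under stated assumptions, not BlastNet's implementation. Each block is assumed to have interchangeable CPU and GPU variants with known latencies. A fixed, hypothetical cost `SWITCH_MS` is charged whenever consecutive blocks run on different processors. Jobs are taken earliest-deadline-first, and each block is placed greedily on the processor that would finish it soonest. The names `DuoBlock`, `Job`, `schedule` and `deadline_miss_rate`, and all latency numbers, are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


# Hypothetical duo-block: one block boundary with two variants, one tuned
# for the CPU and one for the GPU. Latencies are illustrative, in ms.
@dataclass(frozen=True)
class DuoBlock:
    cpu_ms: float
    gpu_ms: float

    def cost(self, proc: str) -> float:
        return self.cpu_ms if proc == "cpu" else self.gpu_ms


@dataclass(frozen=True)
class Job:
    name: str
    release_ms: float
    deadline_ms: float  # absolute deadline
    blocks: Tuple[DuoBlock, ...]


SWITCH_MS = 0.5  # assumed cost of handing activations to the other processor


def schedule(jobs: List[Job], procs: Sequence[str] = ("cpu", "gpu")) -> Dict[str, float]:
    """Place each block on whichever allowed processor finishes it earliest.

    Jobs are taken in earliest-deadline-first order and their blocks run
    strictly in sequence; switching processors between consecutive blocks
    adds SWITCH_MS. Returns each job's finish time in ms.
    """
    free_at = {p: 0.0 for p in procs}  # time at which each processor is idle
    finish: Dict[str, float] = {}
    for job in sorted(jobs, key=lambda j: j.deadline_ms):
        t = job.release_ms
        prev: Optional[str] = None
        for blk in job.blocks:
            best_proc, best_end = procs[0], float("inf")
            for p in procs:
                start = max(t, free_at[p])
                if prev is not None and prev != p:
                    start += SWITCH_MS  # cross-processor transfer
                end = start + blk.cost(p)
                if end < best_end:
                    best_proc, best_end = p, end
            prev, t = best_proc, best_end
            free_at[best_proc] = best_end
        finish[job.name] = t
    return finish


def deadline_miss_rate(jobs: List[Job], finish: Dict[str, float]) -> float:
    """Fraction of jobs whose finish time exceeds their absolute deadline."""
    missed = sum(1 for j in jobs if finish[j.name] > j.deadline_ms)
    return missed / len(jobs)


jobs = [
    Job("A", 0.0, 5.0, (DuoBlock(4.0, 1.0), DuoBlock(3.0, 2.0))),
    Job("B", 0.0, 6.0, (DuoBlock(2.0, 3.0), DuoBlock(5.0, 1.0))),
    Job("C", 0.0, 4.0, (DuoBlock(3.0, 3.0),)),
]
both = schedule(jobs)                      # blocks may alternate CPU/GPU
gpu_only = schedule(jobs, procs=("gpu",))  # baseline: everything on the GPU
print(both, deadline_miss_rate(jobs, both))
print(gpu_only, deadline_miss_rate(jobs, gpu_only))
```

With these made-up numbers, letting blocks alternate between processors finishes jobs C, A and B at 3.0, 3.0 and 6.5 ms, missing one of three deadlines. Pinning every block to the GPU finishes them at 3.0, 6.0 and 10.0 ms, missing two. The difference comes from the CPU absorbing blocks that would otherwise queue behind the GPU.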

Information

Published In

SenSys '22: Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems
November 2022
1280 pages
ISBN:9781450398862
DOI:10.1145/3560905
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CPU-GPU heterogeneous platform
  2. edge artificial intelligence
  3. multi-DNN concurrent execution
  4. neural architecture search
  5. on-device deep learning
  6. real-time scheduling

Qualifiers

  • Research-article

Funding Sources

  • Research Grants Council (RGC)-General Research Fund

Acceptance Rates

SenSys '22 Paper Acceptance Rate 52 of 187 submissions, 28%;
Overall Acceptance Rate 174 of 867 submissions, 20%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 357
  • Downloads (last 6 weeks): 44
Reflects downloads up to 12 Dec 2024

Cited By

  • (2024) Panopticus: Omnidirectional 3D Object Detection on Resource-constrained Edge Devices. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, pp. 1207-1221. DOI: 10.1145/3636534.3690688. Online publication date: 4-Dec-2024.
  • (2024) Multi-Compression Scale DNN Inference Acceleration based on Cloud-Edge-End Collaboration. ACM Transactions on Embedded Computing Systems 23(1), pp. 1-25. DOI: 10.1145/3634704. Online publication date: 19-Jan-2024.
  • (2024) DVFO: Learning-Based DVFS for Energy-Efficient Edge-Cloud Collaborative Inference. IEEE Transactions on Mobile Computing 23(10), pp. 9042-9059. DOI: 10.1109/TMC.2024.3357218. Online publication date: Oct-2024.
  • (2024) SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory Budget. IEEE Transactions on Mobile Computing 23(9), pp. 8935-8950. DOI: 10.1109/TMC.2024.3355764. Online publication date: Sep-2024.
  • (2024) Compressing VAE-Based Out-of-Distribution Detectors for Embedded Deployment. 2024 IEEE 30th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pp. 37-42. DOI: 10.1109/RTCSA62462.2024.00015. Online publication date: 21-Aug-2024.
  • (2024) COS: Cross-Processor Operator Scheduling for Multi-Tenant Deep Learning Inference. 2024 IEEE/ACM 32nd International Symposium on Quality of Service (IWQoS), pp. 1-10. DOI: 10.1109/IWQoS61813.2024.10682900. Online publication date: 19-Jun-2024.
  • (2024) Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications, pp. 1001-1010. DOI: 10.1109/INFOCOM52122.2024.10621342. Online publication date: 20-May-2024.
  • (2024) Flexible and Fully Quantized Lightweight TinyissimoYOLO for Ultra-Low-Power Edge Systems. IEEE Access 12, pp. 75093-75107. DOI: 10.1109/ACCESS.2024.3404878. Online publication date: 2024.
  • (2023) LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded Computing Platforms. Proceedings of the 21st ACM Conference on Embedded Networked Sensor Systems, pp. 138-151. DOI: 10.1145/3625687.3625804. Online publication date: 12-Nov-2023.
  • (2023) Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU. Proceedings of the 21st ACM Conference on Embedded Networked Sensor Systems, pp. 97-110. DOI: 10.1145/3625687.3625789. Online publication date: 12-Nov-2023.
