Adaptive parallel execution of deep neural networks on heterogeneous edge devices

Authors:

Mohammad Hossein Samavatian,

Saikat Majumdar,

Radu Teodorescu

SEC '19: Proceedings of the 4th ACM/IEEE Symposium on Edge Computing

Pages 195 - 208

https://doi.org/10.1145/3318216.3363312

Published: 07 November 2019

Abstract

New applications such as smart homes, smart cities, and autonomous vehicles are driving increased interest in deploying machine learning on edge devices. Unfortunately, deploying deep neural networks (DNNs) on resource-constrained devices presents significant challenges. These workloads are computationally intensive and often require cloud-like resources. Prior solutions attempted to address these challenges either by introducing more design effort or by relying on cloud resources for assistance.

In this paper, we propose a runtime-adaptive convolutional neural network (CNN) acceleration framework that is optimized for heterogeneous Internet of Things (IoT) environments. The framework leverages spatial partitioning through fusion of the convolution layers and dynamically selects the optimal degree of parallelism according to the availability of computational resources and network conditions. Our evaluation shows that the framework outperforms state-of-the-art approaches, improving inference speed and reducing communication costs while running on wirelessly connected Raspberry Pi 3 devices. Experimental evaluation shows speedups ranging from 1.9x to 3.7x using 8 devices for three popular CNN models.
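To make the idea of spatial partitioning over fused convolution layers concrete, the following is a minimal sketch and not the authors' implementation. It assumes square kernels, a split along the row axis only, and hypothetical names (`ConvLayer`, `split_rows`, `fused_input_range`). It computes which input rows each device would need in order to produce its share of the final output rows when a stack of convolution layers is executed back to back without exchanging intermediate feature maps.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical description of one convolution layer (square kernel)."""
    kernel: int
    stride: int
    padding: int


def output_size(layer: ConvLayer, in_size: int) -> int:
    """Standard convolution output-size formula along one axis."""
    return (in_size + 2 * layer.padding - layer.kernel) // layer.stride + 1


def input_range(layer: ConvLayer, out_lo: int, out_hi: int, in_size: int) -> Tuple[int, int]:
    """Inclusive input row range needed to compute output rows out_lo..out_hi."""
    lo = out_lo * layer.stride - layer.padding
    hi = out_hi * layer.stride - layer.padding + layer.kernel - 1
    # Rows that fall into the zero padding do not need to be transferred.
    return max(lo, 0), min(hi, in_size - 1)


def fused_input_range(layers: List[ConvLayer], in_size: int,
                      out_lo: int, out_hi: int) -> Tuple[int, int]:
    """Walk backwards through the fused stack to find the required input rows."""
    sizes = [in_size]
    for layer in layers:
        sizes.append(output_size(layer, sizes[-1]))
    lo, hi = out_lo, out_hi
    for layer, size in zip(reversed(layers), reversed(sizes[:-1])):
        lo, hi = input_range(layer, lo, hi, size)
    return lo, hi


def split_rows(n_rows: int, parts: int) -> List[Tuple[int, int]]:
    """Split n_rows output rows into `parts` contiguous inclusive ranges.

    Assumes parts <= n_rows so that no range is empty.
    """
    base, extra = divmod(n_rows, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges
```

For example, with three fused 3x3 layers (stride 1, padding 1) on a 32-row input split across two devices, the output tiles are rows 0 to 15 and 16 to 31, and the sketch reports that they need input rows 0 to 18 and 13 to 31 respectively. The overlapping rows are computed redundantly on both devices, which is the price of avoiding communication between layers; a larger degree of parallelism shrinks each tile but increases this overlap.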

Index Terms

Adaptive parallel execution of deep neural networks on heterogeneous edge devices
1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Embedded systems
      1. Embedded hardware
2. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
  2. Parallel computing methodologies
    1. Parallel algorithms
      1. Massively parallel algorithms

Information & Contributors

Information

Published In

SEC '19: Proceedings of the 4th ACM/IEEE Symposium on Edge Computing

November 2019

455 pages

ISBN:9781450367332

DOI:10.1145/3318216

General Chairs: Songqing Chen (George Mason University), Ryokichi Onishi (Toyota)

Program Chairs: Ganesh Ananthanarayanan (Microsoft Research), Qun Li (College of William & Mary)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOBILE: ACM Special Interest Group on Mobility of Systems, Users, Data and Computing

In-Cooperation

IEEE-CS\DATC: IEEE Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

SEC '19

Sponsor:

SIGMOBILE

SEC '19: The Fourth ACM/IEEE Symposium on Edge Computing

November 7 - 9, 2019

Arlington, Virginia

Acceptance Rates

SEC '19 Paper Acceptance Rate: 20 of 59 submissions, 34%

Overall Acceptance Rate: 40 of 100 submissions, 40%

Bibliometrics

Total Citations: 76
Total Downloads: 2,851
Downloads (Last 12 months): 504
Downloads (Last 6 weeks): 41

Reflects downloads up to 05 Mar 2025
