research-article
Open access
Just Accepted

MemoriaNova: Optimizing Memory-Aware Model Inference for Edge Computing

Online AM: 28 October 2024

Abstract

In recent years, deploying deep learning models on edge devices has become pervasive, driven by growing demand for intelligent edge computing across industries. From industrial automation to intelligent surveillance and healthcare, edge devices are increasingly used for real-time analytics and decision-making. Existing methods face two challenges when deploying machine learning models on edge devices. First, they determine the execution order of operators with simple strategies, which can waste memory when models have directed-acyclic-graph (DAG) structures. Second, they usually optimize inference latency by processing a model's operators one at a time, which can trap the optimization in local optima.
We present MemoriaNova, comprising BTSearch and GenEFlow, to address these two problems. BTSearch is a graph-state backtracking algorithm with efficient pruning and hashing strategies that minimizes memory overhead during inference and enlarges the search space for latency optimization. GenEFlow, based on genetic algorithms, integrates latency modeling and memory constraints to optimize distributed inference latency; it explores a comprehensive search space for model partitioning, yielding robust and adaptable solutions. We implement BTSearch and GenEFlow and evaluate them on eleven deep-learning models of different structures and scales. BTSearch achieves up to 12% memory savings compared with the widely used random execution strategy, while GenEFlow reduces inference latency by 33.9% in a distributed system with four edge devices.
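
The abstract names BTSearch only at a high level. The sketch below illustrates the general idea it describes, backtracking over valid operator execution orders of the model DAG while hashing visited graph states and pruning dominated branches to minimize peak memory; the toy DAG, tensor sizes, and memory accounting are assumptions for illustration, not the paper's implementation.

    # Illustrative sketch only: the DAG, sizes, and memory model are assumed, not MemoriaNova's code.
    # Toy operator DAG shaped like a residual block: op -> list of consumer ops.
    CONSUMERS = {"conv1": ["br_a", "br_b"], "br_a": ["add"], "br_b": ["add"], "add": []}
    SIZE_MB = {"conv1": 4.0, "br_a": 2.0, "br_b": 2.0, "add": 4.0}  # output tensor sizes
    PRODUCERS = {op: [p for p, cs in CONSUMERS.items() if op in cs] for op in CONSUMERS}

    def live_memory(executed):
        # A tensor stays resident from when its producer runs until all of its
        # consumers have run; the model output (no consumers) remains resident.
        return sum(SIZE_MB[op] for op in executed
                   if not CONSUMERS[op] or any(c not in executed for c in CONSUMERS[op]))

    def search():
        best = [float("inf"), None]  # lowest peak memory found and the order achieving it
        visited = {}                 # frozenset of executed ops -> lowest peak reaching that state

        def backtrack(executed, order, peak):
            key = frozenset(executed)
            # Prune branches that cannot beat the best order found so far, and graph
            # states already reached with an equal or lower peak (state hashing).
            if peak >= best[0] or visited.get(key, float("inf")) <= peak:
                return
            visited[key] = peak
            if len(executed) == len(CONSUMERS):
                best[0], best[1] = peak, list(order)
                return
            for op in CONSUMERS:  # try every operator whose inputs are all ready
                if op in executed or any(p not in executed for p in PRODUCERS[op]):
                    continue
                executed.add(op)
                order.append(op)
                backtrack(executed, order, max(peak, live_memory(executed)))
                order.pop()
                executed.discard(op)

        backtrack(set(), [], 0.0)
        return best[0], best[1]

    peak, order = search()
    print(f"lowest peak memory: {peak:.1f} MB with execution order {order}")

On the toy graph this prints a peak of 6.0 MB. In a graph-state search of this kind, the pruning and state hashing are what keep the exponential space of operator orders tractable, which matches the role the abstract gives these strategies in BTSearch.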


Published In

ACM Transactions on Architecture and Code Optimization (Just Accepted)
EISSN: 1544-3973
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Online AM: 28 October 2024
Accepted: 19 September 2024
Revised: 10 August 2024
Received: 01 March 2024

Author Tags

  1. Deep Learning
  2. Edge Computing
  3. Memory Optimization
  4. Distributed System
  5. Inference Latency Optimization

Qualifiers

  • Research-article
