Research article · ICPP Conference Proceedings · DOI: 10.1145/3545008.3545051

NNLQP: A Multi-Platform Neural Network Latency Query and Prediction System with An Evolving Database

Published: 13 January 2023

Abstract

Deep neural networks (DNNs) are widely used in various applications, and accurate, timely latency feedback is essential for model design and deployment. In this work, we reduce the cost of acquiring model latency from two aspects: latency query and latency prediction. To ease the difficulty of measuring model latency across multiple platforms, our latency query system automatically converts a DNN model into the corresponding executable format and measures its latency on the target hardware. Powered by this, latency queries can be fulfilled with a simple interface call. To efficiently reuse previous latency knowledge, we employ a MySQL database that stores numerous models and their corresponding latencies; with it, the efficiency of latency query is boosted by 1.8×. For latency prediction, we first represent neural networks with a unified GNN-based graph embedding. With the help of the evolving database, our model-based latency predictor achieves better performance, realizing a 12.31% accuracy improvement over existing methods. Our code is open-sourced at https://github.com/ModelTC/NNLQP.
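The query-then-cache flow described above (hash the model, look it up in the latency database, and measure on device only on a miss) can be sketched as follows. This is a minimal illustration, not the paper's actual API: `measure_on_device`, the table schema, and the use of SQLite in place of MySQL are all assumptions made for the sake of a self-contained example.

```python
# Hypothetical sketch of NNLQP's query-with-database pattern.
# SQLite stands in for the MySQL store; measure_on_device is a stub.
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE latency ("
    "model_hash TEXT, platform TEXT, ms REAL, "
    "PRIMARY KEY (model_hash, platform))"
)

def measure_on_device(model_bytes: bytes, platform: str) -> float:
    """Placeholder for converting the model (e.g. ONNX) to the
    platform's executable format and timing it on real hardware."""
    return 12.5  # dummy latency in milliseconds

def query_latency(model_bytes: bytes, platform: str) -> tuple[float, bool]:
    """Return (latency_ms, cache_hit); measure and store on a miss."""
    key = hashlib.sha256(model_bytes).hexdigest()
    row = conn.execute(
        "SELECT ms FROM latency WHERE model_hash=? AND platform=?",
        (key, platform),
    ).fetchone()
    if row is not None:
        return row[0], True          # served from the latency database
    ms = measure_on_device(model_bytes, platform)
    conn.execute("INSERT INTO latency VALUES (?, ?, ?)", (key, platform, ms))
    return ms, False                 # measured and stored for reuse

onnx_blob = b"...serialized model..."
print(query_latency(onnx_blob, "gpu-T4"))  # first call measures on device
print(query_latency(onnx_blob, "gpu-T4"))  # second call hits the database
```

Repeated queries for the same (model, platform) pair skip the on-device measurement entirely, which is the mechanism behind the reported 1.8× query speedup.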


Cited By

  • (2024) All-Sky Autonomous Computing in UAV Swarm. IEEE Transactions on Mobile Computing 23:12, 13258–13274. https://doi.org/10.1109/TMC.2024.3427420. Online publication date: Dec 2024.
  • (2023) An Experimental Study of DNN Operator-Level Performance on Edge Devices. 2023 IEEE International Conference on Smart Internet of Things (SmartIoT), 131–138. https://doi.org/10.1109/SmartIoT58732.2023.00026. Online publication date: 25 Aug 2023.
  • (2023) NAR-Former: Neural Architecture Representation Learning Towards Holistic Attributes Prediction. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 7715–7724. https://doi.org/10.1109/CVPR52729.2023.00745. Online publication date: Jun 2023.
  • (2023) Accurate Latency Prediction of Deep Learning Model Inference Under Dynamic Runtime Resource. Neural Information Processing, 495–510. https://doi.org/10.1007/978-981-99-8126-7_39. Online publication date: 13 Nov 2023.


Published In

ICPP '22: Proceedings of the 51st International Conference on Parallel Processing
August 2022, 976 pages
ISBN: 9781450397339
DOI: 10.1145/3545008
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. latency prediction
      2. latency query
      3. multi-platform
      4. neural network

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICPP '22
      ICPP '22: 51st International Conference on Parallel Processing
      August 29 - September 1, 2022
      Bordeaux, France

      Acceptance Rates

      Overall Acceptance Rate 91 of 313 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 155
  • Downloads (last 6 weeks): 11

Reflects downloads up to 26 Dec 2024.
