Abstract
This paper proposes an AI compiler architecture that compiles a trained model and deploys it on a DSP chip. The main difficulty in deploying an inference model on a DSP is multiplication between tensors: tensor multiplication is the dominant and most time-consuming operation during model inference, so its efficiency directly constrains inference performance. However, a DSP chip has no matrix computing unit, only a vector computing unit. We define a new dialect in MLIR (Multi-Level Intermediate Representation) to compile AI models efficiently, especially GEMM and convolution operations. The dialect is built on the basic features of mhlo, so it can reuse mhlo's existing optimization passes. We also add functions to support architecture-specific optimization, chiefly lowering algorithms for operations such as GEMM and convolution. Finally, we map the dialect to the LLVM dialect and convert it into LLVM IR (intermediate representation); the advantage of converting to LLVM IR is that finer-grained instruction scheduling can be performed in the compiler backend. We compare the efficiency of a speech model compiled by the traditional compiler clang with that of the code generated by our compiler. The experimental results show that our approach improves efficiency substantially.
Acknowledgement
The authors would like to thank the editors and the reviewers for their comments and suggestions on this paper. This work was supported by the National Key R&D Program of China under Grant 2020YFA0711400, the National Natural Science Foundation of China under Grants 61831018 and U21A20452, the Jiangxi Double Thousand Plan under Grant jxsq2019201125, and the S&T Plan Projects of Jiangxi Province Education Department under Grant GJJ201003.
Copyright information
© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Cite this paper
Qiu, C., Wu, J., Ren, H., Zhang, Z. (2023). Optimization of Tensor Operation in Compiler. In: Gao, F., Wu, J., Li, Y., Gao, H. (eds) Communications and Networking. ChinaCom 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 500. Springer, Cham. https://doi.org/10.1007/978-3-031-34790-0_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34789-4
Online ISBN: 978-3-031-34790-0