Abstract
This paper proposes an AI compiler architecture that compiles a trained model and deploys it on a DSP chip. The main difficulty in deploying an inference model on a DSP is multiplication between tensors: tensor multiplication is the dominant and most time-consuming operation during model inference, so its efficiency directly constrains inference performance. However, a DSP chip has no matrix computing unit, only a vector computing unit. We define a new dialect in MLIR (Multi-Level Intermediate Representation) to compile AI models efficiently, especially GEMM and convolution operations. The dialect is built on the basic features of mhlo, so it can reuse mhlo's existing optimization passes. We also add functions to support architecture-specific optimization, chiefly lowering algorithms for operations such as GEMM and convolution. Finally, we map the dialect to the LLVM dialect and convert it into LLVM IR (intermediate representation); the advantage of converting to LLVM IR is that finer-grained instruction scheduling can be performed in the compiler backend. We compare the efficiency of a speech model compiled by the traditional compiler clang with that of the code generated by our compiler. The experimental results show that our approach improves efficiency substantially.
Acknowledgement
The authors would like to thank the editors and the reviewers for their comments and suggestions on this paper. This work was supported by the National Key R&D Program of China under Grant 2020YFA0711400, the National Natural Science Foundation of China under Grants 61831018 and U21A20452, the Jiangxi Double Thousand Plan under Grant jxsq2019201125, and the S&T Plan Projects of Jiangxi Province Education Department under Grant GJJ201003.
Copyright information
© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Cite this paper
Qiu, C., Wu, J., Ren, H., Zhang, Z. (2023). Optimization of Tensor Operation in Compiler. In: Gao, F., Wu, J., Li, Y., Gao, H. (eds) Communications and Networking. ChinaCom 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 500. Springer, Cham. https://doi.org/10.1007/978-3-031-34790-0_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34789-4
Online ISBN: 978-3-031-34790-0