CustomHalide – A new plugin of clang for loop optimization

Authors:

Chenyang Yue

ICCAI '23: Proceedings of the 2023 9th International Conference on Computing and Artificial Intelligence

Pages 557 - 567

https://doi.org/10.1145/3594315.3594372

Published: 02 August 2023

Abstract

Polyhedral and Halide techniques are widely used to optimize complex nested loops when deep learning frameworks compile their basic operators. However, with many algorithm platforms in use, no single framework serves every deployment, and in some scenarios the framework's complex deployment environment is itself an important consideration. In these cases, users want an optimization toolchain that automatically optimizes a simply described algorithm directly, meeting customized deployment needs while delivering performance benefits. This paper presents CustomHalide, an automatic optimization toolchain based on the Halide auto-scheduling algorithm and the Clang AST. The core of CustomHalide is the tool Loop-convert, which extracts loop parameters from source code, converts those parameters, calls the Halide auto-scheduler, and integrates and compiles the optimized code. Loop-convert provides a fully automated flow from C/C++ source code to optimized programs. In testing, CustomHalide identifies and extracts source code more effectively than Polly. When bound information is known, the average performance of CustomHalide is 3% higher than that of Halide on both CPU and GPU. The average compilation time of CustomHalide is one third that of TVM at the same end-to-end performance. Finally, through its bound-setting strategy, CustomHalide supports loop optimization with unknown bound information, and its average performance is about 50% higher than that of programs optimized by clang -O3.
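To illustrate the kind of loop-nest rewrite that a schedule-driven optimizer applies, the following is a minimal hand-written C++ sketch. It is not output from CustomHalide or Loop-convert, and it does not use the Halide API. It compares a naive matrix-multiply loop nest with a tiled variant. The function names `matmul_naive` and `matmul_tiled`, the row-major layout, and the `tile` parameter are assumptions made for this example only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Naive loop nest: C = A * B for square n x n matrices stored row-major.
// This is the sort of "simply described" algorithm a user might write.
std::vector<float> matmul_naive(const std::vector<float>& a,
                                const std::vector<float>& b,
                                std::size_t n) {
    std::vector<float> c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}

// Hand-tiled variant of the same computation (illustrative only).
// Each loop is split into an outer loop over tiles and an inner loop
// within a tile, which improves cache reuse for large n.
std::vector<float> matmul_tiled(const std::vector<float>& a,
                                const std::vector<float>& b,
                                std::size_t n,
                                std::size_t tile) {
    assert(tile > 0);  // a zero tile size would never advance the loops
    std::vector<float> c(n * n, 0.0f);
    for (std::size_t ii = 0; ii < n; ii += tile)
        for (std::size_t jj = 0; jj < n; jj += tile)
            for (std::size_t kk = 0; kk < n; kk += tile)
                // std::min clamps the last, possibly partial, tile.
                for (std::size_t i = ii; i < std::min(ii + tile, n); ++i)
                    for (std::size_t j = jj; j < std::min(jj + tile, n); ++j)
                        for (std::size_t k = kk; k < std::min(kk + tile, n); ++k)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}
```

Both functions compute the same matrix product. Tiling changes only the iteration order, and in this sketch the accumulation order over `k` for each output element is preserved. Choosing a good tile size by hand is the kind of decision an auto-scheduler is meant to automate.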

Index Terms

CustomHalide – A new plugin of clang for loop optimization
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation

Information & Contributors

Information

Published In

ICCAI '23: Proceedings of the 2023 9th International Conference on Computing and Artificial Intelligence

March 2023

824 pages

ISBN:9781450399029

DOI:10.1145/3594315

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICCAI 2023

ICCAI 2023: 2023 9th International Conference on Computing and Artificial Intelligence

March 17 - 20, 2023

Tianjin, China
