Research article, DOI: 10.1145/3594315.3594372

CustomHalide – A new plugin of clang for loop optimization

Published: 02 August 2023

Abstract

Polyhedral and Halide techniques are widely used to optimize complex nested loops when deep learning frameworks compile their basic operators. However, with many algorithm platforms in use, a single framework cannot cover every deployment, and in some scenarios the framework's complex deployment environment is itself an important consideration. In such cases, users want an optimization toolchain that automatically optimizes a simply described algorithm, meeting customized deployment needs while delivering performance benefits. This paper presents CustomHalide, an automatic optimization toolchain based on the Halide auto-scheduling algorithm and the Clang AST. The core of CustomHalide is the tool Loop-convert, which extracts loop parameters from source code, converts them, calls the Halide auto-scheduler, and integrates and compiles the optimized code. Loop-convert thus provides a fully automated flow from C/C++ source code to optimized programs. In testing, CustomHalide identifies and extracts source code more effectively than Polly. When bound information is known, the average performance of CustomHalide is 3% higher than that of Halide on both CPU and GPU. The average compilation time of CustomHalide is one third that of TVM at the same end-to-end performance. Finally, through a bound-setting strategy, CustomHalide supports loop optimization with unknown bound information, and its average performance is about 50% higher than that of programs optimized with clang -O3.
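The abstract does not show what an optimized loop nest looks like. As a rough illustration only, the C++ sketch below contrasts a plain nested loop with a hand-tiled version of the same computation. Tiling is one kind of loop reordering that schedulers such as Halide's can apply. The function names, the matrix-multiply workload, and the tile parameter are assumptions chosen for the example; this is not code produced by CustomHalide or Loop-convert.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reference loop nest: C = A * B for square n x n row-major matrices.
// This is the kind of simply described algorithm a user might write by hand.
std::vector<float> matmul_naive(const std::vector<float>& a,
                                const std::vector<float>& b,
                                std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<float> c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                acc += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = acc;
        }
    }
    return c;
}

// Same computation with the i and j loops split into tiles of size `tile`.
// Each output element is still reduced over k in the same order, so the
// result matches the naive version; only the traversal order of (i, j) changes.
std::vector<float> matmul_tiled(const std::vector<float>& a,
                                const std::vector<float>& b,
                                std::size_t n,
                                std::size_t tile) {
    assert(tile > 0);
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<float> c(n * n, 0.0f);
    for (std::size_t ii = 0; ii < n; ii += tile) {
        for (std::size_t jj = 0; jj < n; jj += tile) {
            // std::min handles partial tiles when n is not a multiple of tile.
            const std::size_t i_end = std::min(ii + tile, n);
            const std::size_t j_end = std::min(jj + tile, n);
            for (std::size_t i = ii; i < i_end; ++i) {
                for (std::size_t j = jj; j < j_end; ++j) {
                    float acc = 0.0f;
                    for (std::size_t k = 0; k < n; ++k) {
                        acc += a[i * n + k] * b[k * n + j];
                    }
                    c[i * n + j] = acc;
                }
            }
        }
    }
    return c;
}
```

Whether tiling helps, and which tile size is best, depends on the loop bounds and the target cache hierarchy. That dependence may be one reason the abstract reports results for known and unknown bound information separately.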


    Published In

    ICCAI '23: Proceedings of the 2023 9th International Conference on Computing and Artificial Intelligence
    March 2023
    824 pages
    ISBN: 9781450399029
    DOI: 10.1145/3594315

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Clang AST
    2. Halide
    3. auto-scheduler
    4. compiler

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICCAI 2023

