DOI: 10.1145/3579990.3580018
research-article

Bridging Control-Centric and Data-Centric Optimization

Published: 22 February 2023

Abstract

With the rise of specialized hardware and new programming languages, code optimization has shifted its focus towards promoting data locality. Most production-grade compilers adopt a control-centric mindset (instruction-driven optimization augmented with scalar-based dataflow), whereas other approaches provide domain-specific and general-purpose data movement minimization, which can miss important control-flow optimizations. As the two representations are not commutable, users must choose one over the other. In this paper, we explore how both control- and data-centric approaches can work in tandem via the Multi-Level Intermediate Representation (MLIR) framework. Through a combination of an MLIR dialect and specialized passes, we recover parametric, symbolic dataflow that can be optimized within the DaCe framework. We combine the two views into a single pipeline, called DCIR, showing that it is strictly more powerful than either view. On several benchmarks and a real-world application in C, we show that our proposed pipeline consistently outperforms MLIR and automatically uncovers new optimization opportunities with no additional effort.

Cited By

  • (2024) UFront: Toward A Unified MLIR Frontend for Deep Learning. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 255-267. DOI: 10.1145/3691620.3695002. Online publication date: 27-Oct-2024
  • (2023) Fortran performance optimisation and auto-parallelisation by leveraging MLIR-based domain specific abstractions in Flang. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 904-913. DOI: 10.1145/3624062.3624167. Online publication date: 12-Nov-2023

Published In

CGO '23: Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization
February 2023
262 pages
ISBN:9798400701016
DOI:10.1145/3579990
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DaCe
  2. MLIR
  3. data-centric programming

Qualifiers

  • Research-article

Conference

CGO '23

Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%
