
research-article

Bridging Control-Centric and Data-Centric Optimization

Authors:

Tal Ben-Nun, Berke Ates, Alexandru Calotoiu, Torsten Hoefler

CGO 2023: Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization

Pages 173 - 185

https://doi.org/10.1145/3579990.3580018

Published: 22 February 2023

Abstract

With the rise of specialized hardware and new programming languages, code optimization has shifted its focus towards promoting data locality. Most production-grade compilers adopt a control-centric mindset (instruction-driven optimization augmented with scalar-based dataflow), whereas other approaches provide domain-specific and general-purpose data movement minimization, which can miss important control-flow optimizations. As the two representations are not commutable, users must choose one over the other. In this paper, we explore how both control- and data-centric approaches can work in tandem via the Multi-Level Intermediate Representation (MLIR) framework. Through a combination of an MLIR dialect and specialized passes, we recover parametric, symbolic dataflow that can be optimized within the DaCe framework. We combine the two views into a single pipeline, called DCIR, showing that it is strictly more powerful than either view. On several benchmarks and a real-world application in C, we show that our proposed pipeline consistently outperforms MLIR and automatically uncovers new optimization opportunities with no additional effort.
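The abstract contrasts scalar-based, control-centric optimization with data movement minimization. As a generic illustration only (it is not taken from the paper and does not use MLIR, DaCe, or DCIR APIs), the Python sketch below shows one classic data-movement optimization: fusing two loops so that a full-size temporary array shrinks to a per-iteration scalar. The names `unfused`, `fused`, `tmp`, and `t` are hypothetical and chosen for the example.

```python
from typing import List


def unfused(a: List[float]) -> List[float]:
    """Two loops that communicate through a full-size temporary array."""
    n = len(a)
    tmp = [0.0] * n  # intermediate container: written n times, then read n times
    for i in range(n):
        tmp[i] = 2.0 * a[i]
    out = [0.0] * n
    for i in range(n):
        out[i] = tmp[i] + 1.0
    return out


def fused(a: List[float]) -> List[float]:
    """Same computation after loop fusion and array contraction."""
    out = [0.0] * len(a)
    for i, x in enumerate(a):
        t = 2.0 * x  # the temporary array is replaced by a per-iteration scalar
        out[i] = t + 1.0
    return out


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    # Both versions compute the same values; only the data movement differs.
    assert unfused(data) == fused(data)
    print(fused(data))
```

The unfused version writes and re-reads every element of `tmp`, while the fused version never materializes the intermediate array. Establishing that `tmp` can be removed typically requires reasoning about the whole array as a data container rather than about individual scalar values, which is the kind of data-centric view the abstract refers to.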


Cited By

Bao G, Shi H, Cui C, Zhang Y, Yao J, Filkov V, Ray B, Zhou M (2024). UFront: Toward A Unified MLIR Frontend for Deep Learning. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 255-267. DOI: 10.1145/3691620.3695002. Online publication date: 27-Oct-2024.
https://dl.acm.org/doi/10.1145/3691620.3695002

Brown N, Jamieson M, Lydike A, Bauer E, Grosser T (2023). Fortran performance optimisation and auto-parallelisation by leveraging MLIR-based domain specific abstractions in Flang. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 904-913. DOI: 10.1145/3624062.3624167. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3624062.3624167

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers
  2. Software organization and properties
    1. Extra-functional properties
      1. Software performance
    2. Software functional properties
      1. Correctness



Published In


CGO '23: Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization

February 2023

262 pages

ISBN:9798400701016

DOI:10.1145/3579990

General Chair: Christophe Dubach (McGill University, Canada)

Program Chairs: Derek Bruening (Google, USA), Ben Hardekopf (University of California at Santa Barbara, USA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

European Research Council
Horizon 2020 Framework Programme
Swiss National Science Foundation
Platform for Advanced Scientific Computing

Conference

CGO '23: 21st ACM/IEEE International Symposium on Code Generation and Optimization

February 25 - March 1, 2023

Montréal, QC, Canada

Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%

Bibliometrics

Total citations: 2
Total downloads: 661
Downloads (last 12 months): 221
Downloads (last 6 weeks): 34

Reflects downloads up to 11 Dec 2024
