research-article
Open access

Building a Polyhedral Representation from an Instrumented Execution: Making Dynamic Analyses of Nonaffine Programs Scalable

Published: 17 December 2019

Abstract

The polyhedral model has been successfully used in production compilers. Nevertheless, only a very restricted class of applications can benefit from it. Recent proposals investigated how runtime information could be used to apply polyhedral optimization on applications that do not statically fit the model. In this work, we go one step further in that direction. We propose the folding-based analysis that, from the output of an instrumented program execution, builds a compact polyhedral representation. It is able to accurately detect affine dependencies, fixed-stride memory accesses, and induction variables in programs. It scales to real-life applications, which often include some nonaffine dependencies and accesses in otherwise affine code. This is enabled by a safe fine-grained polyhedral overapproximation mechanism. We evaluate our analysis on the entire Rodinia benchmark suite, enabling accurate feedback about the potential for complex polyhedral transformations.
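
To make the idea of recovering fixed-stride memory accesses from an instrumented execution concrete, the following is a minimal sketch and not the folding algorithm of the paper. It assumes a hypothetical trace format in which every record pairs a loop iteration vector with the address touched at that iteration, and the function name `fit_affine_access` is invented for illustration. The sketch guesses one stride per loop dimension from neighbouring iterations and then checks the guess against every record; when the check fails, it reports the access as nonaffine, which is the kind of case a real analysis would have to overapproximate.

```python
from typing import Dict, Optional, Sequence, Tuple

IterVec = Tuple[int, ...]          # one loop counter per nesting level
Record = Tuple[IterVec, int]       # (iteration vector, accessed address)


def fit_affine_access(
    trace: Sequence[Record],
) -> Optional[Tuple[int, Tuple[int, ...]]]:
    """Return (base, strides) with addr == base + sum(stride_k * i_k)
    for every record of the trace, or None if no such function exists
    or cannot be inferred from unit-step neighbours.

    Hypothetical helper: the trace format is an assumption of this sketch.
    """
    if not trace:
        return None

    # Index the trace by iteration vector; a vector seen with two
    # different addresses cannot be described by a single function.
    seen: Dict[IterVec, int] = {}
    for iv, addr in trace:
        if seen.setdefault(iv, addr) != addr:
            return None

    dims = len(trace[0][0])
    strides = [None] * dims

    # Guess each stride from two points that differ by +1 in one dimension.
    for iv, addr in seen.items():
        for k in range(dims):
            if strides[k] is None:
                neighbour = iv[:k] + (iv[k] + 1,) + iv[k + 1:]
                if neighbour in seen:
                    strides[k] = seen[neighbour] - addr

    # A dimension without any unit-step neighbour gets stride 0; if that
    # guess is wrong, the verification below rejects the fit.
    coeffs = tuple(0 if s is None else s for s in strides)

    iv0, addr0 = trace[0]
    base = addr0 - sum(c * i for c, i in zip(coeffs, iv0))

    # Verify the candidate function on the whole trace.
    for iv, addr in seen.items():
        if base + sum(c * i for c, i in zip(coeffs, iv)) != addr:
            return None
    return base, coeffs
```

As a usage example, a trace of a row-major access `A[i][j]` with 8-byte elements and 4 columns yields a base address and the strides `(32, 8)`, whereas a trace of an indirect access such as `A[idx[i]]` with a non-monotonic `idx` is rejected.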

    Information & Contributors

    Information

    Published In

    ACM Transactions on Architecture and Code Optimization, Volume 16, Issue 4
    December 2019
    572 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/3366460
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 December 2019
    Accepted: 01 September 2019
    Revised: 01 August 2019
    Received: 01 February 2019
    Published in TACO Volume 16, Issue 4

    Author Tags

    1. Performance feedback
    2. binary
    3. compiler optimization
    4. dynamic dependency graph
    5. instrumentation
    6. loop transformations
    7. polyhedral model

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • U.S. National Science Foundation
    • French program Investissement d'avenir
    • LabEx PERSYVAL-Lab

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 136
    • Downloads (last 6 weeks): 21
    Reflects downloads up to 17 Jan 2025

    Cited By

    • (2022) Vectorizing sparse matrix computations with partially-strided codelets. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.5555/3571885.3571927. Online publication date: 13-Nov-2022
    • (2022) QRANE: lifting QASM programs to an affine IR. Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction, pp. 15-28. DOI: 10.1145/3497776.3517775. Online publication date: 19-Mar-2022
    • (2022) Vectorizing Sparse Matrix Computations with Partially-Strided Codelets. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.1109/SC41404.2022.00037. Online publication date: Nov-2022
    • (2021) DNNFusion: accelerating deep neural networks execution with advanced operator fusion. Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pp. 883-898. DOI: 10.1145/3453483.3454083. Online publication date: 19-Jun-2021
    • (2021) Reverse engineering for reduction parallelization via semiring polynomials. Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pp. 820-834. DOI: 10.1145/3453483.3454079. Online publication date: 19-Jun-2021
