[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3331553.3342614acmconferencesArticle/Chapter ViewAbstractPublication PagesicfpConference Proceedingsconference-collections
research-article

Position-dependent arrays and their application for high performance code generation

Published: 18 August 2019 Publication History

Abstract

Modern parallel hardware promises unprecedented performance, for the gifted few experts who can program it correctly. Code generators from high-level languages provide an attractive alternative, promising to deliver high performance automatically.
Existing projects such as Accelerate, Futhark, Halide, or Lift show that this approach is feasible. Unfortunately, existing efforts focus on computations over tensors: regularly shaped higher dimensional arrays. This limits the expressiveness of these approaches and excludes many interesting data structures that are commonly encoded manually in memory, such as trees or triangular matrices.
This paper presents an extended array type that lifts this restriction. For multidimensional arrays, the size of a nested array might depend on its position in the surrounding arrays, enabling the expression of computations over less regularly shaped data structures. However, position-dependent arrays bring new challenges for high-performance code generation, as indexing elements in memory becomes more challenging.
This paper shows how these challenges are addressed by extending the existing Lift type system and compiler. The experimental results show that this approach enables the efficient code generation of triangular matrix-vector multiplication, with performance improvements over cuBLAS on an Nvidia GPU by up to 2×. Furthermore, we show a use case for a low-level optimization for avoiding unnecessary out-of-bound checks in stencils, leading to up to 3× improvements over already optimized generated stencil codes.

References

[1]
Robert Atkey, Michel Steuwer, Sam Lindley, and Christophe Dubach. 2017. Strategy Preserving Compilation for Parallel Functional Code. CoRR abs/1710.08332 (2017).
[2]
Henrik Barthels and Paolo Bientinesi. 2017. Linnea: Compiling Linear Algebra Expressions to High-Performance Code. In Proceedings of the International Workshop on Parallel Symbolic Computation, PASCO@ISSAC 2017, Kaiserslautern, Germany, July 23-24, 2017. 1:1-1:3.
[3]
Paolo Bientinesi, Brian Gunter, and Robert A. van de Geijn. 2008. Families of Algorithms Related to the Inversion of a Symmetric Positive Definite Matrix. ACM Trans. Math. Softw. 35, 1, Article 3 (July 2008), 22 pages.
[4]
Edwin Brady. 2013. Idris: general purpose programming with dependent types. In PLPV. ACM, 1-2.
[5]
Robert Clifton-Everest, Trevor L. McDonell, Manuel M. T. Chakravarty, and Gabriele Keller. 2017. Streaming irregular arrays. In Proceedings of the 10th ACM SIGPLAN International Symposium on Haskell, Oxford, United Kingdom, September 7-8, 2017. 174-185.
[6]
Thiago de Castro Martins, Jacqueline de Miranda Kian, André Kubagawa Sato, and Marcos de Sales Guerra Tsuzuki. 2012. Matrix-vector multiplication and triangular linear solver using GPGPU for symmetric positive definite matrices derived from elliptic equations. In SCIS&ISIS. IEEE, 1286-1291.
[7]
Yin Ding and Ivan W Selesnick. 2016. Sparsity-based correction of exponential artifacts. Signal Processing 120 (2016), 236-248.
[8]
Google et al. 2017. XLA (Accelerated Linear Algebra): domain-specific compiler for linear algebra that optimizes TensorFlow computations.
[9]
John A Gunnels, Fred G Gustavson, Greg M Henry, and Robert A Van De Geijn. 2001. FLAME: Formal linear algebra methods environment. ACM Transactions on Mathematical Software (TOMS) 27, 4 (2001), 422-455.
[10]
Bastian Hagedorn, Larisa Stoltzfus, Michel Steuwer, Sergei Gorlatch, and Christophe Dubach. 2018. High performance stencil code generation with Lift. In CGO. ACM.
[11]
Troels Henriksen, Niels G. W. Serup, Martin Elsman, Fritz Henglein, and Cosmin E. Oancea. 2017. Futhark: purely functional GPU-programming with nested parallelism and in-place array updates. In PLDI. ACM.
[12]
Conor McBride. 2004. Epigram: Practical Programming with Dependent Types. In Advanced Functional Programming (Lecture Notes in Computer Science), Vol. 3622. Springer, 130-170.
[13]
Trevor L. McDonell, Manuel M. T. Chakravarty, Gabriele Keller, and Ben Lippmeier. 2013. Optimising purely functional GPU programs. In ICFP. ACM.
[14]
Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman P. Amarasinghe. 2013. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In PLDI. ACM.
[15]
Toomas Remmelg, Thibaut Lutz, Michel Steuwer, and Christophe Dubach. 2016. Performance portable GPU code generation for matrix multiplication. In GPGPU@PPoPP. ACM, 22-31.
[16]
Nadav Rotem, Jordan Fix, Saleem Abdulrasool, Summer Deng, Roman Dzhabarov, James Hegeman, Roman Levenstein, Bert Maher, Nadathur Satish, Jakob Olesen, Jongsoo Park, Artem Rakhov, and Misha Smelyanskiy. 2018. Glow: Graph Lowering Compiler Techniques for Neural Networks. CoRR abs/1805.00907 (2018).
[17]
Michel Steuwer, Christian Fensch, Sam Lindley, and Christophe Dubach. 2015. Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code. In ICFP. ACM, 205-217.
[18]
Michel Steuwer, Toomas Remmelg, and Christophe Dubach. 2017. Lift: a functional data-parallel IR for high-performance GPU code generation. In CGO. ACM, 74-85.
[19]
Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, and Albert Cohen. 2018. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions. CoRR abs/1802.04730 (2018).

Cited By

View all
  • (2023)Towards Structured Algebraic ProgrammingProceedings of the 9th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming10.1145/3589246.3595373(50-61)Online publication date: 6-Jun-2023
  • (2021)Generating high performance code for irregular data structures using dependent typesProceedings of the 9th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing10.1145/3471873.3472977(37-49)Online publication date: 22-Aug-2021
  • (2020)Generating fast sparse matrix vector multiplication from a high level generic functional IRProceedings of the 29th International Conference on Compiler Construction10.1145/3377555.3377896(85-95)Online publication date: 22-Feb-2020

Index Terms

  1. Position-dependent arrays and their application for high performance code generation

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      FHPNC 2019: Proceedings of the 8th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing
      August 2019
      59 pages
      ISBN:9781450368148
      DOI:10.1145/3331553
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 18 August 2019

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. Dependent types
      2. Irregular data structures
      3. Lift

      Qualifiers

      • Research-article

      Funding Sources

      Conference

      ICFP '19
      Sponsor:

      Acceptance Rates

      Overall Acceptance Rate 18 of 25 submissions, 72%

      Upcoming Conference

      ICFP '25
      ACM SIGPLAN International Conference on Functional Programming
      October 12 - 18, 2025
      Singapore , Singapore

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)11
      • Downloads (Last 6 weeks)1
      Reflects downloads up to 28 Dec 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2023)Towards Structured Algebraic ProgrammingProceedings of the 9th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming10.1145/3589246.3595373(50-61)Online publication date: 6-Jun-2023
      • (2021)Generating high performance code for irregular data structures using dependent typesProceedings of the 9th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing10.1145/3471873.3472977(37-49)Online publication date: 22-Aug-2021
      • (2020)Generating fast sparse matrix vector multiplication from a high level generic functional IRProceedings of the 29th International Conference on Compiler Construction10.1145/3377555.3377896(85-95)Online publication date: 22-Feb-2020

      View Options

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media