More Web Proxy on the site http://driver.im/

research-article

Open access

Efficient tree-traversals: reconciling parallelism and dense data representations

Authors:

Chaitanya Koparkar,

Michael Vollmer,

Milind Kulkarni,

Ryan R. NewtonAuthors Info & Claims

Proceedings of the ACM on Programming Languages, Volume 5, Issue ICFP

Article No.: 91, Pages 1 - 29

https://doi.org/10.1145/3473596

Published: 19 August 2021 Publication History

Abstract

Recent work showed that compiling functional programs to use dense, serialized memory representations for recursive algebraic datatypes can yield significant constant-factor speedups for sequential programs. But serializing data in a maximally dense format consequently serializes the processing of that data, yielding a tension between density and parallelism. This paper shows that a disciplined, practical compromise is possible. We present Parallel Gibbon, a compiler that obtains the benefits of dense data formats and parallelism. We formalize the semantics of the parallel location calculus underpinning this novel implementation strategy, and show that it is type-safe. Parallel Gibbon exceeds the parallel performance of existing compilers for purely functional programs that use recursive algebraic datatypes, including, notably, abstract-syntax-tree traversals as in compilers.

Supplementary Material

Auxiliary Presentation Video (icfp21main-p158-p-video.mp4)

Version 2 of the video for "Efficient Tree-Traversals: Reconciling Parallelism and Dense Data Representations".

Download
163.66 MB

MP4 File (3473596.mp4)

Presentation Videos

Download
34.73 MB

References

[1]

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. http://tensorflow.org/ Software available from tensorflow.org.

[2]

Umut A. Acar, Vitaly Aksenov, Arthur Charguéraud, and Mike Rainey. 2019. Provably and Practically Efficient Granularity Control. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP ’19). ACM, New York, NY, USA. 214–228. isbn:978-1-4503-6225-2 http://mike-rainey.site/papers/oracle-ppop19-long.pdf

Digital Library

[3]

Umut A Acar, Arthur Charguéraud, Adrien Guatto, Mike Rainey, and Filip Sieczkowski. 2018. Heartbeat Scheduling: Provable Efficiency for Nested Parallelism. http://mike-rainey.site/papers/heartbeat.pdf

[4]

Todd A. Anderson, Hai Liu, Lindsey Kuper, Ehsan Totoni, Jan Vitek, and Tatiana Shpeisman. 2017. Parallelizing Julia with a Non-Invasive DSL. In 31st European Conference on Object-Oriented Programming (ECOOP 2017), Peter Müller (Ed.) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 74). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany. 4:1–4:29. isbn:978-3-95977-035-4 issn:1868-8969 https://doi.org/10.4230/LIPIcs.ECOOP.2017.4

[5]

Arvind, Rishiyur S. Nikhil, and Keshav K. Pingali. 1989. I-structures: Data structures for parallel computing. ACM Trans. Program. Lang. Syst., 11 (1989), October, 598–632. issn:0164-0925 https://doi.org/10.1145/69558.69562

Digital Library

[6]

Lars Bergstrom, Matthew Fluet, Mike Rainey, John Reppy, Stephen Rosen, and Adam Shaw. 2013. Data-Only Flattening for Nested Data Parallelism. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’13). Association for Computing Machinery, New York, NY, USA. 81–92. isbn:9781450319225 https://doi.org/10.1145/2442516.2442525

Digital Library

[7]

Jean-Philippe Bernardy, Mathieu Boespflug, Ryan R. Newton, Simon Peyton Jones, and Arnaud Spiwack. 2017. Linear Haskell: Practical Linearity in a Higher-Order Polymorphic Language. Proc. ACM Program. Lang., 2, POPL (2017), Article 5, Dec., 29 pages. https://doi.org/10.1145/3158093

Digital Library

[8]

Guy E. Blelloch. 1992. NESL: A Nested Data-Parallel Language. USA.

Digital Library

[9]

Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. 1995. Cilk: An Efficient Multithreaded Runtime System. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP ’95). Association for Computing Machinery, New York, NY, USA. 207–216. isbn:0897917006 https://doi.org/10.1145/209936.209958

Digital Library

[10]

Manuel M.T. Chakravarty, Gabriele Keller, Sean Lee, Trevor L. McDonell, and Vinod Grover. 2011. Accelerating Haskell Array Codes with Multicore GPUs. In Proceedings of the Sixth Workshop on Declarative Aspects of Multicore Programming (DAMP ’11). Association for Computing Machinery, New York, NY, USA. 3–14. isbn:9781450304863 https://doi.org/10.1145/1926354.1926358

Digital Library

[11]

Adam Chlipala. 2015. An Optimizing Compiler for a Purely Functional Web-application Language. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (ICFP 2015). ACM, New York, NY, USA. 10–21. isbn:978-1-4503-3669-7 https://doi.org/10.1145/2784731.2784741

Digital Library

[12]

Byn Choi, Rakesh Komuravelli, Victor Lu, Hyojin Sung, Robert L. Bocchino, Sarita V. Adve, and John C. Hart. 2010. Parallel SAH K-D Tree Construction. In Proceedings of the Conference on High Performance Graphics (HPG ’10). Eurographics Association, Goslar, DEU. 77–86.

[13]

Matthias Felleisen, Robert Bruce Findler, and Matthew Flatt. 2009. Semantics engineering with PLT Redex. Mit Press.

Digital Library

[14]

Jerome H. Friedman, Jon Louis Bentley, and Raphael Ari Finkel. 1977. An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Trans. Math. Softw., 3, 3 (1977), Sept., 209–226. issn:0098-3500 https://doi.org/10.1145/355744.355745

Digital Library

[15]

Michael Goldfarb, Youngjoon Jo, and Milind Kulkarni. 2013. General transformations for GPU execution of tree traversals. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (Supercomputing) (SC ’13).

Digital Library

[16]

Adrien Guatto, Sam Westrick, Ram Raghunathan, Umut Acar, and Matthew Fluet. 2018. Hierarchical Memory Management for Mutable State. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’18). Association for Computing Machinery, New York, NY, USA. 81–93. isbn:9781450349826 https://doi.org/10.1145/3178487.3178494

Digital Library

[17]

Robert H. Halstead. 1985. MULTILISP: A Language for Concurrent Symbolic Computation. ACM Trans. Program. Lang. Syst., 7, 4 (1985), Oct., 501–538. issn:0164-0925 https://doi.org/10.1145/4472.4478

Digital Library

[18]

Tim Harris and Satnam Singh. 2007. Feedback Directed Implicit Parallelism. In Proceedings of the 12th ACM SIGPLAN International Conference on Functional Programming (ICFP ’07). Association for Computing Machinery, New York, NY, USA. 251–264. isbn:9781595938152 https://doi.org/10.1145/1291151.1291192

Digital Library

[19]

Gabriele Keller and Manuel M. T. Chakravarty. 1998. Flattening Trees. In Proceedings of the 4th International Euro-Par Conference on Parallel Processing (Euro-Par ’98). Springer-Verlag, Berlin, Heidelberg. 709–719. isbn:3540649522

[20]

Chaitanya Koparkar, Mike Rainey, Michael Vollmer, Milind Kulkarni, and Ryan R. Newton. 2021. Efficient Tree-Traversals: Reconciling Parallelism and Dense Data Representations. arxiv:2107.00522.

[21]

Lindsey Kuper, Aaron Todd, Sam Tobin-Hochstadt, and Ryan R. Newton. 2014. Taming the Parallel Effect Zoo: Extensible Deterministic Parallelism with LVish. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’14). Association for Computing Machinery, New York, NY, USA. 2–14. isbn:9781450327848 https://doi.org/10.1145/2594291.2594312

Digital Library

[22]

Doug Lea. 2000. A java fork/join framework. In Proceedings of the ACM 2000 conference on Java Grande. 36–43.

Digital Library

[23]

Xavier Leroy, Damien Doligez, Alain Frisch, Jacques Garrigue, Didier Rémy, and Jérôme Vouillon. 2020. The OCaml system release. Feb, https://ocaml.org/releases/4.10/htmlman/index.html

[24]

Junichiro Makino. 1990. Vectorization of a treecode. J. Comput. Phys., 87 (1990), March, 148–160. issn:0021-9991 https://doi.org/10.1016/0021-9991(90)90231-O

Digital Library

[25]

Simon Marlow, Tim Harris, Roshan P. James, and Simon Peyton Jones. 2008. Parallel Generational-Copying Garbage Collection with a Block-Structured Heap. In Proceedings of the 7th International Symposium on Memory Management (ISMM ’08). Association for Computing Machinery, New York, NY, USA. 11–20. isbn:9781605581347 https://doi.org/10.1145/1375634.1375637

Digital Library

[26]

Simon Marlow, Ryan Newton, and Simon Peyton Jones. 2011. A Monad for Deterministic Parallelism. SIGPLAN Not., 46, 12 (2011), Sept., 71–82. issn:0362-1340 https://doi.org/10.1145/2096148.2034685

Digital Library

[27]

Simon Marlow, Simon Peyton Jones, and Satnam Singh. 2009. Runtime Support for Multicore Haskell. In Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming (ICFP ’09). Association for Computing Machinery, New York, NY, USA. 65–78. isbn:9781605583327 https://doi.org/10.1145/1596550.1596563

Digital Library

[28]

R. Menon and L. Dagum. 1998. OpenMP: An Industry-Standard API for Shared-Memory Programming. Computing in Science and Engineering, v, 01 (1998), jan, 46–55. issn:1558-366X https://doi.org/10.1109/99.660313

Digital Library

[29]

Leo A. Meyerovich, Todd Mytkowicz, and Wolfram Schulte. 2011. Data Parallel Programming for Irregular Tree Computations. In HotPAR. https://www.microsoft.com/en-us/research/publication/data- parallel-programming-for-irregular-tree-computations/

[30]

Robin Milner, Mads Tofte, and David Macqueen. 1997. The Definition of Standard ML. MIT Press, Cambridge, MA, USA. isbn:0262631814

[31]

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, and Luca Antiga. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems. 8026–8037.

Digital Library

[32]

Ram Raghunathan, Stefan K Muller, Umut A Acar, and Guy Blelloch. 2016. Hierarchical memory management for parallel programs. ACM SIGPLAN Notices, 51, 9 (2016), 392–406.

Digital Library

[33]

Mike Rainey, Kyle A Hale, Ryan Newton, Nikos Hardavellas, Simone Campanoni, Peter Dinda, and Umut Acar. 2021. Task Parallel Assembly Language for Uncompromising Parallelism.

[34]

John Reppy, Claudio V. Russo, and Yingqi Xiao. 2009. Parallel Concurrent ML. In Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming (ICFP ’09). Association for Computing Machinery, New York, NY, USA. 257–268. isbn:9781605583327 https://doi.org/10.1145/1596550.1596588

Digital Library

[35]

Dipanwita Sarkar, Oscar Waddell, and R. Kent Dybvig. 2004. A Nanopass Infrastructure for Compiler Education. In Proceedings of the Ninth ACM SIGPLAN International Conference on Functional Programming (ICFP ’04). Association for Computing Machinery, New York, NY, USA. 201–212. isbn:1581139055 https://doi.org/10.1145/1016850.1016878

Digital Library

[36]

Julian Shun, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Aapo Kyrola, Harsha Vardhan Simhadri, and Kanat Tangwongsan. 2012. Brief Announcement: The Problem Based Benchmark Suite. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA ’12). Association for Computing Machinery, New York, NY, USA. 68–70. isbn:9781450312134 https://doi.org/10.1145/2312005.2312018

Digital Library

[37]

Jeremy Siek, Carl Factora, Andre Kuhlenschmidt, Ryan Newton, Scott Ryan, Cameron Swords, Michael Vitousek, and Michael Vollmer. 2020. Essentials Of Compilation: An Incremental Approach. https://iucompilercourse.github.io/IU-P423-P523-E313-E513-Fall-2020/

[38]

KC Sivaramakrishnan, Stephen Dolan, Leo White, Sadiq Jaffer, Tom Kelly, Anmol Sahoo, Sudha Parimala, Atul Dhiman, and Anil Madhavapeddy. 2020. Retrofitting Parallelism onto OCaml. Proc. ACM Program. Lang., 4, ICFP, Article 113, 30 pages. https://doi.org/10.1145/3408995

Digital Library

[39]

KC Sivaramakrishnan, Stephen Dolan, Leo White, Sadiq Jaffer, Tom Kelly, Anmol Sahoo, Sudha Parimala, Atul Dhiman, and Anil Madhavapeddy. 2020. Retrofitting Parallelism onto OCaml. arXiv preprint arXiv:2004.11663.

[40]

Mads Tofte, Lars Birkedal, Martin Elsman, and Niels Hallenberg. 2004. A Retrospective on Region-Based Memory Management. Higher Order Symbol. Comput., 17, 3 (2004), Sept., 245–265. issn:1388-3690 https://doi.org/10.1023/B:LISP.0000029446.78563.a4

Digital Library

[41]

Mads Tofte and Jean-Pierre Talpin. 1997. Region-Based Memory Management. Inf. Comput., 132, 2 (1997), Feb., 109–176. issn:0890-5401 https://doi.org/10.1006/inco.1996.2613

Digital Library

[42]

Katsuhiro Ueno and Atsushi Ohori. 2016. A fully concurrent garbage collector for functional programs on multicore processors. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming. 421–433.

Digital Library

[43]

Kenton Varda. 2015. Cap’n Proto. https://capnproto.org/

[44]

Michael Vollmer, Chaitanya Koparkar, Mike Rainey, Laith Sakka, Milind Kulkarni, and Ryan R. Newton. 2019. LoCal: A Language for Programs Operating on Serialized Data. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2019). Association for Computing Machinery, New York, NY, USA. 48–62. isbn:9781450367127 https://doi.org/10.1145/3314221.3314631

Digital Library

[45]

Michael Vollmer, Sarah Spall, Buddhika Chamith, Laith Sakka, Chaitanya Koparkar, Milind Kulkarni, Sam Tobin-Hochstadt, and Ryan R. Newton. 2017. Compiling Tree Transforms to Operate on Packed Representations. In 31st European Conference on Object-Oriented Programming (ECOOP 2017), Peter Müller (Ed.) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 74). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany. 26:1–26:29. isbn:978-3-95977-035-4 issn:1868-8969 https://doi.org/10.4230/LIPIcs.ECOOP.2017.26

[46]

Sam Westrick, Rohan Yadav, Matthew Fluet, and Umut A. Acar. 2019. Disentanglement in Nested-Parallel Programs. Proc. ACM Program. Lang., 4, POPL (2019), Article 47, Dec., 32 pages. https://doi.org/10.1145/3371115

Digital Library

[47]

Edward Z. Yang, Giovanni Campagna, Ömer S. Ağacan, Ahmed El-Hassany, Abhishek Kulkarni, and Ryan R. Newton. 2015. Efficient Communication and Collection with Compact Normal Forms. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (ICFP 2015). ACM, New York, NY, USA. 362–374. isbn:978-1-4503-3669-7 https://doi.org/10.1145/2784731.2784735

Digital Library

Cited By

Elsman M(2024)Double-Ended Bit-Stealing for Algebraic Data TypesProceedings of the ACM on Programming Languages10.1145/36746288:ICFP(88-120)Online publication date: 15-Aug-2024
https://dl.acm.org/doi/10.1145/3674628
Larose OKaleba SBurchell HMarr S(2023)AST vs. Bytecode: Interpreters in the Age of Meta-CompilationProceedings of the ACM on Programming Languages10.1145/36228087:OOPSLA2(318-346)Online publication date: 16-Oct-2023
https://dl.acm.org/doi/10.1145/3622808
Baudon TRadanne GGonnord L(2023)Bit-Stealing Made Legal: Compilation for Custom Memory Representations of Algebraic Data TypesProceedings of the ACM on Programming Languages10.1145/36078587:ICFP(813-846)Online publication date: 31-Aug-2023
https://dl.acm.org/doi/10.1145/3607858
Show More Cited By

Index Terms

Efficient tree-traversals: reconciling parallelism and dense data representations
1. Software and its engineering
  1. Software notations and tools

Recommendations

A Comparison of 12 Parallel FORTRAN Dialects

A simple program that approximates pi by numerical quadrature is rewritten to run on nine commercially available processors to illustrate the compilations that arise in parallel programming in FORTRAN. The machines used are the Alliant FX/8, BBN ...
Tools-supported HPF and MPI parallelization of the NAS parallel benchmarks
FRONTIERS '96: Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation

High Performance Fortran (HPF) compilers and communication libraries with the standardized Message Passing Interface (MPI) are becoming widely available, easing the development of portable parallel applications. The Annai tool environment supports ...
Data-only flattening for nested data parallelism
PPoPP '13: Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming

Data parallelism has proven to be an effective technique for high-level programming of a certain class of parallel applications, but it is not well suited to irregular parallel computations. Blelloch and others proposed nested data parallelism (NDP) as ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages

Proceedings of the ACM on Programming Languages Volume 5, Issue ICFP

August 2021

1011 pages

EISSN:2475-1421

DOI:10.1145/3482883

Issue’s Table of Contents

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 August 2021

Published in PACMPL Volume 5, Issue ICFP

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Badges

Author Tags

Qualifiers

Research-article

Funding Sources

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

5
Total Citations
View Citations
2,979
Total Downloads

Downloads (Last 12 months)231
Downloads (Last 6 weeks)24

Reflects downloads up to 30 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Elsman M(2024)Double-Ended Bit-Stealing for Algebraic Data TypesProceedings of the ACM on Programming Languages10.1145/36746288:ICFP(88-120)Online publication date: 15-Aug-2024
https://dl.acm.org/doi/10.1145/3674628
Larose OKaleba SBurchell HMarr S(2023)AST vs. Bytecode: Interpreters in the Age of Meta-CompilationProceedings of the ACM on Programming Languages10.1145/36228087:OOPSLA2(318-346)Online publication date: 16-Oct-2023
https://dl.acm.org/doi/10.1145/3622808
Baudon TRadanne GGonnord L(2023)Bit-Stealing Made Legal: Compilation for Custom Memory Representations of Algebraic Data TypesProceedings of the ACM on Programming Languages10.1145/36078587:ICFP(813-846)Online publication date: 31-Aug-2023
https://dl.acm.org/doi/10.1145/3607858
Arora JWestrick SAcar U(2023)Efficient Parallel Functional Programming with EffectsProceedings of the ACM on Programming Languages10.1145/35912847:PLDI(1558-1583)Online publication date: 6-Jun-2023
https://dl.acm.org/doi/10.1145/3591284
Koparkar C(2022)Making GHC whole again or, how to perform whole-program analysis within GHCXRDS: Crossroads, The ACM Magazine for Students10.1145/349527228:2(80-81)Online publication date: 10-Jan-2022
https://dl.acm.org/doi/10.1145/3495272

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents