[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

Efficient tree-traversals: reconciling parallelism and dense data representations

Published: 19 August 2021 Publication History

Abstract

Recent work showed that compiling functional programs to use dense, serialized memory representations for recursive algebraic datatypes can yield significant constant-factor speedups for sequential programs. But serializing data in a maximally dense format consequently serializes the processing of that data, yielding a tension between density and parallelism. This paper shows that a disciplined, practical compromise is possible. We present Parallel Gibbon, a compiler that obtains the benefits of dense data formats and parallelism. We formalize the semantics of the parallel location calculus underpinning this novel implementation strategy, and show that it is type-safe. Parallel Gibbon exceeds the parallel performance of existing compilers for purely functional programs that use recursive algebraic datatypes, including, notably, abstract-syntax-tree traversals as in compilers.

Supplementary Material

Auxiliary Presentation Video (icfp21main-p158-p-video.mp4)
Version 2 of the video for "Efficient Tree-Traversals: Reconciling Parallelism and Dense Data Representations".
MP4 File (3473596.mp4)
Presentation Videos

References

[1]
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. http://tensorflow.org/ Software available from tensorflow.org.
[2]
Umut A. Acar, Vitaly Aksenov, Arthur Charguéraud, and Mike Rainey. 2019. Provably and Practically Efficient Granularity Control. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP ’19). ACM, New York, NY, USA. 214–228. isbn:978-1-4503-6225-2 http://mike-rainey.site/papers/oracle-ppop19-long.pdf
[3]
Umut A Acar, Arthur Charguéraud, Adrien Guatto, Mike Rainey, and Filip Sieczkowski. 2018. Heartbeat Scheduling: Provable Efficiency for Nested Parallelism. http://mike-rainey.site/papers/heartbeat.pdf
[4]
Todd A. Anderson, Hai Liu, Lindsey Kuper, Ehsan Totoni, Jan Vitek, and Tatiana Shpeisman. 2017. Parallelizing Julia with a Non-Invasive DSL. In 31st European Conference on Object-Oriented Programming (ECOOP 2017), Peter Müller (Ed.) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 74). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany. 4:1–4:29. isbn:978-3-95977-035-4 issn:1868-8969 https://doi.org/10.4230/LIPIcs.ECOOP.2017.4
[5]
Arvind, Rishiyur S. Nikhil, and Keshav K. Pingali. 1989. I-structures: Data structures for parallel computing. ACM Trans. Program. Lang. Syst., 11 (1989), October, 598–632. issn:0164-0925 https://doi.org/10.1145/69558.69562
[6]
Lars Bergstrom, Matthew Fluet, Mike Rainey, John Reppy, Stephen Rosen, and Adam Shaw. 2013. Data-Only Flattening for Nested Data Parallelism. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’13). Association for Computing Machinery, New York, NY, USA. 81–92. isbn:9781450319225 https://doi.org/10.1145/2442516.2442525
[7]
Jean-Philippe Bernardy, Mathieu Boespflug, Ryan R. Newton, Simon Peyton Jones, and Arnaud Spiwack. 2017. Linear Haskell: Practical Linearity in a Higher-Order Polymorphic Language. Proc. ACM Program. Lang., 2, POPL (2017), Article 5, Dec., 29 pages. https://doi.org/10.1145/3158093
[8]
Guy E. Blelloch. 1992. NESL: A Nested Data-Parallel Language. USA.
[9]
Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. 1995. Cilk: An Efficient Multithreaded Runtime System. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP ’95). Association for Computing Machinery, New York, NY, USA. 207–216. isbn:0897917006 https://doi.org/10.1145/209936.209958
[10]
Manuel M.T. Chakravarty, Gabriele Keller, Sean Lee, Trevor L. McDonell, and Vinod Grover. 2011. Accelerating Haskell Array Codes with Multicore GPUs. In Proceedings of the Sixth Workshop on Declarative Aspects of Multicore Programming (DAMP ’11). Association for Computing Machinery, New York, NY, USA. 3–14. isbn:9781450304863 https://doi.org/10.1145/1926354.1926358
[11]
Adam Chlipala. 2015. An Optimizing Compiler for a Purely Functional Web-application Language. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (ICFP 2015). ACM, New York, NY, USA. 10–21. isbn:978-1-4503-3669-7 https://doi.org/10.1145/2784731.2784741
[12]
Byn Choi, Rakesh Komuravelli, Victor Lu, Hyojin Sung, Robert L. Bocchino, Sarita V. Adve, and John C. Hart. 2010. Parallel SAH K-D Tree Construction. In Proceedings of the Conference on High Performance Graphics (HPG ’10). Eurographics Association, Goslar, DEU. 77–86.
[13]
Matthias Felleisen, Robert Bruce Findler, and Matthew Flatt. 2009. Semantics engineering with PLT Redex. Mit Press.
[14]
Jerome H. Friedman, Jon Louis Bentley, and Raphael Ari Finkel. 1977. An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Trans. Math. Softw., 3, 3 (1977), Sept., 209–226. issn:0098-3500 https://doi.org/10.1145/355744.355745
[15]
Michael Goldfarb, Youngjoon Jo, and Milind Kulkarni. 2013. General transformations for GPU execution of tree traversals. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (Supercomputing) (SC ’13).
[16]
Adrien Guatto, Sam Westrick, Ram Raghunathan, Umut Acar, and Matthew Fluet. 2018. Hierarchical Memory Management for Mutable State. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’18). Association for Computing Machinery, New York, NY, USA. 81–93. isbn:9781450349826 https://doi.org/10.1145/3178487.3178494
[17]
Robert H. Halstead. 1985. MULTILISP: A Language for Concurrent Symbolic Computation. ACM Trans. Program. Lang. Syst., 7, 4 (1985), Oct., 501–538. issn:0164-0925 https://doi.org/10.1145/4472.4478
[18]
Tim Harris and Satnam Singh. 2007. Feedback Directed Implicit Parallelism. In Proceedings of the 12th ACM SIGPLAN International Conference on Functional Programming (ICFP ’07). Association for Computing Machinery, New York, NY, USA. 251–264. isbn:9781595938152 https://doi.org/10.1145/1291151.1291192
[19]
Gabriele Keller and Manuel M. T. Chakravarty. 1998. Flattening Trees. In Proceedings of the 4th International Euro-Par Conference on Parallel Processing (Euro-Par ’98). Springer-Verlag, Berlin, Heidelberg. 709–719. isbn:3540649522
[20]
Chaitanya Koparkar, Mike Rainey, Michael Vollmer, Milind Kulkarni, and Ryan R. Newton. 2021. Efficient Tree-Traversals: Reconciling Parallelism and Dense Data Representations. arxiv:2107.00522.
[21]
Lindsey Kuper, Aaron Todd, Sam Tobin-Hochstadt, and Ryan R. Newton. 2014. Taming the Parallel Effect Zoo: Extensible Deterministic Parallelism with LVish. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’14). Association for Computing Machinery, New York, NY, USA. 2–14. isbn:9781450327848 https://doi.org/10.1145/2594291.2594312
[22]
Doug Lea. 2000. A java fork/join framework. In Proceedings of the ACM 2000 conference on Java Grande. 36–43.
[23]
Xavier Leroy, Damien Doligez, Alain Frisch, Jacques Garrigue, Didier Rémy, and Jérôme Vouillon. 2020. The OCaml system release. Feb, https://ocaml.org/releases/4.10/htmlman/index.html
[24]
Junichiro Makino. 1990. Vectorization of a treecode. J. Comput. Phys., 87 (1990), March, 148–160. issn:0021-9991 https://doi.org/10.1016/0021-9991(90)90231-O
[25]
Simon Marlow, Tim Harris, Roshan P. James, and Simon Peyton Jones. 2008. Parallel Generational-Copying Garbage Collection with a Block-Structured Heap. In Proceedings of the 7th International Symposium on Memory Management (ISMM ’08). Association for Computing Machinery, New York, NY, USA. 11–20. isbn:9781605581347 https://doi.org/10.1145/1375634.1375637
[26]
Simon Marlow, Ryan Newton, and Simon Peyton Jones. 2011. A Monad for Deterministic Parallelism. SIGPLAN Not., 46, 12 (2011), Sept., 71–82. issn:0362-1340 https://doi.org/10.1145/2096148.2034685
[27]
Simon Marlow, Simon Peyton Jones, and Satnam Singh. 2009. Runtime Support for Multicore Haskell. In Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming (ICFP ’09). Association for Computing Machinery, New York, NY, USA. 65–78. isbn:9781605583327 https://doi.org/10.1145/1596550.1596563
[28]
R. Menon and L. Dagum. 1998. OpenMP: An Industry-Standard API for Shared-Memory Programming. Computing in Science and Engineering, v, 01 (1998), jan, 46–55. issn:1558-366X https://doi.org/10.1109/99.660313
[29]
Leo A. Meyerovich, Todd Mytkowicz, and Wolfram Schulte. 2011. Data Parallel Programming for Irregular Tree Computations. In HotPAR. https://www.microsoft.com/en-us/research/publication/data- parallel-programming-for-irregular-tree-computations/
[30]
Robin Milner, Mads Tofte, and David Macqueen. 1997. The Definition of Standard ML. MIT Press, Cambridge, MA, USA. isbn:0262631814
[31]
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, and Luca Antiga. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems. 8026–8037.
[32]
Ram Raghunathan, Stefan K Muller, Umut A Acar, and Guy Blelloch. 2016. Hierarchical memory management for parallel programs. ACM SIGPLAN Notices, 51, 9 (2016), 392–406.
[33]
Mike Rainey, Kyle A Hale, Ryan Newton, Nikos Hardavellas, Simone Campanoni, Peter Dinda, and Umut Acar. 2021. Task Parallel Assembly Language for Uncompromising Parallelism.
[34]
John Reppy, Claudio V. Russo, and Yingqi Xiao. 2009. Parallel Concurrent ML. In Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming (ICFP ’09). Association for Computing Machinery, New York, NY, USA. 257–268. isbn:9781605583327 https://doi.org/10.1145/1596550.1596588
[35]
Dipanwita Sarkar, Oscar Waddell, and R. Kent Dybvig. 2004. A Nanopass Infrastructure for Compiler Education. In Proceedings of the Ninth ACM SIGPLAN International Conference on Functional Programming (ICFP ’04). Association for Computing Machinery, New York, NY, USA. 201–212. isbn:1581139055 https://doi.org/10.1145/1016850.1016878
[36]
Julian Shun, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Aapo Kyrola, Harsha Vardhan Simhadri, and Kanat Tangwongsan. 2012. Brief Announcement: The Problem Based Benchmark Suite. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA ’12). Association for Computing Machinery, New York, NY, USA. 68–70. isbn:9781450312134 https://doi.org/10.1145/2312005.2312018
[37]
Jeremy Siek, Carl Factora, Andre Kuhlenschmidt, Ryan Newton, Scott Ryan, Cameron Swords, Michael Vitousek, and Michael Vollmer. 2020. Essentials Of Compilation: An Incremental Approach. https://iucompilercourse.github.io/IU-P423-P523-E313-E513-Fall-2020/
[38]
KC Sivaramakrishnan, Stephen Dolan, Leo White, Sadiq Jaffer, Tom Kelly, Anmol Sahoo, Sudha Parimala, Atul Dhiman, and Anil Madhavapeddy. 2020. Retrofitting Parallelism onto OCaml. Proc. ACM Program. Lang., 4, ICFP, Article 113, 30 pages. https://doi.org/10.1145/3408995
[39]
KC Sivaramakrishnan, Stephen Dolan, Leo White, Sadiq Jaffer, Tom Kelly, Anmol Sahoo, Sudha Parimala, Atul Dhiman, and Anil Madhavapeddy. 2020. Retrofitting Parallelism onto OCaml. arXiv preprint arXiv:2004.11663.
[40]
Mads Tofte, Lars Birkedal, Martin Elsman, and Niels Hallenberg. 2004. A Retrospective on Region-Based Memory Management. Higher Order Symbol. Comput., 17, 3 (2004), Sept., 245–265. issn:1388-3690 https://doi.org/10.1023/B:LISP.0000029446.78563.a4
[41]
Mads Tofte and Jean-Pierre Talpin. 1997. Region-Based Memory Management. Inf. Comput., 132, 2 (1997), Feb., 109–176. issn:0890-5401 https://doi.org/10.1006/inco.1996.2613
[42]
Katsuhiro Ueno and Atsushi Ohori. 2016. A fully concurrent garbage collector for functional programs on multicore processors. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming. 421–433.
[43]
Kenton Varda. 2015. Cap’n Proto. https://capnproto.org/
[44]
Michael Vollmer, Chaitanya Koparkar, Mike Rainey, Laith Sakka, Milind Kulkarni, and Ryan R. Newton. 2019. LoCal: A Language for Programs Operating on Serialized Data. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2019). Association for Computing Machinery, New York, NY, USA. 48–62. isbn:9781450367127 https://doi.org/10.1145/3314221.3314631
[45]
Michael Vollmer, Sarah Spall, Buddhika Chamith, Laith Sakka, Chaitanya Koparkar, Milind Kulkarni, Sam Tobin-Hochstadt, and Ryan R. Newton. 2017. Compiling Tree Transforms to Operate on Packed Representations. In 31st European Conference on Object-Oriented Programming (ECOOP 2017), Peter Müller (Ed.) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 74). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany. 26:1–26:29. isbn:978-3-95977-035-4 issn:1868-8969 https://doi.org/10.4230/LIPIcs.ECOOP.2017.26
[46]
Sam Westrick, Rohan Yadav, Matthew Fluet, and Umut A. Acar. 2019. Disentanglement in Nested-Parallel Programs. Proc. ACM Program. Lang., 4, POPL (2019), Article 47, Dec., 32 pages. https://doi.org/10.1145/3371115
[47]
Edward Z. Yang, Giovanni Campagna, Ömer S. Ağacan, Ahmed El-Hassany, Abhishek Kulkarni, and Ryan R. Newton. 2015. Efficient Communication and Collection with Compact Normal Forms. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (ICFP 2015). ACM, New York, NY, USA. 362–374. isbn:978-1-4503-3669-7 https://doi.org/10.1145/2784731.2784735

Cited By

View all
  • (2024)Double-Ended Bit-Stealing for Algebraic Data TypesProceedings of the ACM on Programming Languages10.1145/36746288:ICFP(88-120)Online publication date: 15-Aug-2024
  • (2023)AST vs. Bytecode: Interpreters in the Age of Meta-CompilationProceedings of the ACM on Programming Languages10.1145/36228087:OOPSLA2(318-346)Online publication date: 16-Oct-2023
  • (2023)Bit-Stealing Made Legal: Compilation for Custom Memory Representations of Algebraic Data TypesProceedings of the ACM on Programming Languages10.1145/36078587:ICFP(813-846)Online publication date: 31-Aug-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages  Volume 5, Issue ICFP
August 2021
1011 pages
EISSN:2475-1421
DOI:10.1145/3482883
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 August 2021
Published in PACMPL Volume 5, Issue ICFP

Permissions

Request permissions for this article.

Check for updates

Badges

Author Tags

  1. Compilers
  2. Data Representation
  3. Parallelism
  4. Region Calculus

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)231
  • Downloads (Last 6 weeks)24
Reflects downloads up to 30 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Double-Ended Bit-Stealing for Algebraic Data TypesProceedings of the ACM on Programming Languages10.1145/36746288:ICFP(88-120)Online publication date: 15-Aug-2024
  • (2023)AST vs. Bytecode: Interpreters in the Age of Meta-CompilationProceedings of the ACM on Programming Languages10.1145/36228087:OOPSLA2(318-346)Online publication date: 16-Oct-2023
  • (2023)Bit-Stealing Made Legal: Compilation for Custom Memory Representations of Algebraic Data TypesProceedings of the ACM on Programming Languages10.1145/36078587:ICFP(813-846)Online publication date: 31-Aug-2023
  • (2023)Efficient Parallel Functional Programming with EffectsProceedings of the ACM on Programming Languages10.1145/35912847:PLDI(1558-1583)Online publication date: 6-Jun-2023
  • (2022)Making GHC whole again or, how to perform whole-program analysis within GHCXRDS: Crossroads, The ACM Magazine for Students10.1145/349527228:2(80-81)Online publication date: 10-Jan-2022

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media