DOI: 10.1145/3572848.3577509
research-article
Open access

OpenCilk: A Modular and Extensible Software Infrastructure for Fast Task-Parallel Code

Published: 21 February 2023

Abstract

This paper presents OpenCilk, an open-source software infrastructure for task-parallel programming that allows for substantial code reuse and easy exploration of design choices in language abstraction, compilation strategy, runtime mechanism, and productivity-tool development.
The OpenCilk infrastructure consists of three main components: a compiler designed to compile fork-join task-parallel code, an efficient work-stealing runtime scheduler, and a productivity-tool development framework, based on compiler instrumentation, for fork-join parallel computations. OpenCilk is modular: modifying one component, for the most part, does not necessitate modifications to the other components. It is also easy to extend: its construction naturally encourages code reuse. Despite being modular and easy to extend, OpenCilk produces high-performing code.
We investigated OpenCilk's modularity, extensibility, and performance through several case studies, including a study in which we extended OpenCilk to support multiple parallel runtime systems: Cilk Plus, OpenMP, and oneTBB. OpenCilk's design enables rapid prototyping of new compiler back ends that target different parallel-runtime ABIs; each back end required fewer than 2000 new lines of code. We also examined the OpenCilk runtime's performance empirically on 15 benchmark Cilk programs and found that it outperforms the other runtimes by a geometric mean of 4%--26% on 1 core and 10%--120% on 48 cores.
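To make the fork-join model concrete, the sketch below shows the kind of Cilk program that the OpenCilk compiler and work-stealing runtime handle. It is an illustration rather than code from the paper: the cilk_spawn and cilk_sync keywords and the <cilk/cilk.h> header follow the Cilk language, and the -fopencilk flag in the comment reflects how OpenCilk's clang is commonly invoked (an assumption about tooling, not a claim taken from this abstract).

/* fib.c -- minimal fork-join Cilk sketch of the style OpenCilk compiles.
 * Assumed build command (OpenCilk clang): clang -fopencilk -O3 fib.c -o fib
 */
#include <cilk/cilk.h>   /* cilk_spawn, cilk_sync */
#include <stdio.h>
#include <stdlib.h>

static long fib(long n) {
    if (n < 2)
        return n;
    long x, y;
    x = cilk_spawn fib(n - 1);  /* child strand; may be stolen and run in parallel */
    y = fib(n - 2);             /* parent continues with the other subproblem */
    cilk_sync;                  /* join: wait for the spawned child before using x */
    return x + y;
}

int main(int argc, char *argv[]) {
    long n = (argc > 1) ? atol(argv[1]) : 30;
    printf("fib(%ld) = %ld\n", n, fib(n));
    return 0;
}

The same source, unchanged, is also the input to the productivity-tool framework described above: tools attach through compiler-inserted instrumentation (for example, by recompiling with a tool-specific flag) rather than by modifying the program.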



          Published In

          PPoPP '23: Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming
          February 2023
          480 pages
          ISBN: 9798400700156
          DOI: 10.1145/3572848
          This work is licensed under a Creative Commons Attribution 4.0 International License.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. OpenCilk
          2. OpenMP
          3. bitcode
          4. cilk
          5. compiler-inserted instrumentation
          6. compiling
          7. fork-join parallelism
          8. oneTBB
          9. parallel computing
          10. parallel runtime system
          11. productivity tools
          12. tapir
          13. task parallelism

          Qualifiers

          • Research-article

          Conference

          PPoPP '23

          Acceptance Rates

          Overall Acceptance Rate 230 of 1,014 submissions, 23%

          Article Metrics

          • Downloads (Last 12 months): 515
          • Downloads (Last 6 weeks): 70
          Reflects downloads up to 13 Dec 2024

          Cited By

          • (2024) Optimizing a Super-Fast Eigensolver for Hierarchically Semiseparable Matrices. Proceedings of the 53rd International Conference on Parallel Processing, 32-41. DOI: 10.1145/3673038.3673119. Online publication date: 12-Aug-2024
          • (2024) An Efficient Task-Parallel Pipeline Programming Framework. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 95-106. DOI: 10.1145/3635035.3635037. Online publication date: 18-Jan-2024
          • (2024) When Is Parallelism Fearless and Zero-Cost with Rust? Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, 27-40. DOI: 10.1145/3626183.3659966. Online publication date: 17-Jun-2024
          • (2024) Multi Bucket Queues: Efficient Concurrent Priority Scheduling. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, 113-124. DOI: 10.1145/3626183.3659962. Online publication date: 17-Jun-2024
          • (2024) Speedcode: Software Performance Engineering Education via the Coding of Didactic Exercises. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 391-394. DOI: 10.1109/IPDPSW63119.2024.00087. Online publication date: 27-May-2024
          • (2024) Teaching Parallel Algorithms Using the Binary-Forking Model. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 346-351. DOI: 10.1109/IPDPSW63119.2024.00080. Online publication date: 27-May-2024
          • (2024) Helping Faculty Teach Software Performance Engineering. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 338-341. DOI: 10.1109/IPDPSW63119.2024.00078. Online publication date: 27-May-2024
          • (2024) HardCilk: Cilk-like Task Parallelism for FPGAs. 2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 140-150. DOI: 10.1109/FCCM60383.2024.00025. Online publication date: 5-May-2024
          • (2024) X-OpenMP — eXtreme fine-grained tasking using lock-less work stealing. Future Generation Computer Systems, 159, 444-458. DOI: 10.1016/j.future.2024.05.019. Online publication date: Oct-2024
          • (2024) Task-Level Checkpointing and Localized Recovery to Tolerate Permanent Node Failures for Nested Fork–Join Programs in Clusters. SN Computer Science, 5(3). DOI: 10.1007/s42979-024-02624-8. Online publication date: 13-Mar-2024
