
Fibers are not (P)Threads: The Case for Loose Coupling of Asynchronous Programming Models and MPI Through Continuations

Published: 07 October 2020

Abstract

Asynchronous programming models (APMs) are gaining traction, allowing applications to expose their available concurrency to a runtime system tasked with coordinating the execution. While MPI has long provided support for multi-threaded communication and non-blocking operations, it falls short of adequately supporting APMs, as correctly and efficiently handling MPI communication within these models remains a challenge. Meanwhile, new low-level implementations of lightweight, cooperatively scheduled execution contexts (fibers, also known as user-level threads, ULTs) are meant to serve as a basis for higher-level APMs, and their integration into MPI implementations has been proposed as a replacement for traditional POSIX thread support to alleviate these challenges.
In this paper, we first establish a taxonomy to clearly distinguish the different concepts in the parallel software stack. We argue that the proposed tight integration of fiber implementations with MPI is neither warranted nor beneficial, and is instead detrimental to the goal of MPI serving as a portable communication abstraction. We propose MPI Continuations as an extension to the MPI standard that provides callback-based notifications on completed operations, yielding a clear separation of concerns through a loose coupling between MPI and APMs. We show that this interface is flexible and interacts well with different APMs, namely OpenMP detached tasks, OmpSs-2, and Argobots.
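
The loose coupling described above can be illustrated with a minimal sketch: an OpenMP detached task posts a non-blocking receive and registers a completion callback; the callback fulfills the task's detach event, which releases all tasks that depend on the received data. The continuation-registration call MPIX_Continue(), its signature, and the helper names post_recv/recv_done are assumptions introduced here for illustration and are not part of the MPI standard; the detach clause and omp_fulfill_event() are standard OpenMP 5.0.

```c
/* A minimal sketch, assuming a hypothetical continuation-style MPI extension.
 * MPIX_Continue() is a placeholder for the proposed registration call and is
 * NOT part of the MPI standard; detach(...) and omp_fulfill_event() are
 * standard OpenMP 5.0. MPI is assumed to be initialized with
 * MPI_THREAD_MULTIPLE. */
#include <stdlib.h>
#include <mpi.h>
#include <omp.h>

/* Assumed prototype of the proposed extension (illustrative only). */
extern int MPIX_Continue(MPI_Request *req, void (*cb)(void *), void *cb_data);

/* Invoked by the MPI library once the receive has completed. */
static void recv_done(void *cb_data)
{
    omp_event_handle_t *ev = (omp_event_handle_t *)cb_data;
    omp_fulfill_event(*ev);   /* allow the detached task to complete */
    free(ev);
}

void post_recv(double *buf, int count, int src, MPI_Comm comm)
{
    omp_event_handle_t ev;

    /* Dependences of this task are released only once recv_done fulfills 'ev'. */
    #pragma omp task detach(ev) depend(out: buf[0:count])
    {
        /* Keep the event handle alive until the callback fires. */
        omp_event_handle_t *evp = malloc(sizeof(*evp));
        *evp = ev;

        MPI_Request req;
        MPI_Irecv(buf, count, MPI_DOUBLE, src, 0 /*tag*/, comm, &req);
        MPIX_Continue(&req, recv_done, evp);  /* no test/wait loop, no yielding */
    }

    #pragma omp task depend(in: buf[0:count])
    {
        /* Consume buf: runs only after the message has arrived. */
    }
}
```

The point of the pattern is that the tasking runtime never blocks or yields inside MPI; MPI merely invokes the callback when the operation completes, so the same sketch could fulfill an OmpSs-2 external event or an Argobots eventual instead of an OpenMP detach event.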

Published In

EuroMPI/USA '20: Proceedings of the 27th European MPI Users' Group Meeting
September 2020, 88 pages
ISBN: 9781450388801
DOI: 10.1145/3416315

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Continuations
  2. Fiber
  3. MPI+X
  4. OmpSs
  5. OpenMP
  6. TAMPI
  7. Tasks
  8. ULT

Conference

EuroMPI/USA '20: 27th European MPI Users' Group Meeting
September 21-24, 2020, Austin, TX, USA

Acceptance Rates

Overall acceptance rate: 66 of 139 submissions, 47%

Cited By

  • (2021) OpenMP application experiences. Parallel Computing 109:C. Online publication date: 30 December 2021. https://doi.org/10.1016/j.parco.2021.102856
