Abstract
Heterogeneous supercomputers are now widespread in HPC, and programming these architectures efficiently is a challenge. Task-based programming models are a promising way to tackle it. Since OpenMP 4.0 and 4.5, the target directives make it possible to offload pieces of code to GPUs and to express these offloads as tasks with dependencies. Heterogeneous machines can therefore be programmed with MPI+OpenMP(task+target) to reach a high level of concurrent asynchronous operations, in which data transfers, kernel executions, communications, and CPU computations overlap. In particular, tasks performing such asynchronous operations can be suspended on the CPU so that their completion overlaps with the execution of other tasks. A suspended task resumes opportunistically, at any scheduling point, once its associated asynchronous event has completed. We have integrated this feature into the MPC framework, validated it on an AXPY microbenchmark, and evaluated it on an MPI+OpenMP(tasks) implementation of the LULESH proxy application. The results show improved asynchrony and overall performance, allowing applications to benefit from asynchronous execution on heterogeneous machines.
Acknowledgements
The present results have been obtained within the framework of DIGIT, a Contractual Research Laboratory between CEA, the CEA/DAM-Île de France center, and the University of Reims Champagne Ardenne.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ferat, M., Pereira, R., Roussel, A., Carribault, P., Steffenel, LA., Gautier, T. (2022). Enhancing MPI+OpenMP Task Based Applications for Heterogeneous Architectures with GPU Support. In: Klemm, M., de Supinski, B.R., Klinkenberg, J., Neth, B. (eds) OpenMP in a Modern World: From Multi-device Support to Meta Programming. IWOMP 2022. Lecture Notes in Computer Science, vol 13527. Springer, Cham. https://doi.org/10.1007/978-3-031-15922-0_1
Print ISBN: 978-3-031-15921-3
Online ISBN: 978-3-031-15922-0