Geographical Locality and Dynamic Data Migration for OpenMP Implementations of Adaptive PDE Solvers

Markus Nordén¹,
Henrik Löf¹,
Jarmo Rantakokko¹ &
…
Sverker Holmgren¹

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4315)

Included in the following conference series:

International Workshop on OpenMP

Abstract

On cc-NUMA multiprocessors, non-uniform main-memory latencies make it important to co-locate threads with the data they access. We call this special form of data locality geographical locality. In this article, we study the performance of a parallel PDE solver with adaptive mesh refinement. The solver is parallelized using OpenMP, and the adaptive mesh refinement makes dynamic load balancing necessary. Because the runtime adaption changes the memory access pattern dynamically, achieving a high degree of geographical locality is challenging.

The main conclusions of the study are that (1) geographical locality is very important for the performance of the solver, (2) performance can be improved significantly by dynamic page migration of misplaced data, (3) a migrate-on-next-touch directive works well, whereas the first-touch strategy is less advantageous for programs with dynamically changing memory access patterns, and (4) the overhead of such migration is low compared to the total execution time.
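
The contrast between first-touch placement and migrate-on-next-touch can be illustrated with a toy model. The sketch below is not the authors' solver or any operating-system interface: the class `ToyNumaMemory`, the function `sweep`, the two-node machine, and the eight-page partitions are all hypothetical names and numbers chosen for illustration. It assumes each page lives on exactly one node, and that an access is local when the touching thread runs on that node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyNumaMemory:
    """Toy model (hypothetical): each page lives on exactly one node."""
    home: Dict[int, int] = field(default_factory=dict)  # page -> node
    marked: Set[int] = field(default_factory=set)       # pages awaiting migration
    migrations: int = 0

    def touch(self, page: int, node: int) -> bool:
        """Access `page` from `node`; return True if the access is local."""
        if page not in self.home:
            self.home[page] = node        # first-touch placement
        elif page in self.marked and self.home[page] != node:
            self.home[page] = node        # migrate on next touch
            self.migrations += 1          # the cost is tracked here only
        self.marked.discard(page)
        return self.home[page] == node

    def migrate_on_next_touch(self) -> None:
        """Mark every placed page so that its next toucher becomes its home."""
        self.marked = set(self.home)


def sweep(mem: ToyNumaMemory, partition: List[int]) -> float:
    """One solver sweep: node partition[p] touches page p.

    Returns the fraction of accesses that were local.
    """
    local = sum(mem.touch(page, node) for page, node in enumerate(partition))
    return local / len(partition)


# Assumed partitions: after mesh adaption the load balancer hands
# pages 4 and 5 to node 0.
before = [0, 0, 0, 0, 1, 1, 1, 1]
after = [0, 0, 0, 0, 0, 0, 1, 1]

# First-touch only: pages stay where the initial sweep placed them.
static = ToyNumaMemory()
sweep(static, before)
static_local = sweep(static, after)

# Same run, but pages are marked for migration after repartitioning.
dynamic = ToyNumaMemory()
sweep(dynamic, before)
dynamic.migrate_on_next_touch()
dynamic_local = sweep(dynamic, after)

print(static_local, dynamic_local, dynamic.migrations)
```

In this model the first-touch run leaves two of the eight pages remote after repartitioning, while the marked run moves exactly those two pages on their next access. The model counts the migrating access as local and records its cost only in `migrations`, so it says nothing about real migration overhead.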

Author information

Authors and Affiliations

Department of Information Technology, Uppsala University, Box 337, 751 05, Uppsala, Sweden
Markus Nordén, Henrik Löf, Jarmo Rantakokko & Sverker Holmgren

Editor information

Matthias S. Mueller, Barbara M. Chapman, Bronis R. de Supinski, Allen D. Malony, Michael Voss

Cite this paper

Nordén, M., Löf, H., Rantakokko, J., Holmgren, S. (2008). Geographical Locality and Dynamic Data Migration for OpenMP Implementations of Adaptive PDE Solvers. In: Mueller, M.S., Chapman, B.M., de Supinski, B.R., Malony, A.D., Voss, M. (eds) OpenMP Shared Memory Parallel Programming. IWOMP 2005. Lecture Notes in Computer Science, vol 4315. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68555-5_31

DOI: https://doi.org/10.1007/978-3-540-68555-5_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68554-8
Online ISBN: 978-3-540-68555-5
eBook Packages: Computer Science, Computer Science (R0)
