Assessing the Performance of OpenMP Programs on the Knights Landing Architecture

Dirk Schmidl, Bo Wang & Matthias S. Müller

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10468)

Included in the following conference series:

International Workshop on OpenMP

Abstract

Intel’s Knights Landing processor (KNL) is the latest product in the Xeon Phi product line. As a self-hosted system, it is the first commercially available many-core architecture that can run unmodified applications. This makes KNL a very interesting option for HPC centers that have to support many different applications, including community and ISV codes, where code changes are hard or impossible. Of course, running an application and running it efficiently are not the same, so it remains to be investigated how efficiently KNL executes unmodified codes from x86 servers.

In this work we investigate the Knights Landing architecture with a focus on its ability to run OpenMP applications efficiently. Kernel benchmarks are used to investigate basic characteristics such as memory latency and bandwidth. In addition, application-like benchmarks such as the NAS Parallel Benchmarks and the SPEC OpenMP benchmarks are used, as well as real applications from RWTH Aachen University. The performance is compared to a 2-socket Broadwell system. We consider this a fair comparison because both architectures are state of the art today and both cost roughly the same amount of money and consume the same amount of energy.

Author information

Authors and Affiliations

Chair for High Performance Computing, IT Center, RWTH Aachen University, Aachen, Germany
Dirk Schmidl, Bo Wang & Matthias S. Müller

Corresponding author

Correspondence to Dirk Schmidl.

Editor information

Editors and Affiliations

Lawrence Livermore National Laboratory, Livermore, California, USA
Bronis R. de Supinski
Sandia National Laboratories, Albuquerque, New Mexico, USA
Stephen L. Olivier
RWTH Aachen University, Aachen, Germany
Christian Terboven
Stony Brook University, Stony Brook, New York, USA
Barbara M. Chapman
RWTH Aachen University, Aachen, Germany
Matthias S. Müller

Cite this paper

Schmidl, D., Wang, B., Müller, M.S. (2017). Assessing the Performance of OpenMP Programs on the Knights Landing Architecture. In: de Supinski, B., Olivier, S., Terboven, C., Chapman, B., Müller, M. (eds) Scaling OpenMP for Exascale Performance and Portability. IWOMP 2017. Lecture Notes in Computer Science, vol 10468. Springer, Cham. https://doi.org/10.1007/978-3-319-65578-9_20

DOI: https://doi.org/10.1007/978-3-319-65578-9_20
Published: 17 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65577-2
Online ISBN: 978-3-319-65578-9
eBook Packages: Computer Science, Computer Science (R0)