Abstract
Intel’s Knights Landing processor (KNL) is the latest product in the Xeon Phi line. As a self-hosted system, it is the first commercially available many-core architecture that can run unmodified applications. This makes KNL a very interesting option for HPC centers that have to support many different applications, including community and ISV codes, where code changes are hard or impossible. Of course, being able to run any application is not the same as running it efficiently, so it remains to be investigated how efficiently KNL executes unmodified codes from x86 servers.
In this work, we investigate the Knights Landing architecture with a focus on its ability to run OpenMP applications efficiently. Kernel benchmarks are used to investigate basic characteristics such as memory latency and bandwidth. Furthermore, we use application-like benchmarks, such as the NAS Parallel Benchmarks and the SPEC OpenMP benchmarks, as well as real applications from RWTH Aachen University. The performance is compared to that of a 2-socket Broadwell system. We consider this a fair comparison, as both architectures are state of the art today, cost roughly the same, and consume roughly the same amount of energy.
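Kernel benchmarks for memory bandwidth and latency typically follow two well-known patterns. The following C sketch is an illustration only, not the code used in this work: a STREAM-style triad measures bandwidth, and an lmbench-style pointer chase, in which every load depends on the previous one, measures latency. The function names triad, triad_bytes, build_chain, and chase are hypothetical. The sketch assumes an OpenMP-enabled compiler (e.g. GCC with -fopenmp); without OpenMP, the pragma is ignored and the triad runs serially.

```c
/* Illustrative sketch only (hypothetical names, not the paper's code). */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* STREAM-style triad a[i] = b[i] + s*c[i], split statically across
   OpenMP threads. */
void triad(double *a, const double *b, const double *c, double s, size_t n)
{
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}

/* Bytes counted per triad call under STREAM's convention:
   two loads and one store per element; write-allocate traffic
   is not counted. */
double triad_bytes(size_t n)
{
    return 3.0 * sizeof(double) * (double)n;
}

/* Build next[] so that i -> next[i] is a single cycle through all n
   slots (Sattolo's algorithm). This defeats hardware prefetching.
   rand() is used for simplicity; its limited range weakens the
   randomness for large n but not the single-cycle property. */
size_t *build_chain(size_t n, unsigned seed)
{
    if (n == 0)
        return NULL;
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i; /* j in [0, i-1], never i */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
    return next;
}

/* Perform `steps` dependent loads. Returning the final index keeps
   the compiler from eliminating the loop. */
size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t p = start;
    for (size_t k = 0; k < steps; k++)
        p = next[p];
    return p;
}
```

In use, one would time a triad call with omp_get_wtime() and divide triad_bytes(n) by the elapsed seconds to obtain bandwidth. For latency, one would build a chain much larger than the caches and divide the time of chase() by the number of steps. Both measurements are sensitive to array size, thread count, and thread placement, so these parameters would have to be varied deliberately rather than fixed as in this sketch.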
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Schmidl, D., Wang, B., Müller, M.S. (2017). Assessing the Performance of OpenMP Programs on the Knights Landing Architecture. In: de Supinski, B., Olivier, S., Terboven, C., Chapman, B., Müller, M. (eds) Scaling OpenMP for Exascale Performance and Portability. IWOMP 2017. Lecture Notes in Computer Science, vol 10468. Springer, Cham. https://doi.org/10.1007/978-3-319-65578-9_20
DOI: https://doi.org/10.1007/978-3-319-65578-9_20
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65577-2
Online ISBN: 978-3-319-65578-9
eBook Packages: Computer Science, Computer Science (R0)