Abstract
With the advent of Intel's second-generation many-core processor (Knights Landing: KNL), high-bandwidth memory (HBM) with potentially five times more bandwidth than existing dynamic random-access memory (DRAM) has become available as a valuable computing resource for high-performance computing (HPC) applications. Resource management schemes should therefore consider central processing unit (CPU) cores, conventional main memory, and this newly available HBM together in order to improve overall system throughput and user response time. In this paper, we present a profiling mechanism and a related scheduling policy that analyze the resource usage patterns of various HPC workloads. By carefully allocating memory-intensive workloads to HBM in KNL, we show that the overall performance of multiple Message Passing Interface (MPI) workloads can be improved in terms of execution time and system utilization. We evaluate and verify the effectiveness of our scheme for optimizing the use of HBM with the NAS Parallel Benchmarks.
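The idea of steering memory-intensive workloads to HBM can be illustrated with a minimal sketch. It is not the scheduling policy of this paper: the greedy rule, the `JobProfile` fields, the `min_bandwidth_gbs` threshold, and all job names and numbers below are hypothetical placeholders rather than measurements. The only hardware assumption is that a KNL package provides up to 16 GB of HBM.

```python
from dataclasses import dataclass

HBM_CAPACITY_GB = 16.0  # assumed HBM capacity of one KNL package


@dataclass
class JobProfile:
    name: str
    bandwidth_gbs: float  # profiled memory bandwidth demand (hypothetical value)
    footprint_gb: float   # memory footprint of the job (hypothetical value)


def assign_memory(jobs, hbm_capacity_gb=HBM_CAPACITY_GB, min_bandwidth_gbs=20.0):
    """Greedy illustration: visit jobs from highest to lowest bandwidth demand
    and place a job in HBM only if it is bandwidth-hungry enough and still fits."""
    placement = {}
    free_gb = hbm_capacity_gb
    for job in sorted(jobs, key=lambda j: j.bandwidth_gbs, reverse=True):
        wants_hbm = job.bandwidth_gbs >= min_bandwidth_gbs
        if wants_hbm and job.footprint_gb <= free_gb:
            placement[job.name] = "HBM"
            free_gb -= job.footprint_gb
        else:
            placement[job.name] = "DRAM"
    return placement


# Placeholder profiles, not benchmark results.
jobs = [
    JobProfile("job_a", bandwidth_gbs=70.0, footprint_gb=10.0),
    JobProfile("job_b", bandwidth_gbs=2.0, footprint_gb=1.0),
    JobProfile("job_c", bandwidth_gbs=50.0, footprint_gb=8.0),
]
placement = assign_memory(jobs)
print(placement)
```

In this sketch the most bandwidth-hungry job takes HBM first. The second memory-intensive job falls back to DRAM because it no longer fits, and the compute-bound job stays in DRAM regardless of free HBM capacity.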
Acknowledgements
This research was supported by the Korea Institute of Science and Technology Information (KISTI) (Grant No. K-18-L12-C07-S01), and by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (Grant No. NRF-2017S1A3A2066319).
Cite this article
Park, G., Rho, S., Kim, JS. et al. Towards optimal scheduling policy for heterogeneous memory architecture in many-core system. Cluster Comput 22, 121–133 (2019). https://doi.org/10.1007/s10586-018-2825-4
DOI: https://doi.org/10.1007/s10586-018-2825-4