The ghost in the machine: Observing the effects of kernel operation on parallel application performance

A Nataraj, A Morris, AD Malony, M Sottile, P Beckman
Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, 2007
The performance of a parallel application on a scalable HPC system is determined by user-level execution of the application code and system-level (OS kernel) operations. To understand the influences of system-level factors on application performance, the measurement of OS kernel activities is key. We describe a technology to observe kernel actions and make this information available to application-level performance measurement tools. The benefits of merged application and OS performance information and its use in parallel performance analysis are demonstrated, both for profiling and tracing methodologies. In particular, we focus on the problem of kernel noise assessment as a stress test of the approach. We show new results for characterizing noise and introduce new techniques for evaluating noise interference and its effects on application execution. Our kernel measurement and noise analysis technologies are being developed as part of Linux OS environments for scalable parallel systems.
ACM Digital Library