The Linux scheduler: a decade of wasted cores

JP Lozi, B Lepers, J Funston, F Gaud, V Quéma, A Fedorova
Proceedings of the Eleventh European Conference on Computer Systems, 2016. dl.acm.org
As a central part of resource management, the OS thread scheduler must maintain the following simple invariant: make sure that ready threads are scheduled on available cores. As simple as it may seem, we found that this invariant is often broken in Linux. Cores may stay idle for seconds while ready threads are waiting in runqueues. In our experiments, these performance bugs caused many-fold performance degradation for synchronization-heavy scientific applications, 13% higher latency for kernel make, and a 14-23% decrease in TPC-H throughput for a widely used commercial database. The main contributions of this work are the discovery and analysis of these bugs and the fixes we provide for them. Conventional testing techniques and debugging tools are ineffective at confirming or understanding this kind of bug, because its symptoms are often evasive. To drive our investigation, we built new tools that check for violations of the invariant online and visualize scheduling activity. They are simple, easily portable across kernel versions, and run with negligible overhead. We believe that making these tools part of the kernel developers' tool belt can help keep this type of bug at bay.
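The invariant in the abstract can be illustrated with a small user-space sketch. This is not the authors' tool and does not reflect any kernel interface: the function names, the per-core snapshot format, and the `min_consecutive` threshold are all hypothetical, chosen only to show what "a core stays idle while ready threads wait in a runqueue" might look like as a check. It also assumes every thread may run on every core, ignoring affinity constraints.

```python
from typing import List, Sequence, Tuple


def find_violations(runnable_per_core: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (idle_core, overloaded_core) pairs for one snapshot.

    Assumption: runnable_per_core[i] counts the threads that are running or
    ready on core i. A value of 0 means the core is idle; a value above 1
    means at least one ready thread is waiting behind the running one.
    """
    idle = [core for core, n in enumerate(runnable_per_core) if n == 0]
    overloaded = [core for core, n in enumerate(runnable_per_core) if n > 1]
    return [(i, o) for i in idle for o in overloaded]


def persistent_violation(
    snapshots: Sequence[Sequence[int]], min_consecutive: int = 3
) -> bool:
    """Report True if the invariant is broken in min_consecutive snapshots in a row.

    A brief imbalance can be harmless, so this hypothetical check only flags
    violations that persist across several consecutive samples.
    """
    streak = 0
    for snapshot in snapshots:
        streak = streak + 1 if find_violations(snapshot) else 0
        if streak >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    # Core 0 is idle while cores 1 and 3 each have waiting threads.
    print(find_violations([0, 2, 1, 3]))
    # The imbalance lasts for three samples in a row, so it is flagged.
    print(persistent_violation([[0, 2], [0, 2], [0, 2]]))
```

Requiring several consecutive violating samples reflects the abstract's emphasis on cores that stay idle for seconds rather than for an instant; the threshold of three samples is an arbitrary illustrative value.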