[PROPOSAL] [RFC] KSM throttler #168
This sounds great. I wonder if we need extra events to cater for memory hotplug/removal?
@sameo KSM acts on the global set of mergeable pages and global parameters, so having multiple input source types may not make sense, and the VM lifecycle events in that case may not add much value. The ksm daemon could rely on the source just for kicks (completely agnostic of the reason for the kick) and then use the state of KSM itself. For example, if the daemon sees a high or unchanging pages_unshared to pages_sharing ratio, it can use that as a throttling signal. Also, the lifecycle internal to the VM itself (i.e. the application within the VM allocating memory) is an event that cannot be triggered by a source. However, it is observable from the KSM status as well as the memory usage across the active VMs. All of these derivable events can be inferred from /sys/kernel/mm/ksm/ and /proc/meminfo
So the kick is the
Yes. As you know, right now we throttle down based on kick timeouts, which may eventually converge to a stabilized pages_unshared to pages_sharing ratio. But I think watching this ratio could help us throttle down quicker and consume less CPU. I'll add something along those lines to the throttling routine, thanks.
I agree with where @mcastelino is heading - I'm not sure we need a container, VM, or possibly any specific kick interface at all. As KSM solely processes MADV_MERGEABLE pages, and there are some (not many, but some) user space apps that also register such zones, I think we can get the info the throttler needs by watching the files in /sys/kernel/mm/ksm/. I also wonder if this work could live in the kernel itself - but, like many other algorithm-based daemons, it is probably better, at least initially, as a user space app, which will have much more flexibility and turnaround time to allow algorithm investigation and tweaking.
Let me try to understand: for phase 1, you're suggesting we don't even have a kick() interface for the input source to call? @grahamwhaley @mcastelino
I think that's a pretty good point. Pushing strongly opinionated policies inside the kernel is usually a tough one, but it could be worth trying. |
After some IRC chatter - one of the cruxes of my theory of just using the KSM sysfs files was that we could watch those files for changes - but @sameo notes that we cannot use inotify events on sysfs, which somewhat scuppers that, as we don't want to poll, and hooking the madvise call sounds rather unpleasant. We could start with a very small set of API events that effectively tell us it is time to poll the KSM sysfs files.
I'm also researching how we could trace madvise calls (perf events).
Implemented by https://github.com/kata-containers/ksm-throttler. |
Definition
Kernel Same-page Merging (KSM) throttling is the process of regulating the KSM daemon by dynamically modifying the KSM sysfs entries, in order to minimize memory duplication as fast as possible while keeping the KSM daemon load low.
KSM throttling currently uses container creation events as its single input.
Problem statement
Today's KSM throttling is part of the Clear Containers proxy code and is a passive component (i.e. it needs to be notified by other parts of the code). That is problematic for 2 main reasons:
KSM throttling depends on a system wide proxy daemon to be running. As Clear Containers is moving towards a one proxy per VM architecture, the proxy code will no longer have a system wide view and will thus no longer be able to trigger the KSM throttling routine based on the overall container creation activity.
The current KSM throttling code is very much Clear Containers proxy specific. It cannot be extracted from the code base because it is passive, i.e. it relies on actually being called from other parts of the proxy implementation. It also cannot be made a separate standalone component without callers modifying their code to call into a KSM throttling specific API.
Proposal
This proposal is about creating a generic, Clear Containers agnostic and active KSM throttling service: the KSM throttler.
Clear Containers agnostic
The KSM throttler will not be dependent on any Clear Containers piece of code, API or architecture design. As a matter of fact, we believe that a KSM throttler could not only benefit VM based containers but also generic/legacy VM based workloads where the goal would be to minimize memory duplication as quickly as possible.
Active component
The KSM throttler will by default be an active component, checking various system wide values and settings in order to make informed KSM throttling decisions. In other words, the KSM throttler will by default not have to be explicitly triggered by e.g. a VM based container runtime or proxy, but will instead actively watch for specific information about VM or VM based container life cycles.
Passive Fallback
When a system cannot provide a reliable source of information about VM life cycles, the KSM throttler will provide a passive UNIX socket for components like container runtimes to notify it about VM or container specific events (creation, destruction, etc.).
Implementation
The KSM throttler implementation can be split into 2 parts: The throttling algorithm and the input sources handling.
Input sources
The KSM throttler will be able to handle several input sources and one should be able to add a new input source implementation to the source code fairly easily.
In practice, a KSM throttling input source will watch any specific system wide component and will notify the KSM throttler about any new VM or VM based containers life cycle event. The KSM throttler is a server and all input sources are potential clients.
We will use the gRPC protocol between the throttler and its clients, defined by a dedicated proto file. Each KSM throttler input source would first register against the KSM throttler and then send a stream of events.
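The proto file itself was not captured in this thread. As a rough illustration of the register-then-stream shape described above, a service definition might look like the following (all service and message names here are hypothetical, not the real kata-containers/ksm-throttler API):

```proto
syntax = "proto3";

package ksmthrottler;

// Hypothetical sketch: an input source registers itself, then streams
// VM / container lifecycle events to the throttler.
service KSMThrottler {
  // Register identifies the input source (e.g. "virtcontainers").
  rpc Register(RegisterRequest) returns (RegisterReply);
  // Events is a client-side stream of lifecycle notifications.
  rpc Events(stream LifecycleEvent) returns (EventsReply);
}

message RegisterRequest {
  string source = 1; // input source name
}

message RegisterReply {
  bool ok = 1;
}

message LifecycleEvent {
  enum Kind {
    VM_CREATED = 0;
    VM_DESTROYED = 1;
  }
  Kind kind = 1;
  string id = 2; // VM or container identifier
}

message EventsReply {}
```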
KSM Throttling
The initial throttling algorithm will follow the current proxy one, where we throttle KSM up on each VM creation and then progressively throttle it down as long as there are no new VM creations.
Phases
The implementation will follow an incremental process going through a few phases:
Phase 1: virtcontainers compatibility
The virtcontainers KSM throttler input source will watch the virtcontainers pod filesystem through inotify in order to detect when a new pod is created or destroyed.
Phase 2: Implement a fallback input source [TBD]