Proactive reclaim for tiered memory and more
Bueso started by suggesting the addition of a per-node knob that would enable proactive reclaim; an administrator would write a number indicating the amount of memory that should be reclaimed, and the kernel would attempt to make it happen. It might also be possible, he said, to extend the debugfs knob added by the multi-generational LRU patches rather than adding a new knob. Michal Hocko opposed that latter idea, though, saying that he did not want to make anything in debugfs into an API that would have to be maintained.
Instead, Hocko said, a knob of this sort should be put into sysfs. There are two ideas for how this control could work: there could be a single knob that would accept a mask indicating which nodes to target for reclaim, or there could be a per-node knob as described by Bueso. Hocko likes the per-node knob idea better, since it provides better control to the administrator. Johannes Weiner said that he has tried to add a similar sort of knob to the control-group memory controller; it would accept a count of pages to reclaim from a given group. That controller does round-robin reclaim across the processes contained within the group, which might be good enough, he said. He suggested testing this mechanism on tiered-memory systems to see how it works.
Bueso asked whether that sort of interface can be counted on to work in the future; not every system uses control groups in this way, and control at the global level might be handled differently. Weiner said that users want all of the features in both the global and control-group settings, so there should not be any divergence there.
Another attendee pointed out a couple of other use cases for proactive reclaim. Migration of virtual machines will go faster if there are fewer pages to copy, so administrators would like to be able to force a virtual machine to reclaim as much memory as possible before the migration begins. The virtual machine can report which pages have been freed to the hypervisor, and those pages can be left out of the copy to the new host. A similar use case is suspend-to-disk, which will happen more quickly if there are a lot of free pages that need not be written to persistent storage.
Bueso turned the topic to testing of proactive-reclaim mechanisms; there are a lot of ideas going around, he said, but not a lot of numbers showing how well they actually work. For example, he likes the hot-page selection algorithm that is part of the tiered-memory work, but there is only one benchmark result that gives any information on its performance. The minimal approach to benchmarking appears to be the standard for this kind of work, he said, and that worries him.
He continued with a request for an easier way to subject a patch set to a variety of workloads. He has been hacking on MMTests toward that end, trying to get an indication of just when a workload starts to push pages out of DRAM and into a slower memory tier. That helps to know whether the tiering algorithm is actually working, he said. But he would like to find a way to add tests that exercise the memory-management subsystem in ways beyond just consuming lots of RAM.
As the session wound down, he also said that he would like a way to export
the kernel's view of the various memory tiers to user space. The consensus
seemed to be that a sysfs file should be added for that purpose.
Index entries for this article | |
---|---|
Kernel | Memory management/Tiered-memory systems |
Conference | Storage, Filesystem, Memory-Management and BPF Summit/2022 |
Posted May 16, 2022 5:39 UTC (Mon)
by neilbrown (subscriber, #359)
[Link] (2 responses)
echo 3 > /proc/sys/vm/drop_caches
??
Posted May 16, 2022 14:10 UTC (Mon)
by gpiccoli (subscriber, #109098)
[Link]
Not sure if Bueso's approach would be similar or could share concepts...
Thanks for the good summary!
Posted May 20, 2022 5:46 UTC (Fri)
by righiandr (subscriber, #34187)
[Link]
For example you may want to use this feature to apply a more aggressive memory reclaim to low priority cgroups (i.e. background apps) and a less aggressive memory reclaim policy to high priority cgroups (i.e. foreground apps). The same concept can be applied to containers, cloud instances, etc.
I think that having such control in user-space on a per-cgroup basis can bring some real benefits to a lot of different scenarios.
Proactive reclaim for tiered memory and more
Proactive reclaim for tiered memory and more
https://lwn.net/Articles/833511/
Proactive reclaim for tiered memory and more