The kernel radar: folios, multi-generational LRU, and Rust

By Jonathan Corbet
January 20, 2022

The kernel community is a busy place, so it is not even remotely possible to write full-length articles about everything that is going on. Other topics may be of interest, but not require a longer treatment. The answer is a collection of short topics covering developments that are on the radar; the selection this time around includes folios, the multi-generational LRU, and Rust in the kernel.

A folio update

Folios have been an active topic since they were first covered here less than one year ago. A folio, recall, is just a container for a struct page that is guaranteed not to be a tail page. It can thus be used to refer to memory, in units of a single page or larger, in a way that is more type-safe and requiring fewer run-time checks than when working directly with page structures. After some extensive discussion, the first set of folio patches was merged for the 5.16 kernel.

A large change of that nature to the memory-management subsystem naturally leads to fears of regressions, but the work in 5.16 appears to have been relatively problem-free. So 5.17 saw another round of folio-related changes, mostly focused on the page cache (which caches file data). In current kernels, the page cache holds, unsurprisingly, pages, but the 4KB page size used on most systems is often far too small to be efficiently managed. When dealing with files of anything but the smallest size, there is value in caching larger chunks at a time. The 5.17 conversion of the page cache to use folios is intended, among other things, to allow the use of "large folios" (a name chosen because the more descriptive "multi-page folios" was a little too long). Large folios might be huge pages, but they don't have to be limited to the huge-page sizes supported by the CPU; the plan is to support any folio size, as long as it is a power of two.

The 5.17 work adds the machinery to support large folios in the page cache, the low-level filesystem-support code, and in the XFS filesystem, but does not actually start using them yet. As Matthew Wilcox said in his pull request: "there may still be places I've overlooked which still have page size assumptions". So the coming development cycle will, presumably, focus on finding any such places so that the transition can happen in 5.18. Meanwhile, the more adventurous among us can enable large folios in 5.17 and help find the remaining sharp edges.

The multi-generational LRU

Another significant memory-management change that has been under development over the last year is the multi-generational LRU, which reworks how the kernel decides which pages to evict when memory is tight. Current kernels use a two-queue system, one each for pages deemed "active" and "inactive". Pages move between the queues based on accesses; when memory is needed, pages are reclaimed off the end of the inactive queue. The multi-generational work generalizes this setup into a larger number of queues, a change that seemingly improves the kernel's ability to identify the pages that are unlikely to be needed in the near future.

When Yu Zhao posted the sixth version of this patch set in early January, he added a request for review and a verdict as to whether it could be merged for 5.17. That sparked a long discussion on the state of this work. As part of that discussion, Michal Hocko (who also did a lot of detailed review of the patches) repeated a theme that has been heard with previous postings: that it would be better to see this work as a series of incremental changes rather than a big addition of new reclaim mechanism:

Changes in the reclaim path are paved with failures and reverts and fine tuning on top of existing fine tuning. The difference from your patchset is that they tend to be much much smaller and go incremental and therefore easier to review.

Jesse Barnes responded that an incremental series might be worse in this case:

I understand the desire for an "incremental approach that gets us from A->B". In the abstract it sounds great. However, with a change like this one, I think it's highly likely that such a path would be littered with regressions both large and small, and would probably be more difficult to reason about than the relatively clean design of MGLRU. On top of that, I don't think we'll get the kind of user feedback we need for something like this *without* merging it.

Linus Torvalds responded to Barnes, saying that this work "is worth going with". Hocko didn't disagree with Barnes, but did note that there are a lot of things needing fixing before the code could be merged in any case.

Zhao, meanwhile, has been actively trying to get supporters of this work to post to the list in favor of its inclusion. Those who responded include Holger Hoffstätte, Shuang Zhai ("the performance improvement is fabulous"), Suleiman Souhlal ("Android on ChromeOS has been using MGLRU for a while now, with great results"), Sofia Trinh, Donald Carr, and Oleksandr Natalenko.

There is clearly some interest in getting this work merged; it is just as clearly not in the cards for 5.17, though. Normally one would expect that a change this fundamental could take a long time yet to get in; given the pressure and the approval from Torvalds, though, it could happen a bit more quickly this time. Merging for 5.18 still seems optimistic, but sometime in 2022 could be a real possibility.

Rust for Linux

The project to make it possible to develop kernel modules in the Rust programming language continues to move forward; the third version of the Rust-support patch set was posted on January 17. A number of changes had been made to keep up with the Rust community and to get this work closer to ready for inclusion.

This version of the patch set supports (and thus requires) the recent 1.58 release of the compiler. The build system is now able to determine automatically whether a suitable Rust toolchain is available for building and, if something is missing, it will tell the developer what is needed. The cover letter notes that a couple of the unstable Rust features required for kernel work are becoming stable in near-future compiler releases. There is, however, still a discouragingly long list of required unstable features.

The series itself starts by increasing the maximum length of symbols that can be managed in the "kallsyms" mechanism. It seems that the name-mangling used by Rust can expand names considerably, to the point that 255 characters is not enough to store some names. Developers will not normally need to see the mangled names, but they will show up in kallsyms and may be surprising. Another preliminary step is to add C helper functions for a long list of things that already look like functions in the kernel — readb() or kmap(), for example — that are actually macros or are inlined. Those cannot be called directly from Rust, so they need to be turned into proper functions first.

Most of the Rust code itself currently appears in two crates. The first, called alloc, deals with memory allocation. The Rust language wasn't built with the idea that code might need to continue when a memory allocation fails; instead, the normal result is an immediate crash. Since crashing in kernel code is considered to be impolite, a modified allocator that can handle failures is required. As a Rust developer would expect, it returns a Result object that contains either a pointer to the allocated memory or an error indication, depending on what happened. Evidently the work to support fallible allocations is meant to go into the upstream Rust library, so the kernel's version of this crate may eventually be able to go away.

The other big crate is called kernel; it contains the rest of the impedance-matching code that makes kernel APIs look like proper Rust interfaces. These provide interfaces for char devices, the clock framework, file structures, file_operations vectors, memory-mapped I/O functions, mutexes, spinlocks, and more. A surprising amount of code is dedicated to the implementation of generic linked lists.

All told, it represents a lot of work toward making it possible to write kernel code in Rust. It is quite a bit of code that, at some point, is going to need to be more widely exercised if it is to progress in useful directions. That, of course, would be helped by getting this support into the mainline kernel where more developers can look at and work with it. Torvalds indicated at the 2021 Maintainers Summit that he expected to merge this work, but there is no indication of when that might happen. The timing is likely to come down to Torvalds and when he thinks that the time has come to open the door to this new language.

Index entries for this article
Kernel	Development tools/Rust
Kernel	Memory management/Folios
Kernel	Memory management/Page replacement algorithms

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 20, 2022 22:49 UTC (Thu) by developer122 (guest, #152928) [Link] (19 responses)

Yeah, as I've heard Brian Cantrill say during a twitter space podcast thing, Rust and it's compiletime ownership-checks really doesn't work well with the concept of linked lists. After all "if you have b-trees, why would you use anything else?" And Rust's ownership checking makes (generically implemented) b-trees easy where as in C or other languages they tend to end up being a horrible buggy mess.

Rust

Posted Jan 20, 2022 23:29 UTC (Thu) by tialaramex (subscriber, #21167) [Link] (7 responses)

The stability list isn't TOO bad once you consider it standing back a few paces.

I count 17 items, 4 are cfg() parameters, to switch off features from the allocator and, in one case, the core Rust library†. That latter is worth a moment's thought: Rust says you can format floating point numbers. Linux, of course, would very much rather you didn't use floating point numbers at all. So, Rust-for-Linux wants to tell the core library that we aren't going to be formatting any floating point numbers, blow up code that tries to do that, that's not valid Linux code. However, ultimately you _could_ do this surgery by hand and in effect "fork" the core library, especially if you knew a real fix was coming later.

2 more are -Z compiler flags. Rust's compiler has flags marked as not being stable with a Z prefix. It's not as though the kernel has never taken a dependency on compiler specific flags before, but clearly having a stable flag is better because it's a social contract not to move this particular feature unexpectedly.

Some of the others have community momentum behind them because they're things most Rust users want, GATs and more const are in that category. If Rust for Linux didn't engage with the main Rust community at all for 12 months, those things have traction and will make progress anyway. On the other hand, there are few applications outside the kernel for some of the compiler internals stabilization that Rust for Linux wants, if they never did this I for example, writing userspace code, would never ever notice.

It overall certainly means I don't expect to be running a Linux kernel with Rust in it in 2022 on my PC. But it also doesn't feel insurmountable, I could imagine reading an LWN piece before the end of the year about the "one big piece" missing, I just can't guess which piece that will be.

† Rust has a core library, which is at the heart of the standard library but must exist anyway. A few things in here are literally mandatory to Rust, e.g. the Drop trait must exist, it needn't be called Drop, but if there isn't one that's not Rust any more, the i32 type must exist, I don't even think you're allowed to call it something else, just too bad you must implement 32-bit signed integers - however a whole lot more are just useful, and not actually needed by Rust itself even if ordinary developers would be sad not to have them.

Rust

Posted Jan 20, 2022 23:43 UTC (Thu) by ejr (subscriber, #51652) [Link] (2 responses)

I am trying to parse both your message as well as the one to which you are responding.

And I'm a FORTRAN (yes, all caps, including WATFIV) / APL / C person. With a background in more formal language systems like ML.

Please define your terminology, It appears as if yet another terminology is a major hurdle for acceptance.

Rust

Posted Jan 21, 2022 0:57 UTC (Fri) by tialaramex (subscriber, #21167) [Link] (1 responses)

I am happy within reason to define terms, and it's likely others could help too, but of course I need to know what you didn't understand.

For example maybe with ML in your background Generic Associated Types are obvious, or maybe not, with GATs Rust can express the idea that some trait can have associated types which are generic. So e.g. today an Iterator has an associated type saying which Item type it iterates over, but that associated type has to be specific, like this is an Iterator over Strings, there are some traits people would like to write where you'd want to express that the associated type has some generic properties but not tie down the specifics. Evidently the Rust for Linux have some use for this feature.

But equally maybe you don't know what Rust's traits are. Traits are similar to the "interfaces" feature in many object oriented languages, in that they express some capability or property common to multiple types. A trait must be explicitly implemented for any particular type, either with the definition of the type itself, or with the definition of the trait, and each such implementation stands alone. So you can be sure that if SomeTrait is implemented for ThisThing, either the author of ThisThing intended that, or the author of SomeTrait or perhaps both, as a result traits have Semantics - there is no risk of their being a mere accident of syntax as with say C++ Concepts.

Maybe you know about Rust stability, or maybe not, in Rust there's a concept of "unstable" features. These features exist, and they work in whatever build of Rust you have, but tomorrow there might be a new Rust version and they're altered, or renamed, or gone. In contrast all the stable features of Rust are promised to still work into the indefinite future, none have been removed since 1.0 in 2015. You have to specifically opt in to having unstable features, and to each specific unstable feature you want. Today Rust for Linux needs several such features. Internally Rust uses some unstable features, but without opting in you can't and probably most people shouldn't, thus it is desirable for Rust for Linux to rely as much as possible on stable features only.

Rust

Posted Jan 21, 2022 14:21 UTC (Fri) by ejr (subscriber, #51652) [Link]

Ah, I thought GAT was a misspelling of GADT. And traits sound more like type classes... So essentially this is another version of ML module signatures / Haskell type classes. I'm guessing it looks more friendly to folks coming from C/C++.

Stability is pre-standardization like C/C++ attributes, or would be if there were multiple compilers out there. (gcc was on its way at some point, right?) Many mostly-single-vendor languages have stable v. implementation gizmos. At least this term is fairly well shared.

Rust isn't useful in my daily life (heavily shared memory structures being changed rapidly), so I haven't really poked at it.

Rust

Posted Jan 21, 2022 19:25 UTC (Fri) by ballombe (subscriber, #9523) [Link] (1 responses)

Is there an usable subset of rust without object metadata ?

Rust

Posted Jan 26, 2022 18:37 UTC (Wed) by iq-0 (subscriber, #36655) [Link]

What do you mean with object metadata? Rust doesn't have RTTI if that's what you mean. Aside from DWARF information it only has vtables for trait objects that are used in the code (bare minimum, those things are basically structs with function pointers like one would use in C for this same problem). If you're referring to the name mangling (the loooong type signatures) those have more to do with the way Rust solves diamond dependency conflicts and the liberal usage of templated types.

Rust

Posted Jan 22, 2022 1:02 UTC (Sat) by plugwash (subscriber, #29694) [Link] (1 responses)

"However, ultimately you _could_ do this surgery by hand and in effect "fork" the core library, especially if you knew a real fix was coming later."

The rust standard library uses and will probably always use features that will not be part of stable rust. So forking the standard library doesn't really help you with future-proofing your code.

Rust

Posted Jan 22, 2022 21:32 UTC (Sat) by tialaramex (subscriber, #21167) [Link]

What I'm getting at is that your fork can literally rip out the code that does floating point math. Rust's actual standard library won't do that, but it will probably some day take that cfg() parameter to turn off floating point math or agree some other way forward. Meanwhile Rust for Linux gets a library it can ship in Linux.

It's future proofed in that the assumption is some day Rust will have a way to disable or sidestep this, and at that point Rust for Linux can just ship the normal core library (in this respect).

The fact Rust's standard library relies on Rust's nightly features is orthogonal, this configuration parameter isn't relying on yet-to-stabilise feature, it's just that the code in core wants to do floating point maths, and Linux doesn't want that to be a possibility. In that specific case just ripping out the offending code is an effective solution, you could do it once for each Linux release and while somewhat tiresome it's not unmanageable.

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 21, 2022 3:07 UTC (Fri) by willy (subscriber, #9762) [Link] (10 responses)

Ok, I'll bite ... How do you efficiently implement an LRU with a B-tree? Every time you access an element (let's say an inode) in the inode cache, you remove it from wherever it currently is in the B-tree and move it to the highest unused index. That's two O(log n) operations, versus a linked list, which is two O(1) operations.

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 21, 2022 4:40 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

You can use heaps, not full B-trees. That actually would be interesting to measure, it's quite possible that not having to chase the pointers might speed them up.

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 21, 2022 9:00 UTC (Fri) by ldearquer (guest, #137451) [Link] (8 responses)

For the linked list, wouldn't it require O(n) finding the node? Then place it on the highest index O(1)? So you would have O(n) (+ O(1)) vs 2·O(log n)

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 21, 2022 12:11 UTC (Fri) by pbonzini (subscriber, #60935) [Link] (7 responses)

Usually Linux uses intrusive linked lists. They allow the same node to be part of multiple data structures, so you find the node in a different data structure (for example an array or hash table) and juggle the pointers in the node to move it to the end of the LRU list.

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 21, 2022 12:36 UTC (Fri) by ldearquer (guest, #137451) [Link] (6 responses)

I see, thanks for clarifying :)

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 21, 2022 12:45 UTC (Fri) by pbonzini (subscriber, #60935) [Link] (5 responses)

I'll add that Rust in general does not support intrusive data structures very well, because it is not very much friend with data that can be reached in different ways. You would have to wrap it with a reference count (Rc or Arc) and with either run-time borrow checking or a mutex (respectively RefCell and Mutex).

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 21, 2022 17:41 UTC (Fri) by walters (subscriber, #7396) [Link] (4 responses)

There is https://crates.io/crates/intrusive-collections for example.

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 21, 2022 18:10 UTC (Fri) by pbonzini (subscriber, #60935) [Link] (3 responses)

Sure, I only said it doesn't support them "very well". If you check https://docs.rs/intrusive-collections/0.9.3/intrusive_col..., having the same object in many lists (which is a major advantage of intrusive collections) requires reference counting.

This is enough of a disadvantage that the same crate includes an unsafe primitive to avoid this (https://docs.rs/intrusive-collections/0.9.3/intrusive_col...).

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 21, 2022 20:45 UTC (Fri) by atnot (subscriber, #124910) [Link] (2 responses)

I'm not sure that's really much less ergonomic than C? You still have the same issue there. Once you put things into multiple collections you need some way of knowing whether you should free items when removing them. That's true regardless of whether the compiler notices it.

If you don't know whether it's the last reference, you need some kind of reference count and if you do you'll need to prove it, either to yourself or preferably some program. It's not much different.

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 25, 2022 4:15 UTC (Tue) by artem (subscriber, #51262) [Link] (1 responses)

> Once you put things into multiple collections you need some way of knowing whether you should free items when removing them.

> If you don't know whether it's the last reference, you need some kind of reference count

With multiple intrusive lists, you can declare some of them owning and some of them non-owning.

When a thing is not on any of the owning lists, it's removed from the rest and dropped.

No reference counting necessary.

How would one do that in Rust?

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 25, 2022 9:42 UTC (Tue) by atnot (subscriber, #124910) [Link]

Oh, I hadn't considered that you could mutate a lists through another one...

I don't immediately see any reason why it shouldn't be possible. Rust's weak references have similar semantics albeit still require reference counting because it can't just go and remove the remaining references like with a list. I don't think it's natively supported by the intrusive-collections crate mentioned above though. But you should be able to express it with a safe interface.

Evidently there doesn't seem to be a lot of demand for something like that from existing rust users though. intrusive-collections is already not exactly widely used. I've personally never found a reason to use it despite looking for excuses, there's just too many specialized high quality data structure libraries to justify it. Maybe some day.

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 21, 2022 12:30 UTC (Fri) by eru (subscriber, #2753) [Link] (7 responses)

The Rust language wasn't built with the idea that code might need to continue when a memory allocation fails; instead, the normal result is an immediate crash.

A surprising feature in a language that has been touted as a C replacement for low-level programming. Of course, when you are out of memory, there typically are no good options, but a language for low-level programming must allow the programmer to decide what happens next.

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 21, 2022 14:22 UTC (Fri) by ejr (subscriber, #51652) [Link]

Modula-3 tried really, really hard to come up with generalizable semantics to apply when allocation fails. I don't believe they ever did.

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 21, 2022 17:43 UTC (Fri) by walters (subscriber, #7396) [Link]

The article phrased this poorly. The Rust *language* is independent of the default standard library. Plenty of use cases (including Rust-for-Linux) use "no-std" which fully supports writing code with fallible allocations. Plenty more at e.g. https://doc.rust-lang.org/stable/embedded-book/intro/no-s...

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 21, 2022 18:03 UTC (Fri) by atnot (subscriber, #124910) [Link]

It's not really accurate that the language wasn't built with the Idea in mind. Just like C, Rust does not depend on any kind of allocator, fallible or not. All of the difficulties here are just from the desire to reuse a userspace library (alloc) in the kernel. They certainly could have just written their own allocator api, list types, etc. just like the kernel does, but since the standard library developers were willing to support their use case it was just more convenient to share.

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 23, 2022 3:01 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (2 responses)

Well, let's compare to the (userspace) alternatives:

1. In a managed language, like Java, you get some sort of "out of memory" exception. Handling these exceptions safely is complicated and error-prone, and the official documentation will often encourage the programmer to just "let it crash" instead of trying to deal with the problem gracefully. In most cases, these languages will try to reclaim previously allocated memory with the garbage collector before throwing these exceptions.
2. In C++, std::bad_alloc is thrown, which is like the managed case except that stack unwinding will cause destructors to run (in the managed case, finalizers *might* run, but there is no guarantee of when they get called). If any of those destructors tries to allocate any memory, for any reason, it might cause a second std::bad_alloc to get thrown, and if a destructor throws an exception while another exception is already pending, the runtime gives up and calls std::terminate. Therefore, if you want to handle fallible allocations, every destructor in your entire program must be in on the joke.
3. In C, malloc returns a null pointer. If you forget to check for it, undefined behavior occurs (but in practice, you probably just segfault most of the time). If you remembered to check for it, you can handle it in whatever way you like, but in most cases you either return an error to your caller, or call longjmp(3) on a pre-allocated jmp_buf to go back to the main event loop or some other top-level scope that can reasonably figure out what to do next.
4. Regardless of language, it's possible for the OS to lie to you and give you memory which doesn't actually exist, then kill you when you try to use it. Linux actually does this,* and I believe it's common in other modern OSes as well. Where this functionality is enabled, any program can crash upon the system running out of memory, and there's nothing the programmer can reasonably do about it.

Sure, discount (1) all you like, but the practical reality is that running out of memory is really hard to handle safely when you're running in userspace, regardless of whether you're a high-level language or a low-level language.

* The OOM killer is more complicated than what I have described here.

The kernel radar: folios, multi-generational LRU, and Rust

Posted Jan 23, 2022 10:07 UTC (Sun) by farnz (subscriber, #17727) [Link]

As a side note, the Rust team are working on providing a decent API for fallible allocations. The raw Allocator trait that's going to be the interface between allocators and the rest of the language only has fallible functions that may allocate. The only function in that API that must succeed is free.

Then, there's effort ongoing to support fallible allocation in container APIs, with its own tracking issue.

There's a lot of effort going in to making this a supportable API for the long term; the key "odd" decision Rust seems to be making here is that instead of requiring every operation that could allocate to cope with the risk of allocation failure, there will be a mechanism (try_reserve on collections, for example) that guarantees that you can do a certain number of operations without allocating, and that mechanism can fail. It's then up to the users of the APIs to ensure that memory is either reserved up-front, or to take the risk of surprise allocation failure.

And this matches the underlying observation you've made; it's incredibly hard to handle allocation failure if it can happen at any time, and thus the entire application needs to either be bug-free in handling the rare error case (allocation failure), which in C++ means that everything has to meet the strong exception safety guarantee, or you need to confine allocation failures to a few places where you have hardened your code.

The kernel radar: folios, multi-generational LRU, and Rust

Posted Feb 6, 2022 3:02 UTC (Sun) by HelloWorld (guest, #56129) [Link]

> 2. In C++, std::bad_alloc is thrown, which is like the managed case except that stack unwinding will cause destructors to run (in the managed case, finalizers *might* run, but there is no guarantee of when they get called). If any of those destructors tries to allocate any memory, for any reason, it might cause a second std::bad_alloc to get thrown, and if a destructor throws an exception while another exception is already pending, the runtime gives up and calls std::terminate. Therefore, if you want to handle fallible allocations, every destructor in your entire program must be in on the joke.

Well, most destructors don't require allocation but do free up some memory. This means that when a bad_alloc is thrown, there's a pretty good chance that by the time you reach a destructor that allocates, some memory has already been freed by a destructor that ran before, and so your allocation may very well succeed. There are cases where it's better to try to recover from an allocation failure and sometimes fail than to not try at all.

Also, you might be able to allocate whatever memory you need to run the destructor in the constructor, thus avoiding the need to allocate in the destructor.

The kernel radar: folios, multi-generational LRU, and Rust

Posted Mar 3, 2022 18:19 UTC (Thu) by jd (guest, #26381) [Link]

There are a variety of approaches to out-of-memory issues in different languages, and there has been quite an interesting on-and-off debate in computing about fault-intolerant languages (which can be useful because a program isn't trying to limp on with an uncertain state).

In a microkernel, where you might very well want a module to crash cleanly and get restarted, a fault-intolerant language would seem to make a lot of sense. Rust might well be ideal for writing modules in the L4 world. I'm not sure how it makes sense in Linux, although if the primary kernel developers say it does then there's an excellent chance it does even if I can't see it.

There are other languages that might have value (or even more value) in the kernel, but they're either hopelessly obscure or too difficult to find a 64-bit compiler for, and I seriously doubt there would be interest in yet another language until Rust succeeds/fails, the reasons are known and there are people who might credibly actually use such a provision.