The kernel radar: folios, multi-generational LRU, and Rust
A folio update
Folios have been an active topic since they were first covered here less than one year ago. A folio, recall, is just a container for a struct page that is guaranteed not to be a tail page. It can thus be used to refer to memory, in units of a single page or larger, in a way that is more type-safe and requires fewer run-time checks than working directly with page structures. After some extensive discussion, the first set of folio patches was merged for the 5.16 kernel.
A large change of that nature to the memory-management subsystem naturally leads to fears of regressions, but the work in 5.16 appears to have been relatively problem-free. So 5.17 saw another round of folio-related changes, mostly focused on the page cache (which caches file data). In current kernels, the page cache holds, unsurprisingly, pages, but the 4KB page size used on most systems is often far too small to be efficiently managed. When dealing with files of anything but the smallest size, there is value in caching larger chunks at a time. The 5.17 conversion of the page cache to use folios is intended, among other things, to allow the use of "large folios" (a name chosen because the more descriptive "multi-page folios" was a little too long). Large folios might be huge pages, but they don't have to be limited to the huge-page sizes supported by the CPU; the plan is to support any folio size, as long as it is a power of two.
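Since every large-folio size is a power of two, a folio's size can be derived from an "order" — the base-two log of its page count. A quick illustrative sketch, in Rust rather than the kernel's C, with a 4KB base page assumed:

```rust
// Illustrative only: the kernel computes this in C; a 4KB base page
// is assumed here.
const PAGE_SIZE: usize = 4096;

fn folio_size(order: u32) -> usize {
    // A folio of order N covers 2^N base pages, so every possible
    // size is a power of two.
    PAGE_SIZE << order
}
```

An order-0 folio is a single 4KB page; an order-2 folio covers 16KB, and so on up through (and beyond) the CPU's huge-page sizes.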
The 5.17 work adds the machinery to support large folios in the page cache, in the low-level filesystem-support code, and in the XFS filesystem, but does not actually start using them yet. As Matthew Wilcox said in his pull request: "there may still be places I've overlooked which still have page size assumptions". So the coming development cycle will, presumably, focus on finding any such places so that the transition can happen in 5.18. Meanwhile, the more adventurous among us can enable large folios in 5.17 and help find the remaining sharp edges.
The multi-generational LRU
Another significant memory-management change that has been under development over the last year is the multi-generational LRU, which reworks how the kernel decides which pages to evict when memory is tight. Current kernels use a two-queue system, one each for pages deemed "active" and "inactive". Pages move between the queues based on accesses; when memory is needed, pages are reclaimed off the end of the inactive queue. The multi-generational work generalizes this setup into a larger number of queues, a change that seemingly improves the kernel's ability to identify the pages that are unlikely to be needed in the near future.
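The generalization can be caricatured in a few lines. This is a toy sketch for illustration only; the real MGLRU is kernel C and tracks generations far more cheaply than these explicit queues do:

```rust
use std::collections::VecDeque;

// Toy model of generational page aging: gens[0] is the oldest
// generation, the last entry is the youngest. Assumes at least two
// generations; entries are page identifiers.
struct Lru {
    gens: Vec<VecDeque<u64>>,
}

impl Lru {
    fn new(n: usize) -> Self {
        Lru { gens: (0..n).map(|_| VecDeque::new()).collect() }
    }

    // New pages enter the youngest generation.
    fn insert(&mut self, page: u64) {
        self.gens.last_mut().unwrap().push_back(page);
    }

    // An accessed page is promoted back to the youngest generation.
    fn accessed(&mut self, page: u64) {
        let mut found = false;
        for gen in &mut self.gens {
            if let Some(i) = gen.iter().position(|&p| p == page) {
                gen.remove(i);
                found = true;
                break;
            }
        }
        if found {
            self.gens.last_mut().unwrap().push_back(page);
        }
    }

    // Aging merges the oldest generation into its neighbor and opens
    // a fresh youngest generation.
    fn age(&mut self) {
        let old = self.gens.remove(0);
        if let Some(next) = self.gens.first_mut() {
            for p in old.into_iter().rev() {
                next.push_front(p);
            }
        }
        self.gens.push(VecDeque::new());
    }

    // Reclaim evicts from the oldest non-empty generation, so the
    // pages least recently promoted are the first to go.
    fn reclaim(&mut self) -> Option<u64> {
        self.gens.iter_mut().find(|g| !g.is_empty())?.pop_front()
    }
}
```

With more than two generations, pages that have gone the longest without promotion accumulate at the old end, which is the finer-grained eviction-candidate ranking the multi-generational design is after.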
When Yu Zhao posted the sixth version of this patch set in early January, he added a request for review and a verdict as to whether it could be merged for 5.17. That sparked a long discussion on the state of this work. As part of that discussion, Michal Hocko (who also did a lot of detailed review of the patches) repeated a theme that has been heard with previous postings: that it would be better to see this work as a series of incremental changes rather than a big addition of a new reclaim mechanism:
Changes in the reclaim path are paved with failures and reverts and fine tuning on top of existing fine tuning. The difference from your patchset is that they tend to be much much smaller and go incremental and therefore easier to review.
Jesse Barnes responded that an incremental series might be worse in this case:
I understand the desire for an "incremental approach that gets us from A->B". In the abstract it sounds great. However, with a change like this one, I think it's highly likely that such a path would be littered with regressions both large and small, and would probably be more difficult to reason about than the relatively clean design of MGLRU. On top of that, I don't think we'll get the kind of user feedback we need for something like this *without* merging it.
Linus Torvalds responded to Barnes, saying that this work "is worth going with". Hocko didn't disagree with Barnes, but did note that there are a lot of things needing fixing before the code could be merged in any case.
Zhao, meanwhile, has been actively trying to get supporters of this work to post to the list in favor of its inclusion. Those who responded include Holger Hoffstätte, Shuang Zhai ("the performance improvement is fabulous"), Suleiman Souhlal ("Android on ChromeOS has been using MGLRU for a while now, with great results"), Sofia Trinh, Donald Carr, and Oleksandr Natalenko.
There is clearly some interest in getting this work merged; it is just as clearly not in the cards for 5.17, though. Normally one would expect that a change this fundamental could take a long time yet to get in; given the pressure and the approval from Torvalds, though, it could happen a bit more quickly this time. Merging for 5.18 still seems optimistic, but sometime in 2022 could be a real possibility.
Rust for Linux
The project to make it possible to develop kernel modules in the Rust programming language continues to move forward; the third version of the Rust-support patch set was posted on January 17. A number of changes had been made to keep up with the Rust community and to get this work closer to ready for inclusion.
This version of the patch set supports (and thus requires) the recent 1.58 release of the compiler. The build system is now able to determine automatically whether a suitable Rust toolchain is available for building and, if something is missing, it will tell the developer what is needed. The cover letter notes that a couple of the unstable Rust features required for kernel work are becoming stable in near-future compiler releases. There is, however, still a discouragingly long list of required unstable features.
The series itself starts by increasing the maximum length of symbols that can be managed in the "kallsyms" mechanism. It seems that the name-mangling used by Rust can expand names considerably, to the point that 255 characters is not enough to store some names. Developers will not normally need to see the mangled names, but they will show up in kallsyms and may be surprising. Another preliminary step is to add C helper functions for a long list of things that already look like functions in the kernel — readb() or kmap(), for example — that are actually macros or are inlined. Those cannot be called directly from Rust, so they need to be turned into proper functions first.
Most of the Rust code itself currently appears in two crates. The first, called alloc, deals with memory allocation. The Rust language wasn't built with the idea that code might need to continue when a memory allocation fails; instead, the normal result is an immediate crash. Since crashing in kernel code is considered to be impolite, a modified allocator that can handle failures is required. As a Rust developer would expect, it returns a Result object that contains either a pointer to the allocated memory or an error indication, depending on what happened. Evidently the work to support fallible allocations is meant to go into the upstream Rust library, so the kernel's version of this crate may eventually be able to go away.
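The flavor of that interface can be seen in userspace with the standard library's stable try_reserve(), which likewise reports allocation failure as a Result instead of aborting. This is a sketch against std, not the kernel's modified alloc crate:

```rust
use std::collections::TryReserveError;

// Reserve capacity fallibly first; the subsequent push is then
// guaranteed not to allocate, so it cannot fail.
fn push_checked(v: &mut Vec<u8>, byte: u8) -> Result<(), TryReserveError> {
    v.try_reserve(1)?; // may fail, returning Err instead of aborting
    v.push(byte);      // capacity already reserved; infallible
    Ok(())
}
```

A caller can then propagate the error with `?` or map it to a kernel-style error code, rather than being crashed by an allocation it never saw.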
The other big crate is called kernel; it contains the rest of the impedance-matching code that makes kernel APIs look like proper Rust interfaces. These provide interfaces for char devices, the clock framework, file structures, file_operations vectors, memory-mapped I/O functions, mutexes, spinlocks, and more. A surprising amount of code is dedicated to the implementation of generic linked lists.
All told, it represents a lot of work toward making it possible to write kernel code in Rust. It is quite a bit of code that, at some point, is going to need to be more widely exercised if it is to progress in useful directions. That, of course, would be helped by getting this support into the mainline kernel where more developers can look at and work with it. Torvalds indicated at the 2021 Maintainers Summit that he expected to merge this work, but there is no indication of when that might happen. The timing is likely to come down to Torvalds and when he thinks that the time has come to open the door to this new language.
Index entries for this article:
Kernel: Development tools/Rust
Kernel: Memory management/Folios
Kernel: Memory management/Page replacement algorithms
Posted Jan 20, 2022 22:49 UTC (Thu)
by developer122 (guest, #152928)
[Link] (19 responses)
Posted Jan 20, 2022 23:29 UTC (Thu)
by tialaramex (subscriber, #21167)
[Link] (7 responses)
I count 17 items; 4 are cfg() parameters to switch off features from the allocator and, in one case, the core Rust library†. The latter is worth a moment's thought: Rust says you can format floating point numbers. Linux, of course, would very much rather you didn't use floating point numbers at all. So, Rust-for-Linux wants to tell the core library that we aren't going to be formatting any floating point numbers, and to blow up code that tries to do that; that's not valid Linux code. However, ultimately you _could_ do this surgery by hand and in effect "fork" the core library, especially if you knew a real fix was coming later.
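As an illustration of the cfg() mechanism (using a made-up no_fp_fmt_parse flag here; the patch set's actual flag names may differ), the floating-point path can simply be compiled out:

```rust
// Hypothetical sketch: when the `no_fp_fmt_parse` cfg is set, the
// floating-point formatting path does not exist at all, so any code
// that tries to use it fails to compile.
#[cfg(not(no_fp_fmt_parse))]
fn format_float(x: f64) -> String {
    format!("{x}")
}

#[cfg(no_fp_fmt_parse)]
fn format_float(_x: f64) -> String {
    unreachable!("floating-point formatting is compiled out")
}
```

In an ordinary userspace build the flag is unset and the first definition is compiled; a kernel-style build would set the flag and lose the code entirely.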
2 more are -Z compiler flags. Rust's compiler has flags marked as not being stable with a Z prefix. It's not as though the kernel has never taken a dependency on compiler specific flags before, but clearly having a stable flag is better because it's a social contract not to move this particular feature unexpectedly.
Some of the others have community momentum behind them because they're things most Rust users want; GATs and more const are in that category. If Rust for Linux didn't engage with the main Rust community at all for 12 months, those things have traction and will make progress anyway. On the other hand, there are few applications outside the kernel for some of the compiler-internals stabilization that Rust for Linux wants; if they never did this, I, for example, writing userspace code, would never ever notice.
It overall certainly means I don't expect to be running a Linux kernel with Rust in it in 2022 on my PC. But it also doesn't feel insurmountable, I could imagine reading an LWN piece before the end of the year about the "one big piece" missing, I just can't guess which piece that will be.
† Rust has a core library, which is at the heart of the standard library but must exist anyway. A few things in here are literally mandatory to Rust: e.g. the Drop trait must exist (it needn't be called Drop, but if there isn't one, that's not Rust any more), and the i32 type must exist (I don't even think you're allowed to call it something else; too bad, you must implement 32-bit signed integers). However, a whole lot more are just useful, and not actually needed by Rust itself, even if ordinary developers would be sad not to have them.
Posted Jan 20, 2022 23:43 UTC (Thu)
by ejr (subscriber, #51652)
[Link] (2 responses)
And I'm a FORTRAN (yes, all caps, including WATFIV) / APL / C person. With a background in more formal language systems like ML.
Please define your terminology. It appears as if yet another set of terminology is a major hurdle for acceptance.
Posted Jan 21, 2022 0:57 UTC (Fri)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
For example, maybe with ML in your background Generic Associated Types are obvious, or maybe not. With GATs, Rust can express the idea that a trait can have associated types which are themselves generic. So, for example, today an Iterator has an associated type saying which Item type it iterates over, but that associated type has to be specific: this is an Iterator over Strings. There are some traits people would like to write where you'd want to express that the associated type has some generic properties without tying down the specifics. Evidently the Rust for Linux developers have some use for this feature.
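The classic example of what GATs allow is a "lending iterator", whose items borrow from the iterator itself — something the ordinary Iterator trait cannot express. A sketch:

```rust
// The associated type Item is itself generic over a lifetime: each
// call to next() lends out data borrowed from the iterator.
trait LendingIterator {
    type Item<'a>
    where
        Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>>;
}

// A lender handing out overlapping two-element windows of its buffer.
struct Windows {
    buf: Vec<i32>,
    pos: usize,
}

impl LendingIterator for Windows {
    type Item<'a> = &'a [i32] where Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        if self.pos + 2 <= self.buf.len() {
            let w = &self.buf[self.pos..self.pos + 2];
            self.pos += 1;
            Some(w)
        } else {
            None
        }
    }
}
```

A plain Iterator cannot return `&self.buf[..]` slices like this, because its Item type has no way to name the lifetime of the borrow from `self`; the generic associated type is exactly that missing piece.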
But equally maybe you don't know what Rust's traits are. Traits are similar to the "interfaces" feature in many object oriented languages, in that they express some capability or property common to multiple types. A trait must be explicitly implemented for any particular type, either with the definition of the type itself, or with the definition of the trait, and each such implementation stands alone. So you can be sure that if SomeTrait is implemented for ThisThing, either the author of ThisThing intended that, or the author of SomeTrait or perhaps both, as a result traits have Semantics - there is no risk of their being a mere accident of syntax as with say C++ Concepts.
Maybe you know about Rust stability, or maybe not. In Rust there's a concept of "unstable" features. These features exist, and they work in whatever build of Rust you have, but tomorrow there might be a new Rust version and they're altered, or renamed, or gone. In contrast, all the stable features of Rust are promised to still work into the indefinite future; none have been removed since 1.0 in 2015. You have to specifically opt in to having unstable features, and to each specific unstable feature you want. Today Rust for Linux needs several such features. Internally, Rust uses some unstable features, but without opting in you can't, and probably most people shouldn't; thus it is desirable for Rust for Linux to rely as much as possible on stable features only.
Posted Jan 21, 2022 14:21 UTC (Fri)
by ejr (subscriber, #51652)
[Link]
Stability is pre-standardization like C/C++ attributes, or would be if there were multiple compilers out there. (gcc was on its way at some point, right?) Many mostly-single-vendor languages have stable v. implementation gizmos. At least this term is fairly well shared.
Rust isn't useful in my daily life (heavily shared memory structures being changed rapidly), so I haven't really poked at it.
Posted Jan 21, 2022 19:25 UTC (Fri)
by ballombe (subscriber, #9523)
[Link] (1 responses)
Posted Jan 26, 2022 18:37 UTC (Wed)
by iq-0 (subscriber, #36655)
[Link]
Posted Jan 22, 2022 1:02 UTC (Sat)
by plugwash (subscriber, #29694)
[Link] (1 responses)
The Rust standard library uses, and will probably always use, features that will not be part of stable Rust. So forking the standard library doesn't really help you with future-proofing your code.
Posted Jan 22, 2022 21:32 UTC (Sat)
by tialaramex (subscriber, #21167)
[Link]
It's future proofed in that the assumption is some day Rust will have a way to disable or sidestep this, and at that point Rust for Linux can just ship the normal core library (in this respect).
The fact that Rust's standard library relies on Rust's nightly features is orthogonal; this configuration parameter isn't relying on a yet-to-stabilise feature. It's just that the code in core wants to do floating point maths, and Linux doesn't want that to be a possibility. In that specific case, just ripping out the offending code is an effective solution; you could do it once for each Linux release and, while somewhat tiresome, it's not unmanageable.
Posted Jan 21, 2022 3:07 UTC (Fri)
by willy (subscriber, #9762)
[Link] (10 responses)
Posted Jan 21, 2022 4:40 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Jan 21, 2022 9:00 UTC (Fri)
by ldearquer (guest, #137451)
[Link] (8 responses)
Posted Jan 21, 2022 12:11 UTC (Fri)
by pbonzini (subscriber, #60935)
[Link] (7 responses)
Posted Jan 21, 2022 12:36 UTC (Fri)
by ldearquer (guest, #137451)
[Link] (6 responses)
Posted Jan 21, 2022 12:45 UTC (Fri)
by pbonzini (subscriber, #60935)
[Link] (5 responses)
Posted Jan 21, 2022 17:41 UTC (Fri)
by walters (subscriber, #7396)
[Link] (4 responses)
Posted Jan 21, 2022 18:10 UTC (Fri)
by pbonzini (subscriber, #60935)
[Link] (3 responses)
I'll add that Rust in general does not support intrusive data structures very well, because it is not very much friends with data that can be reached in different ways. You would have to wrap it with a reference count (Rc or Arc) and with either run-time borrow checking or a mutex (respectively RefCell and Mutex).

This is enough of a disadvantage that the same crate includes an unsafe primitive to avoid this (https://docs.rs/intrusive-collections/0.9.3/intrusive_col...).
Posted Jan 21, 2022 20:45 UTC (Fri)
by atnot (subscriber, #124910)
[Link] (2 responses)
If you don't know whether it's the last reference, you need some kind of reference count and if you do you'll need to prove it, either to yourself or preferably some program. It's not much different.
Posted Jan 25, 2022 4:15 UTC (Tue)
by artem (subscriber, #51262)
[Link] (1 responses)
> If you don't know whether it's the last reference, you need some kind of reference count
With multiple intrusive lists, you can declare some of them owning and some of them non-owning.
When a thing is not on any of the owning lists, it's removed from the rest and dropped.
No reference counting necessary.
How would one do that in Rust?
Posted Jan 25, 2022 9:42 UTC (Tue)
by atnot (subscriber, #124910)
[Link]
I don't immediately see any reason why it shouldn't be possible. Rust's weak references have similar semantics, albeit they still require reference counting, because the runtime can't just go and remove the remaining references as with a list. I don't think it's natively supported by the intrusive-collections crate mentioned above, though. But you should be able to express it with a safe interface.

Evidently there doesn't seem to be a lot of demand for something like that from existing Rust users, though. intrusive-collections is already not exactly widely used. I've personally never found a reason to use it despite looking for excuses; there are just too many specialized, high-quality data structure libraries to justify it. Maybe some day.
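For illustration, here is how an owning/non-owning split looks with std's Rc and Weak. Note that this is reference-counted, so it sketches the semantics rather than the count-free intrusive scheme described above:

```rust
use std::rc::{Rc, Weak};

// One collection owns the nodes (Rc); a second holds non-owning Weak
// handles. When the last owning reference is dropped, the weak handle
// can no longer be upgraded.
fn demo() -> (bool, bool) {
    let owning: Vec<Rc<String>> = vec![Rc::new("node".to_string())];
    let non_owning: Vec<Weak<String>> =
        owning.iter().map(Rc::downgrade).collect();

    let alive_before = non_owning[0].upgrade().is_some(); // owner still holds it
    drop(owning);                                         // last owning reference gone
    let alive_after = non_owning[0].upgrade().is_some();  // weak handle now dead, safely
    (alive_before, alive_after)
}
```

Unlike the intrusive design, the dead node is only freed when its count hits zero, and stale Weak entries linger in the second collection until pruned; that is the cost of getting this behavior with a safe, non-intrusive interface.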
Posted Jan 21, 2022 12:30 UTC (Fri)
by eru (subscriber, #2753)
[Link] (7 responses)
> The Rust language wasn't built with the idea that code might need to continue when a memory allocation fails; instead, the normal result is an immediate crash.

A surprising feature in a language that has been touted as a C replacement for low-level programming. Of course, when you are out of memory, there typically are no good options, but a language for low-level programming must allow the programmer to decide what happens next.
Posted Jan 21, 2022 14:22 UTC (Fri)
by ejr (subscriber, #51652)
[Link]
Posted Jan 21, 2022 17:43 UTC (Fri)
by walters (subscriber, #7396)
[Link]
Posted Jan 21, 2022 18:03 UTC (Fri)
by atnot (subscriber, #124910)
[Link]
Posted Jan 23, 2022 3:01 UTC (Sun)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
1. In a managed language, like Java, you get some sort of "out of memory" exception. Handling these exceptions safely is complicated and error-prone, and the official documentation will often encourage the programmer to just "let it crash" instead of trying to deal with the problem gracefully. In most cases, these languages will try to reclaim previously allocated memory with the garbage collector before throwing these exceptions.
2. In C++, std::bad_alloc is thrown, which is like the managed case except that stack unwinding will cause destructors to run (in the managed case, finalizers *might* run, but there is no guarantee of when they get called). If any of those destructors tries to allocate any memory, for any reason, it might cause a second std::bad_alloc to get thrown, and if a destructor throws an exception while another exception is already pending, the runtime gives up and calls std::terminate. Therefore, if you want to handle fallible allocations, every destructor in your entire program must be in on the joke.

3. In C, malloc returns a null pointer. If you forget to check for it, undefined behavior occurs (but in practice, you probably just segfault most of the time). If you remembered to check for it, you can handle it in whatever way you like, but in most cases you either return an error to your caller, or call longjmp(3) on a pre-allocated jmp_buf to go back to the main event loop or some other top-level scope that can reasonably figure out what to do next.

4. Regardless of language, it's possible for the OS to lie to you and give you memory which doesn't actually exist, then kill you when you try to use it. Linux actually does this,* and I believe it's common in other modern OSes as well. Where this functionality is enabled, any program can crash upon the system running out of memory, and there's nothing the programmer can reasonably do about it.

Sure, discount (1) all you like, but the practical reality is that running out of memory is really hard to handle safely when you're running in userspace, regardless of whether you're a high-level language or a low-level language.
* The OOM killer is more complicated than what I have described here.
Posted Jan 23, 2022 10:07 UTC (Sun)
by farnz (subscriber, #17727)
[Link]
As a side note, the Rust team are working on providing a decent API for fallible allocations. The raw Allocator trait that's going to be the interface between allocators and the rest of the language only has fallible functions that may allocate. The only function in that API that must succeed is free.
Then, there's effort ongoing to support fallible allocation in container APIs, with its own tracking issue.
There's a lot of effort going in to making this a supportable API for the long term; the key "odd" decision Rust seems to be making here is that instead of requiring every operation that could allocate to cope with the risk of allocation failure, there will be a mechanism (try_reserve on collections, for example) that guarantees that you can do a certain number of operations without allocating, and that mechanism can fail. It's then up to the users of the APIs to ensure that memory is either reserved up-front, or to take the risk of surprise allocation failure.
And this matches the underlying observation you've made; it's incredibly hard to handle allocation failure if it can happen at any time, and thus the entire application needs to either be bug-free in handling the rare error case (allocation failure), which in C++ means that everything has to meet the strong exception safety guarantee, or you need to confine allocation failures to a few places where you have hardened your code.
Posted Feb 6, 2022 3:02 UTC (Sun)
by HelloWorld (guest, #56129)
[Link]
Well, most destructors don't require allocation but do free up some memory. This means that when a bad_alloc is thrown, there's a pretty good chance that by the time you reach a destructor that allocates, some memory has already been freed by a destructor that ran before, and so your allocation may very well succeed. There are cases where it's better to try to recover from an allocation failure and sometimes fail than to not try at all.
Also, you might be able to allocate whatever memory you need to run the destructor in the constructor, thus avoiding the need to allocate in the destructor.
Posted Mar 3, 2022 18:19 UTC (Thu)
by jd (guest, #26381)
[Link]
In a microkernel, where you might very well want a module to crash cleanly and get restarted, a fault-intolerant language would seem to make a lot of sense. Rust might well be ideal for writing modules in the L4 world. I'm not sure how it makes sense in Linux, although if the primary kernel developers say it does then there's an excellent chance it does even if I can't see it.
There are other languages that might have value (or even more value) in the kernel, but they're either hopelessly obscure or too difficult to find a 64-bit compiler for. I seriously doubt there would be interest in yet another language until Rust succeeds or fails, the reasons are known, and there are people who might credibly actually use such a provision.