The 2024 Linux Storage, Filesystem, Memory-Management, and BPF Summit
The 2024 event was
held May 13 to 15 in Salt Lake City, Utah. As usual, LWN was
there to report on the discussions.
The Linux Storage, Filesystem, Memory-Management, and BPF Summit is an
annual, invitation-only event where about 140 developers gather to address
core-kernel problems. Below are the reports on the summit sessions that we attended:
Joint storage, filesystem, and memory-management sessions
- The state of the page: how the transition to folios is going and what remains to be done.
- The interaction between memory reclaim and RCU: the reclaim process can be accelerated by using the read-copy-update mechanism to avoid locking, but there are still some problems to work out.
- Large folios, swap, and FS-Cache: a discussion on swapping for folios larger than a single page and whether it makes sense to combine swapping with the FS-Cache network filesystem cache, since they have some overlapping needs.
- A new swap abstraction layer for the kernel: redesigning the swap layer for better performance, especially with large folios.
- Large-folio support for shmem and tmpfs: improving the kernel's shared-memory mechanisms with large folios.
- Famfs: a filesystem interface to shareable memory: discussion on whether the famfs filesystem for memory shared between hosts should be a kernel or a FUSE filesystem.
Joint storage and filesystem sessions
- Supporting larger block sizes in filesystems: another discussion of what needs to be done for filesystems in order to support block sizes larger than 4KB.
- Atomic writes without tears: a discussion on how to support buffered I/O writes of 16KB with protection against torn (partial) writes.
- Filesystems and iomap: conversions of various filesystems to use iomap are ongoing; what are the remaining problems that need to be solved?
- Measuring and improving buffered I/O: a "pathological" test result showed buffered I/O performance being far behind that of direct I/O; the underlying problems and possible solutions were discussed.
- Rust for filesystems: adding Rust abstractions for the VFS layer is proceeding, though there are still obstacles that need to be resolved, which was the topic of the discussion.
Filesystem track sessions
- New APIs for filesystems: a discussion on new APIs needed for filesystems, particularly newer filesystems that have subvolumes and snapshots.
- Handling the NFS change attribute: file timestamps do not have the granularity needed for NFS-client-cache-invalidation purposes; the session was yet another discussion on ways to fix that problem.
- Removing GFP_NOFS: the GFP_NOFS flag should be replaced by using the scoped-allocation API, but that conversion has not made all that much progress; what can be done to change that?
- Dropping the page cache for filesystems: a discussion of providing an API to drop the page cache for a specific filesystem; a full solution is not really possible, but there are ways to get most of the way there.
- Finishing the conversion to the "new" mount API: many kernel filesystems have still not converted to use the mount API that came in Linux 5.2 in 2019; the discussion considered some of the remaining issues to be resolved to finish the job.
- Mount notifications: a discussion on adding a feature to allow user space to track mount and unmount activity.
- A new API for tree-in-dcache filesystems: filesystems that store their entire tree in the directory-entry cache have proliferated, without handling the edge cases well; a new API would try to clean up some of those problems.
- Improving pseudo filesystems: problems abound in pseudo (or virtual) filesystems, in part because there is a lack of guidance available for kernel developers who want to create one; what can be done to improve that?
- Hierarchical storage management, fanotify, FUSE, and more: a discussion on implementing HSM using fanotify or FUSE and some problems encountered, especially with regard to executing from files that are not local.
- Changing the filesystem-maintenance model: a discussion on ways to keep filesystem bugs that should have been caught earlier from reaching the mainline, by changing how filesystem testing is done.
- Filesystem testing for stable kernels: a discussion on the amount of testing that needs to be done for XFS patches heading toward the stable kernels.
- Handling filesystem interruptibility: filesystems expecting non-interruptibility can be surprised when code they call takes locks or mutexes interruptibly (or killably); what can be done to fix that?
- Tracing the source of filesystem errors: trying to find a way to provide more information on where a filesystem error is coming from; the same error code can be returned for many different error conditions, which makes debugging difficult.
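The scoped-allocation API mentioned in the GFP_NOFS session replaces per-allocation flags with a per-task scope: the caller marks a region of code, and every allocation inside it implicitly loses the right to recurse into filesystem reclaim. The sketch below is a simplified userspace analogue of that pattern using a thread-local flag; the function names mirror the kernel's memalloc_nofs_save()/memalloc_nofs_restore(), but the bodies are illustrative only, not the real implementation.

```c
#include <stdbool.h>

/* Userspace analogue of the kernel's scoped-allocation API.  In the
 * kernel, these flags live in task_struct; here a thread-local word
 * stands in for them.  Simplified sketch, not the real code. */
static _Thread_local unsigned int pflags;
#define PF_MEMALLOC_NOFS 0x1u

/* Enter a NOFS scope; returns the previous state so scopes nest. */
unsigned int memalloc_nofs_save(void)
{
	unsigned int old = pflags & PF_MEMALLOC_NOFS;

	pflags |= PF_MEMALLOC_NOFS;
	return old;
}

/* Leave the scope, restoring whatever state the caller saved. */
void memalloc_nofs_restore(unsigned int old)
{
	pflags = (pflags & ~PF_MEMALLOC_NOFS) | old;
}

/* The allocator consults the scope instead of a GFP_NOFS argument. */
bool allocation_may_enter_fs(void)
{
	return !(pflags & PF_MEMALLOC_NOFS);
}
```

A caller wraps the critical region in a save/restore pair, and any allocation made anywhere below it, including in library code that knows nothing about the caller's locks, automatically behaves as if GFP_NOFS had been passed.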
Memory-management track sessions
The following sessions were held in the refrigerated room set aside for the memory-management developers:
- An update and future plans for DAMON: DAMON and DAMOS provide a toolkit for the control of memory-reclaim policies from user space. DAMON author SeongJae Park updated the group on recent developments in this subsystem and talked about where it is going next.
- Extending the mempolicy interface for heterogeneous systems: the kernel's memory-policy API has not kept pace with hardware changes; how can that be fixed?
- Better support for locally attached memory tiering: CXL memory holds out the promise of significant cost savings, but only if the kernel can manage it properly.
- What's next for the SLUB allocator: the current and future status of the kernel's one remaining object allocator.
- Facing down mapcount madness: managing the mapping count of pages is trickier than it seems, but the situation is being improved.
- Dynamically sizing the kernel stack: kernel stacks are simultaneously too small and too big; making their size variable would solve that problem.
- Memory-allocation profiling for the kernel: a once-contentious discussion on this new feature refocused on future improvements.
- Another try for address-space isolation: mitigations for hardware vulnerabilities have cost us a lot of performance; address-space isolation offers protection against present and future vulnerabilities while giving us that performance back.
- Faster page faults with RCU-protected VMA walks: the faster way to search through the VMA tree.
- Toward the unification of hugetlbfs: the hugetlbfs subsystem is arguably an outmoded way of accessing huge pages that imposes costs on memory-management maintenance. Coalescing it into the core will help, but it will not be an easy job.
- Merging msharefs: this proposal to allow the sharing of page tables between processes has been under consideration for some time; what is needed to get it upstream?
- Documenting page flags by committee: an attempt at large-scale collaborative authoring.
- Two sessions on CXL memory: Compute Express Link is promoted as a boon to data-center computing; two sessions looked at how the kernel can support this functionality.
- The path to deprecating SPARSEMEM: the kernel has several ways of representing physical memory; one of them may be on its way out.
- The twilight of the version-1 memory controller: the version-1 control-group API was superseded years ago, but users of the old memory-controller interface still exist. How can they be convinced to move on so that this old code can be removed?
- Allocator optimizations for transparent huge pages: proposed memory-management changes to improve the chances of successfully allocating huge pages.
- Two talks on multi-size transparent huge page performance: multi-size THPs are seen as a performance benefit, but how much does the system really gain from them?
- The next steps for the maple tree: upcoming features planned for this relatively new kernel data structure.
- Fleshing out memory descriptors: a first view into what the memory-descriptor future might look like.
- The state of the memory-management community in 2024: the traditional session with Andrew Morton to discuss how memory-management development is going.
- Measuring memory fragmentation: an attempt to find a way to measure how badly memory has been fragmented.
BPF track sessions
- A plan to make BPF kfuncs polymorphic: a proposal that would allow kfuncs to use different implementations depending on where and how they are called.
- Virtual machine scheduling with BPF: a talk about solving the "double scheduling" problem for virtual machines.
- What's scheduled for sched_ext: sched_ext has come a long way in the past year. What's changed, and what is still needed for the work to be meaningfully complete?
- Recent improvements to BPF's struct_ops mechanism: BPF continues to evolve support for more generic kernel interfaces.
- LLVM improvements for BPF verification: what can compiler developers do to ensure their compilers produce verifiable code?
- Supporting BPF in GCC: GCC can now compile a lot of BPF code. What did it take, and where is the project going next?
- Standardizing the BPF ISA: The IETF BPF working group is nearly done standardizing a BPF ISA specification.
- An instruction-level BPF memory model: BPF doesn't have a memory model yet; what properties are important for whichever model it adopts?
- Comparing BPF performance between implementations: a benchmark suite that runs on both Windows and Linux can be used to make comparisons.
- Modernizing BPF for the next ten years: what is needed for BPF to continue to grow?
- Securing BPF programs before and after verification: BPF is in a unique position in terms of kernel security; what more can be done to ensure it remains secure?
- Simplifying the BPF verifier: the BPF verifier is a complex piece of software; Shung-Hsi Yu has a proposal for making it simpler, more efficient, and more capable.
- Static keys for BPF: the kernel uses static keys to provide an efficient way to dynamically enable seldom-used code paths; can BPF do the same?
- BPF tracing performance: two changes that make using BPF for tracing more performant.
- Capturing stack traces asynchronously with BPF: a proposed change to the stack-capture API could make it much more useful.
- Updates to pahole: Poke-a-hole has grown far beyond its original parameters; now, it is being used to produce BTF debugging information for the kernel.
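The static-keys session above builds on a kernel mechanism whose usage pattern looks like the sketch below. This is a userspace mock: the macros here just read a boolean, whereas the real kernel API (DEFINE_STATIC_KEY_FALSE(), static_branch_unlikely(), static_branch_enable()) patches the instruction stream at runtime so that a disabled key costs only a no-op on the hot path.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace mock of the kernel static-key API; it shows the calling
 * convention only.  The kernel implementation rewrites the branch site
 * itself rather than testing a variable. */
#define DEFINE_STATIC_KEY_FALSE(name)	bool name = false
#define static_branch_unlikely(key)	(*(key))	/* kernel: patched nop/jump */
#define static_branch_enable(key)	(*(key) = true)
#define static_branch_disable(key)	(*(key) = false)

DEFINE_STATIC_KEY_FALSE(trace_enabled);

int traced_calls;	/* counts how often the cold path ran */

void hot_path(void)
{
	/* With the key disabled, the kernel version of this branch is
	 * effectively free; tracing code runs only when enabled. */
	if (static_branch_unlikely(&trace_enabled))
		traced_calls++;
}
```

The question posed in the session is whether BPF programs can get the same pattern: a seldom-used code path that costs nothing until something flips the key.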
Update: our entire coverage from LSFMM+BPF 2024 is now available as an ebook in the EPUB format.
Group photo
Support
Once again, we thank the Linux Foundation, LWN's travel sponsor, for
supporting our travel to this event.
Index entries for this article:
- Conference: Storage, Filesystem, Memory-Management and BPF Summit/2024