Implementing alignment guarantees for kmalloc()
kmalloc() is a frequently used primitive for the allocation of small objects in the kernel. During the 2019 Linux Storage, Filesystem, and Memory Management Summit, Vlastimil Babka led a session about the unexpected alignment problems developers face when using this function. A few months later, he came back with the second version of a patch set implementing a natural alignment guarantee for kmalloc(). Given the strong opposition it initially faced, it seemed that the change would not be accepted. However, it ended up in Linus Torvalds's tree. Let's explore what happened.
The issue Babka wanted to fix is the fact that kmalloc() sometimes returns objects that are not naturally aligned (that is, aligned to the object size if that size is a power of two). Most of the time, though, kmalloc() does return naturally aligned objects and some drivers and subsystems have come to depend on that property. The exceptions are when SLUB debugging is enabled or when the SLOB allocator is used. kmalloc() is essentially a shell around the SLAB, SLUB or SLOB allocator, depending on the kernel configuration; interested readers may wish to read an article on the reasons SLUB was introduced and look at a LinuxCon 2014 slide set [PDF] on the three allocators. Unexpectedly returning an unaligned object can cause data corruption and other errors. In response to that problem, Babka proposed to guarantee natural alignment for allocated objects with power-of-two size, so that all alignment expectations are fulfilled.
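To make the definition concrete, here is a minimal userspace sketch of the natural-alignment check, written in Python purely to show the arithmetic. The helper names and the sample addresses are hypothetical and do not come from the kernel or from the patch set.

```python
def is_power_of_two(size: int) -> bool:
    """True when size is a positive power of two (exactly one bit set)."""
    return size > 0 and (size & (size - 1)) == 0


def is_naturally_aligned(addr: int, size: int) -> bool:
    """True when addr is a multiple of size, for a power-of-two size."""
    if not is_power_of_two(size):
        raise ValueError("this sketch only handles power-of-two sizes")
    # For a power of two, "multiple of size" means the low bits are all zero.
    return (addr & (size - 1)) == 0


# Hypothetical addresses, chosen only for illustration.
print(is_naturally_aligned(0x1000, 512))  # True: 0x1000 is a multiple of 512
print(is_naturally_aligned(0x1010, 512))  # False: 16 bytes past a 512-byte boundary
```

Under the proposed guarantee, a 512-byte kmalloc() object would always satisfy the first case, including in the configurations that today can produce the second.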
For and against kmalloc() alignment
In the patch set discussion, Christopher Lameter (the creator of the SLUB allocator) disagreed with the idea of adding natural alignment and noted that kmalloc() has its own alignment limit (KMALLOC_MINALIGN) for a reason: to allow optimized memory layout without wasting memory. The SLOB allocator is an example; it is designed for small embedded systems and incurs minimal overhead. The patch from Babka would change that expected behavior. Any future allocators would also have to take the new constraints into account, which would prevent them from implementing certain optimizations in their memory layout.
Matthew Wilcox was in favor of Babka's proposal, as there are many subsystems that already depend on the implied alignment behavior. He mentioned examples like the persistent-memory (pmem) and RAM-disk drivers. The XFS filesystem, without an alignment guarantee, would need slab caches for each object size between 512 bytes and PAGE_SIZE, and it may need even more of them depending on what kmalloc() does guarantee.
Dave Chinner agreed with providing alignment for small objects and argued for additionally aligning large objects (bigger than a page) to page boundaries. This need was seen when using pmem with KASAN. He suggested, though, using a GFP flag to tell the allocator to return a naturally aligned object, and to fail if it cannot. That would avoid the need for higher-level subsystems to create additional caches. Babka and other developers preferred to deal with the issue without a separate flag.
A heated debate followed about the severity of the issue. Lameter disagreed that the misalignment cases are frequent, or even seen in practice, as the drivers affected are enabled in distribution test systems that use debug options. The cases of bad alignment should have been seen in that testing, according to him. Christoph Hellwig noted that the breakage often happens under special conditions, like buffers that cross a page boundary.
From a private NAK to the mainline
Following the debate, Babka asked for formal approval or disapproval of the patch set.
David Sterba commented that he has had to apply workarounds for misalignment cases and would be happy to remove them when the generic code is fixed. Darrick J. Wong seconded Sterba's opinion and expressed his strong preference for open discussion.
Lameter followed up, stating that the options to detect misalignment have been available for years and are ready to use. Wilcox disagreed: the issues show up when debugging options are enabled, which is exactly when all of the other features should keep working.
Andrew Morton moved the discussion back to the technical subject and asked for verification of the patch's correctness. Lameter confirmed that it is technically fine, while still disagreeing with the intent. That was followed by a number of acknowledgments (Acked-by:) from kernel developers showing their support for Babka's solution.
That series of approvals ended the public discussion; Babka did not resend the patch set or submit a third version. The situation seemed blocked, as the patch set had the support of multiple developers but not that of the maintainer of the SLUB allocator, which is heavily affected by the patch set. However, the patch was included in Morton's tree and was merged into the mainline on October 7th.
Summary
This discussion shows an example of the kernel community working on a change that affects a behavior that has been present for a long time. It is not a surprise that not all developers agreed with the solution; in this case, however, the one disagreeing was the maintainer of one of the modified subsystems. The final result shows that such changes can be accepted into the mainline when, as here, there is wide support from kmalloc() users and other memory-management developers.
Index entries for this article | |
---|---|
Kernel | Memory management/Internal API |
GuestArticles | Rybczynska, Marta |
Posted Oct 20, 2019 18:39 UTC (Sun)
by wilevers (subscriber, #110407)
Posted Oct 20, 2019 19:28 UTC (Sun)
by epa (subscriber, #39769)
I think the question is more whether there are any callers that have a specific requirement for unaligned allocations, that is, cases where there are so many tiny allocations that the wasted space matters. And if they exist, whether they wouldn't be better served with large requests to kmalloc() feeding to their own internal allocator which portions them out into tiny slices.
(This from the perspective of a non-kernel-developer, but the same issue arises in any API, having only one real implementation, where a particular behaviour is theoretically possible but doesn't arise in practice. In my view the answer is almost always to tighten up the specification, codifying the implicit guarantees that have held up to now.)
Posted Oct 20, 2019 23:36 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
So by default kmalloc() should return an aligned block, but there should be a flag explicitly requesting an unaligned block. This way there won't be a need to have multiple allocators in each subsystem needing this.
Posted Oct 21, 2019 7:56 UTC (Mon)
by vbabka (subscriber, #91706)
The new alignment guarantees are only for power-of-two sizes, and that's when the common SLAB and SLUB configurations already don't waste any memory. For other sizes, kmalloc() will mostly round them up to the nearest power of two anyway (exceptions are 96 and 192 bytes, see your /proc/slabinfo), so the waste comes from that. Those who allocate a significant number of "oddly sized" objects should create their own cache for them with kmem_cache_create(), with a precise size and optional alignment, which will minimize the waste.
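The rounding described in this comment can be modeled with a short sketch. It is a simplified userspace model in Python, not kernel code: the 8-byte minimum and 8192-byte maximum class are assumptions made for illustration, and the real set of kmalloc() caches depends on the architecture and kernel configuration.

```python
# Simplified model of kmalloc() size classes: powers of two plus the 96- and
# 192-byte caches mentioned in the comment. The 8-byte minimum and 8192-byte
# maximum are assumptions for this sketch; real kernels differ by configuration.
SIZE_CLASSES = sorted([2 ** k for k in range(3, 14)] + [96, 192])


def size_class(request: int) -> int:
    """Return the smallest modeled size class that fits the request."""
    if request <= 0:
        raise ValueError("request must be positive")
    for size in SIZE_CLASSES:
        if request <= size:
            return size
    raise ValueError("request is larger than the largest modeled class")


def wasted_bytes(request: int) -> int:
    """Bytes lost to rounding the request up to its size class."""
    return size_class(request) - request


# Print request, chosen class, and waste for a few sample sizes.
for request in (64, 80, 130, 600):
    print(request, size_class(request), wasted_bytes(request))
```

In this model a power-of-two request such as 64 bytes wastes nothing, while a 600-byte request lands in the 1024-byte class. That rounding is the waste the comment attributes to oddly sized objects, and the reason a dedicated cache is suggested for them.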
Posted Oct 21, 2019 16:48 UTC (Mon)
by jreiser (subscriber, #11027)
Posted Oct 22, 2019 19:55 UTC (Tue)
by wilevers (subscriber, #110407)
Posted Oct 23, 2019 14:20 UTC (Wed)
by rweikusat2 (subscriber, #117920)
Typically such unspoken ABIs are extended explicitly, by adding new flags.
"The new alignment guarantees are only for power of two sizes..."
The result should be aligned to min(PAGE_SIZE, n & ~(n - 1)), which is the place value of the lowest-order '1' bit in the requested size (but limited to the size of one page). So if the request is for 40 bytes, then the result should be 8-byte aligned. The rationale is that this is the alignment of a struct having that size.
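The formula in this comment can be checked with a few lines of arithmetic. This is a minimal sketch in Python of the commenter's suggestion, not of the merged patch; the function name is hypothetical and the 4096-byte PAGE_SIZE is an assumption, since the real value depends on the architecture.

```python
PAGE_SIZE = 4096  # assumed page size; the real value depends on the architecture


def proposed_alignment(n: int) -> int:
    """min(PAGE_SIZE, n & ~(n - 1)): the lowest set bit of n, capped at one page."""
    if n <= 0:
        raise ValueError("allocation size must be positive")
    # n & ~(n - 1) clears every bit of n except the lowest-order '1' bit.
    return min(PAGE_SIZE, n & ~(n - 1))


# Print each sample request size next to the alignment the formula yields.
for n in (40, 96, 512, 8192):
    print(n, proposed_alignment(n))
```

For 40 bytes (binary 101000) the lowest set bit is 8, matching the comment's example. A power-of-two size such as 512 maps to itself, so on power-of-two sizes up to a page this rule coincides with natural alignment.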