
Implementing alignment guarantees for kmalloc()

October 18, 2019

This article was contributed by Marta Rybczyńska

kmalloc() is a frequently used primitive for the allocation of small objects in the kernel. During the 2019 Linux Storage, Filesystem, and Memory-Management Summit, Vlastimil Babka led a session about the unexpected alignment problems developers face when using this function. A few months later, he came back with the second version of a patch set implementing a natural alignment guarantee for kmalloc(). Given the strong opposition the change initially faced, it seemed that it would not be accepted; it nonetheless ended up in Linus Torvalds's tree. Let's explore what happened.

The issue Babka wanted to fix is that kmalloc() sometimes returns objects that are not naturally aligned (that is, aligned to the object size, if that size is a power of two). Most of the time, though, kmalloc() does return naturally aligned objects, and some drivers and subsystems have come to depend on that property. The exceptions arise when SLUB debugging is enabled or when the SLOB allocator is used. kmalloc() is essentially a wrapper around the SLAB, SLUB, or SLOB allocator, depending on the kernel configuration; interested readers may wish to read an article on the reasons SLUB was introduced and look at a LinuxCon 2014 slide set [PDF] on the three allocators. Unexpectedly returning an unaligned object can cause data corruption and other errors. In response to that problem, Babka proposed guaranteeing natural alignment for allocated objects with power-of-two sizes, so that all alignment expectations are fulfilled.
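To make the guarantee concrete, here is a minimal sketch (not code from the patch set; the module is invented for illustration) of what "naturally aligned" means for a power-of-two allocation. With the guarantee in place, the check below can never fire; without it, it could trip under SLOB or SLUB debugging:

	#include <linux/module.h>
	#include <linux/kernel.h>
	#include <linux/slab.h>

	static int __init align_check_init(void)
	{
		/* 512 is a power of two, so natural alignment means the
		 * returned pointer is a multiple of 512. */
		void *p = kmalloc(512, GFP_KERNEL);

		if (!p)
			return -ENOMEM;
		/* Without the guarantee, this could fire with SLOB or
		 * SLUB debugging enabled. */
		WARN_ON(!IS_ALIGNED((unsigned long)p, 512));
		kfree(p);
		return 0;
	}
	module_init(align_check_init);
	MODULE_LICENSE("GPL");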

For and against kmalloc() alignment

In the patch-set discussion, Christoph Lameter (the creator of the SLUB allocator) disagreed with the idea of adding natural alignment; he noted that kmalloc() has its own minimum alignment (ARCH_KMALLOC_MINALIGN) for a reason: to allow an optimized memory layout without wasting memory. The SLOB allocator is an example; it is designed for small embedded systems and incurs minimal overhead. Babka's patch would change that expected behavior. Moreover, any future allocator would have to take the new constraint into account, which would prevent certain optimizations in its memory layout.

Matthew Wilcox was in favor of Babka's proposal, since many subsystems already depend on the implied alignment behavior; he mentioned the persistent-memory (pmem) and RAM-disk drivers as examples. Without an alignment guarantee, the XFS filesystem would need slab caches for each object size between 512 bytes and PAGE_SIZE, and possibly more, depending on what kmalloc() does guarantee.
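As an illustration (not XFS code; the names and sizes are invented, assuming 4KB pages), the workaround Wilcox alluded to would look roughly like this: one explicitly aligned cache per power-of-two size that the subsystem uses.

	#include <linux/module.h>
	#include <linux/kernel.h>
	#include <linux/slab.h>

	static const unsigned int demo_sizes[] = { 512, 1024, 2048, 4096 };
	static struct kmem_cache *demo_caches[ARRAY_SIZE(demo_sizes)];
	static const char *demo_names[] = {
		"demo-512", "demo-1024", "demo-2048", "demo-4096",
	};

	static int __init demo_caches_init(void)
	{
		int i;

		for (i = 0; i < ARRAY_SIZE(demo_sizes); i++) {
			/* The third argument requests alignment equal to the
			 * object size; error unwinding is omitted for brevity. */
			demo_caches[i] = kmem_cache_create(demo_names[i],
							   demo_sizes[i],
							   demo_sizes[i], 0, NULL);
			if (!demo_caches[i])
				return -ENOMEM;
		}
		return 0;
	}
	module_init(demo_caches_init);
	MODULE_LICENSE("GPL");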

Dave Chinner agreed with providing alignment for small objects and argued for also aligning large objects (bigger than a page) to page boundaries; that need was seen when using pmem with KASAN. He suggested, though, using a GFP flag to tell the allocator to return a naturally aligned object, and to fail if it cannot; that would avoid the need for higher-level subsystems to create additional caches. Babka and other developers preferred to deal with the issue without a separate flag.
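Chinner's suggestion was not adopted, but a hypothetical sketch of it might have looked like the fragment below. The flag name is invented for illustration and does not exist in the kernel; it is defined as zero here only so the fragment compiles.

	#include <linux/slab.h>

	/* Hypothetical only: no such flag was ever merged.  A real
	 * implementation would add a new ___GFP_* bit that the slab
	 * allocators honor, failing the allocation if they cannot. */
	#define __GFP_NATURAL_ALIGN ((gfp_t)0)

	static void *alloc_aligned_buffer(size_t size)
	{
		/* Ask for natural alignment; fail rather than return a
		 * misaligned object. */
		return kmalloc(size, GFP_KERNEL | __GFP_NATURAL_ALIGN);
	}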

A heated debate followed about the severity of the issue. Lameter disagreed that misalignment problems are frequent, or even seen in practice: the affected drivers are enabled in distributions' test systems that run with debug options, so, according to him, any bad alignment should have shown up in that testing. Christoph Hellwig noted that the breakage often happens only under special conditions, such as buffers that cross a page boundary.

From a private NAK to the mainline

Following the debate, Babka asked for formal approval or disapproval of the patch set:

So if anyone thinks this is a good idea, please express it (preferably in a formal way such as Acked-by), otherwise it seems the patch will be dropped (due to a private NACK, apparently).

David Sterba commented that he has had to apply workarounds for misalignment cases and would be happy to remove them when the generic code is fixed. Darrick J. Wong seconded Sterba's opinion and expressed his strong preference for open discussion:

Oh, I didn't realize ^^^^^^^^^^^^ that *some* of us are allowed the privilege of gutting a patch via private NAK without any of that open development discussion inconvenience. <grumble>

Lameter followed up, stating that the options to detect misalignment have been available for years and are ready to use. Wilcox disagreed, pointing out that the issues show up precisely when debugging options are enabled, which is when all of the other features should keep working:

People who are enabling a debugging option to debug their issues, should not have to first debug all the other issues that enabling that debugging option uncovers!

Andrew Morton moved the discussion back to the technical subject and asked for verification of the patch's correctness. Lameter confirmed that it is technically fine, while still disagreeing with the intent. That was followed by a number of acknowledgments (Acked-by:) from kernel developers showing their support for Babka's solution.

That series of approvals ended the public discussion; Babka did not resend the patch set or submit a third version. The situation seemed blocked: the patch set had the support of multiple developers, but not that of the maintainer of the SLUB allocator, which is heavily affected by the change. Nonetheless, the patch set was included in Morton's tree and was merged into the mainline on October 7th.

Summary

This discussion is an example of the kernel community changing a behavior that has been in place for a long time. It is no surprise that not all developers agreed with the solution; in this case, though, the dissenter was the maintainer of one of the affected subsystems. The final result shows that such changes can be accepted into the mainline when they have wide support from kmalloc() users and other memory-management developers.


Index entries for this article
Kernel: Memory management/Internal API
GuestArticles: Rybczynska, Marta



Implementing alignment guarantees for kmalloc()

Posted Oct 20, 2019 18:39 UTC (Sun) by wilevers (subscriber, #110407) [Link] (6 responses)

I'm confused. Why can't we have a separate API for callers that have a specific alignment requirement? That would allow any allocator to optimally supply what is expected instead of having to guess the caller's intent.

Implementing alignment guarantees for kmalloc()

Posted Oct 20, 2019 19:28 UTC (Sun) by epa (subscriber, #39769) [Link] (5 responses)

Because the kernel is full of callers whose intent is to get aligned access but which don't explicitly specify it. Why do I say their intent is to get aligned access? Simply because access has always been aligned up to now, in all but a few oddball configurations. If it almost always behaves that way in practice, any support in callers for unaligned access won't get exercised and will inevitably rot away. By the same argument you could say that malloc() (in user space / libc) is effectively an API that never returns null on failure, since it never does so in practice and nobody writes application code that handles it correctly; even with the best of intentions such code wouldn't get exercised enough for it to have a chance of working correctly.

I think the question is more whether there are any callers that have a specific requirement for unaligned allocations, that is, cases where there are so many tiny allocations that the wasted space matters. And if they exist, whether they wouldn't be better served with large requests to kmalloc() feeding to their own internal allocator which portions them out into tiny slices.

(This from the perspective of a non-kernel-developer, but the same issue arises in any API, having only one real implementation, where a particular behaviour is theoretically possible but doesn't arise in practice. In my view the answer is almost always to tighten up the specification, codifying the implicit guarantees that have held up to now.)

Implementing alignment guarantees for kmalloc()

Posted Oct 20, 2019 23:36 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

> I think the question is more whether there are any callers that have a specific requirement for unaligned allocations, that is, cases where there are so many tiny allocations that the wasted space matters.
Typically such unspoken ABIs are extended explicitly, by adding new flags.

So by default kmalloc() should return an aligned block, but there should be a flag explicitly requesting unaligned block. This way there won't be a need to have multiple allocators in each subsystem needing this.

Implementing alignment guarantees for kmalloc()

Posted Oct 21, 2019 7:56 UTC (Mon) by vbabka (subscriber, #91706) [Link] (3 responses)

> I think the question is more whether there are any callers that have a specific requirement for unaligned allocations, that is, cases where there are so many tiny allocations that the wasted space matters. And if they exist, whether they wouldn't be better served with large requests to kmalloc() feeding to their own internal allocator which portions them out into tiny slices.

The new alignment guarantees are only for power-of-two sizes, and for those the common SLAB and SLUB configurations already don't waste any memory. For other sizes, kmalloc() will mostly round them up to the nearest power of two anyway (exceptions are 96 and 192 bytes, see your /proc/slabinfo), so the waste comes from that. Those who allocate a significant number of "oddly sized" objects should create their own cache for them with kmem_cache_create(), specifying the precise size and optional alignment, which will minimize the waste.
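A minimal sketch of that kmem_cache_create() approach, assuming a hypothetical subsystem that allocates many 200-byte objects (which kmalloc() would round up to the 256-byte cache, wasting 56 bytes per object):

	#include <linux/module.h>
	#include <linux/slab.h>

	/* Hypothetical oddly sized object. */
	struct odd_object {
		char payload[200];
	};

	static struct kmem_cache *odd_cache;

	static int __init odd_cache_init(void)
	{
		/* A dedicated cache stores the objects at their exact size;
		 * the third argument is the (optional) alignment. */
		odd_cache = kmem_cache_create("odd_object",
					      sizeof(struct odd_object),
					      8, 0, NULL);
		return odd_cache ? 0 : -ENOMEM;
	}
	module_init(odd_cache_init);
	MODULE_LICENSE("GPL");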

Implementing alignment guarantees for kmalloc()

Posted Oct 21, 2019 16:48 UTC (Mon) by jreiser (subscriber, #11027) [Link] (2 responses)

The new alignment guarantees are only for power of two sizes... The result should be aligned to min(PAGE_SIZE, n & ~(-1+n)), which is the place value of the lowest-order '1' bit in the requested size (but limited to the size of one page). So if the request is for 40 bytes, the result should be 8-byte aligned. The rationale: that is the alignment of a struct having that size.

Implementing alignment guarantees for kmalloc()

Posted Oct 22, 2019 19:55 UTC (Tue) by wilevers (subscriber, #110407) [Link] (1 responses)

Try parsing this. Then again. Any questions?

Implementing alignment guarantees for kmalloc()

Posted Oct 23, 2019 14:20 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link]

The expression was n & ~(-1 + n). That's a slightly steganographically obscured version of n & ~(n - 1). Assuming that n is some non-zero binary number, subtracting one from that causes the lowest set 1 bit to change to 0 and all 0 bits below it to 1. All other 1 bits of n are also 1 in n - 1. Hence n & ~(n - 1) leaves only the lowest bit of n set.
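For readers who want to see it run, here is a small standalone C program (PAGE_SIZE hard-coded to 4096 for illustration) computing jreiser's proposed alignment for a few request sizes:

	#include <stdio.h>

	#define PAGE_SIZE 4096u /* assumed page size, for illustration only */

	/* Alignment proposed above: the lowest set bit of n, capped at one page. */
	static unsigned int proposed_alignment(unsigned int n)
	{
		unsigned int lowbit = n & ~(n - 1); /* same as n & ~(-1 + n) */

		return lowbit < PAGE_SIZE ? lowbit : PAGE_SIZE;
	}

	int main(void)
	{
		printf("%u\n", proposed_alignment(40));   /* prints 8 */
		printf("%u\n", proposed_alignment(512));  /* prints 512 */
		printf("%u\n", proposed_alignment(8192)); /* prints 4096 (capped) */
		return 0;
	}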


Copyright © 2019, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds