Moving x86 to LMB
Some review: in a running kernel, memory management is handled by the buddy allocator (at the page level), with the slab allocator on top. These allocators are complex pieces of code which cannot run in the absence of a mostly functional kernel, so they cannot be used in the early stages of the bootstrap process. What is used, instead, is an architecture-specific chain of simple allocators. For x86, things start with a brk()-like mechanism which yields to the "e820" early reservation code, which, in turn, gives way to the bootmem allocator. Once the bootstrap has gotten far enough, the slab allocator can take over from the bootmem code. Yinghai's 2.6.34 changes were meant to short out the bootmem stage, allowing the system to use the early reservation code until slab can run.
During the review process for that code, some reviewers asked why x86 did not use the "logical memory block" (LMB) allocator instead of its own early reservation code. LMB is currently used by the Microblaze, PowerPC, SuperH, and SPARC architectures, so it has the look of a generic solution. There are obvious advantages to using generic code over architecture-specific variants; there are more eyes to look at the code and the overall maintenance cost is reduced. So the idea of moving to LMB made obvious sense.
LMB is, as might be expected, a truly simplistic memory manager. Low-level architecture code gives it blocks of memory to manage as it discovers them with:
long lmb_add(u64 base, u64 size);
The LMB allocator will duly store that region into a fixed-length array of known memory blocks, coalescing it with existing blocks if need be. Memory may then be allocated with:
u64 lmb_alloc(u64 size, u64 align);
Allocated blocks are tracked in a second array which looks just like the first; an allocation is satisfied by iterating through the available blocks, trying to find a sufficiently large chunk which is not already reserved by somebody else. There are other functions for reserving specific regions of memory, allocating memory on specific NUMA nodes, etc. But, at its core, LMB is a simple allocator which is meant to do a good-enough job until something more sophisticated can take over.
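The scheme described above can be illustrated with a small user-space sketch. This is not the kernel's LMB code: the toy_ names, the 16-entry array size, the bottom-up first-fit search, and the use of 0 as a failure value are all assumptions made for illustration, and address overflow is ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_REGIONS 16              /* fixed-length arrays (size assumed) */

struct toy_region { uint64_t base, size; };
struct toy_list {
    unsigned long cnt;
    struct toy_region r[TOY_MAX_REGIONS];
};

static struct toy_list toy_memory;      /* blocks handed over by arch code */
static struct toy_list toy_reserved;    /* blocks already given out */

/* Insert [base, base + size) into a sorted list, merging it with any block
 * it touches or overlaps. Returns 0 on success, -1 if the array is full. */
static long toy_add_region(struct toy_list *l, uint64_t base, uint64_t size)
{
    uint64_t end = base + size;
    unsigned long i = 0, j;

    if (size == 0)
        return 0;
    while (i < l->cnt && l->r[i].base + l->r[i].size < base)
        i++;                            /* skip blocks ending before us */
    for (j = i; j < l->cnt && l->r[j].base <= end; j++) {
        if (l->r[j].base < base)
            base = l->r[j].base;
        if (l->r[j].base + l->r[j].size > end)
            end = l->r[j].base + l->r[j].size;
    }
    if (j == i) {                       /* nothing to merge: open a slot */
        if (l->cnt == TOY_MAX_REGIONS)
            return -1;
        memmove(&l->r[i + 1], &l->r[i], (l->cnt - i) * sizeof(l->r[0]));
        l->cnt++;
    } else {                            /* blocks i..j-1 collapse into one */
        memmove(&l->r[i + 1], &l->r[j], (l->cnt - j) * sizeof(l->r[0]));
        l->cnt -= j - i - 1;
    }
    l->r[i].base = base;
    l->r[i].size = end - base;
    return 0;
}

/* Toy counterpart of lmb_add(): register a block of usable memory. */
static long toy_lmb_add(uint64_t base, uint64_t size)
{
    return toy_add_region(&toy_memory, base, size);
}

/* Index of a reserved block overlapping [addr, addr + size), or -1. */
static long toy_overlaps_reserved(uint64_t addr, uint64_t size)
{
    unsigned long k;

    for (k = 0; k < toy_reserved.cnt; k++) {
        const struct toy_region *r = &toy_reserved.r[k];

        if (addr < r->base + r->size && r->base < addr + size)
            return (long)k;
    }
    return -1;
}

static uint64_t toy_align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Toy counterpart of lmb_alloc(): first fit, bottom-up; 0 means failure. */
static uint64_t toy_lmb_alloc(uint64_t size, uint64_t align)
{
    unsigned long i;

    assert(align != 0 && (align & (align - 1)) == 0);   /* power of two */
    if (size == 0)
        return 0;
    for (i = 0; i < toy_memory.cnt; i++) {
        uint64_t end = toy_memory.r[i].base + toy_memory.r[i].size;
        uint64_t addr = toy_align_up(toy_memory.r[i].base, align);

        while (addr < end && size <= end - addr) {
            long k = toy_overlaps_reserved(addr, size);

            if (k < 0)                  /* free: record it in the 2nd array */
                return toy_add_region(&toy_reserved, addr, size) ? 0 : addr;
            /* step past the reservation that is in the way */
            addr = toy_align_up(toy_reserved.r[k].base +
                                toy_reserved.r[k].size, align);
        }
    }
    return 0;
}
```

In this sketch, adding two adjacent 4KB blocks at 0x1000 and 0x2000 leaves a single 8KB entry in the memory array, and each successful allocation shows up as an entry in the reserved array rather than as a change to the memory array.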
Yinghai's patch set makes a number of changes to the LMB code itself, starting with a move from the lib directory over to mm with the rest of the memory-management code. Some new functions are added to match the different semantics supported by the early reservation code, which works in a two-step, "find a memory block, then reserve it" mode. There is also a new function to transfer LMB reservations into the bootmem allocator for configurations where bootmem is still in use. The 22-part series culminates with a switch to LMB calls for early allocations and the removal of the now-unused early reservation code.
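The difference between that two-step mode and a one-shot allocation call can be shown with another small sketch. The function names (toy_find_area(), toy_reserve_area()), their parameters, and the TOY_NOT_FOUND value are hypothetical and are not taken from the patch set; the point is only that the "find" step records nothing, so the caller must follow it with an explicit "reserve" step.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_RESERVED 16             /* array size is an assumption */
#define TOY_NOT_FOUND    UINT64_MAX     /* hypothetical failure value */

struct toy_range { uint64_t base, size; };

static struct toy_range toy_reserved[TOY_MAX_RESERVED];
static unsigned int toy_nr_reserved;

/* Step 1: look for a free, aligned range inside [start, end).
 * Nothing is recorded; calling this twice gives the same answer. */
static uint64_t toy_find_area(uint64_t start, uint64_t end,
                              uint64_t size, uint64_t align)
{
    uint64_t addr;

    assert(align != 0 && (align & (align - 1)) == 0);   /* power of two */
    if (size == 0)
        return TOY_NOT_FOUND;
    addr = (start + align - 1) & ~(align - 1);
    while (addr < end && size <= end - addr) {
        unsigned int i;
        int blocked = 0;

        for (i = 0; i < toy_nr_reserved; i++) {
            const struct toy_range *r = &toy_reserved[i];

            if (addr < r->base + r->size && r->base < addr + size) {
                /* move past the reservation and try again */
                addr = (r->base + r->size + align - 1) & ~(align - 1);
                blocked = 1;
                break;
            }
        }
        if (!blocked)
            return addr;
    }
    return TOY_NOT_FOUND;
}

/* Step 2: claim the range that step 1 found. Returns 0 or -1 if full. */
static int toy_reserve_area(uint64_t base, uint64_t size)
{
    if (toy_nr_reserved == TOY_MAX_RESERVED)
        return -1;
    toy_reserved[toy_nr_reserved].base = base;
    toy_reserved[toy_nr_reserved].size = size;
    toy_nr_reserved++;
    return 0;
}
```

A caller would first ask toy_find_area() for, say, 16KB between 1MB and 2MB, then pass the returned address to toy_reserve_area(); only after the second call does a subsequent search skip that range.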
There has been surprisingly little discussion for a patch series which makes such fundamental changes. It seems that most kernel developers pay relatively little attention to what happens at the architecture-specific levels. One exception is Ben Herrenschmidt, who keeps an eye on LMB from the PowerPC perspective. Ben disagrees with a number of the LMB-level changes, feeling that they complicate the API and potentially introduce problems. Instead, it looks like Ben would like to fix up the LMB code himself, letting Yinghai work on the x86-specific side of things.
To that end, Ben has posted a patch series of his own.
Some of the changes simply clean up the LMB code, adding, for example, a for_each_lmb() macro for iterating through the array of memory blocks. The fixed-length arrays are made variable, phys_addr_t is used to represent physical addresses, and the code is substantially reorganized. There is much that Ben still plans to do, including, happily, the addition of actual documentation to the API, but even without all that, it's a significant cleanup for the LMB code.
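To give a feel for what an iteration macro of this kind buys, here is a user-space sketch. The macro name, its arguments, and the structure layout are assumptions for illustration and may not match Ben's for_each_lmb(); the typedef merely stands in for the kernel's phys_addr_t, and the regions pointer reflects the move away from fixed-length arrays.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_phys_addr_t;       /* stand-in for phys_addr_t */

struct toy_region {
    toy_phys_addr_t base;
    toy_phys_addr_t size;
};

struct toy_type {
    unsigned long cnt;                  /* number of valid entries */
    struct toy_region *regions;         /* pointer, so the array can vary */
};

/* Hypothetical iterator: walk every region of one type in order. */
#define toy_for_each_region(type, reg)                          \
    for ((reg) = (type)->regions;                               \
         (reg) < (type)->regions + (type)->cnt;                 \
         (reg)++)

/* Example user: add up the sizes of all regions of a given type. */
static toy_phys_addr_t toy_total_size(const struct toy_type *type)
{
    const struct toy_region *reg;
    toy_phys_addr_t total = 0;

    assert(type != 0 && (type->cnt == 0 || type->regions != 0));
    toy_for_each_region(type, reg)
        total += reg->size;
    return total;
}
```

With a macro like this, callers no longer open-code the index arithmetic, which makes it easier to change how the regions are stored without touching every loop.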
As with Yinghai's patches, there has been little in the way of discussion. It may be that these changes will remain below the radar while the two patch sets are integrated and, maybe, merged for 2.6.35. With luck, they'll remain below the radar thereafter as well, with few people even noticing the difference.