-
Notifications
You must be signed in to change notification settings - Fork 747
kernel/process_standard: ensure kernel_memory_break align & cleanup #3556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kernel/process_standard: ensure kernel_memory_break align & cleanup #3556
Conversation
This commit addresses issue tock#1739 by ensuring that the kernel memory break is always well-aligned to a word-boundary, regardless of the MPU's internal alignment constraints. It further cleans up the memory allocation code. Tracing the previous slices and pointers through the previous code was rather difficult. In an effort to simply this logic, the returned memory is now continuously sub-sliced until all individual memory sub-sections have been isolated into their own slices. This avoids indexing into larger slices, encompassing multiple logically-distinct memory allocations. It further renames the allocation returned by `allocate_app_memory_region` to no longer have "app" in its name, as this can be mistaken for app-accessible memory. Instead, it simply calls it `allocated_memory`, which is then sub-sliced into `app_accessible_memory` and `allocated_kernel_memory` slices. Visually, the memory is now sliced as follows: +----------------------------------------------------------------- | remaining_memory +----------------------------------------------------+------------ v v +----------------------------------------------------+ | allocated_padded_memory | +--+-------------------------------------------------+ v v +-------------------------------------------------+ | allocated_memory | +-------------------------------------------------+ v v +-----------------------+-------------------------+ | app_accessible_memory | allocated_kernel_memory | +-----------------------+-------------------+-----+ v kernel memory break \---+/ v optional padding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like you should copy your diagram in to the code.
I assume you can run the process command on the pconsole with a null app before and after to ensure all this doesn't change anything?
let initial_kernel_memory_size = grant_ptrs_offset | ||
+ Self::CALLBACKS_OFFSET | ||
+ Self::PROCESS_STRUCT_OFFSET | ||
+ core::mem::size_of::<usize>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just calculate the exact needed padding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't know this ahead of time, it's up to the MPU where the end of the allocation is placed. We have three options:
- Add an additional "allocation end" alignment constraint to
allocate_app_memory_region
, where we can then specify that we want the end of the region word-size aligned. This seems sub-optimal, as this is not even a kernel-invariant, but rather one ofProcessStandard
. - Add
core::mem::size_of::<usize>() - 1
to the initial kernel memory size, which is the maximum amount of padding we'll ever need. However, with our current MPU implementations, this will mean that we'll always end on an unaligned address, and always waste those 3 bytes. - Add
core::mem::size_of::<usize>()
. This is one byte more than required, but it would mean that an MPU implementation which naturally aligns regions to a word boundary places this padding also on a word boundary. We would have a slightly largerinitial_kernel_memory_size
than required, but wouldn't waste any of this memory, for instance when we do grant allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have precedent for the first option with grant
: we request both the size and the alignment when creating a grant. I think it's reasonable to say that alignment is a fundamental operation for allocation.
But I'm not sure I feel strongly about changing the interface for this purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm all for adding alignment to the interface, but what feels weird is to add an alignment requirement towards the end of a region. Usually the MPU performs alignment inserting padding in the front. In this case, we'd need the MPU to place the kernel region towards the end of the allocated section, then move it further towards the front to achieve alignment, while not colliding with the app's memory, and then passing back the allocated start address (which isn't even subject to that alignment) to the caller.
Apparently, even our current interface has sufficient complexity to implement it incorrectly (#3544), and adding all of the above constraints adds so many more edge-cases.
I don't like either, but for now I'd be leaning towards allocating another word of memory, which can be used by grants at runtime anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok yes I agree. Given that we are allocating a larger region, it makes sense that the process standard impl just ensures it can both give the process enough accessible memory and make its grant bookkeeping aligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is better than what we have currently, but doing a more thorough read through of the whole process creation I think TheRightWay(TM) would move more intelligence and work into the userspace_kernel_boundary
.
Ultimately, alignment requirements derive from architecture specifications. To that end, all the little final padding, shifting, casting bits really should be in arch-specific implementations. As one example, initial_app_brk
ultimately gets passed to userspace_kernel_boundary.app_brk
where it's used as the initial process stack pointer. On ARM, this requires 8-byte alignment. Nothing enforces this*. On the flip side, technically ARM Cortex-M3—M7 actually allow for unaligned access (with degraded runtime performance as the HW silently does the alignment), but if some more practical architecture had looser requirements, we would bake padding we don't need into processes.
In the short term, I think we should merge this, including the current "extra word of padding for alignment if-needed" solution. In the longer run, I'd like to play with a slightly more compartmentalized and federated architecture for process creation—which, at minimum, might break the current 500-line monolith that is create()
into more manageable, conceptual pieces.
I'm also skeptical that a "default implementation" for the MPU should exist, especially if the MPU is to be responsible for things like defining region sizes. Indeed, the real root cause IMO to the error reported on the original issue here is that the MPU region allocation did not follow rules for the currently executing architecture (which a default impl can't possibly do!). Maybe I'll see if it makes sense for that to go away as well.
*though it does happen to work out in our common case paths right now. It derives from min_process_memory_size
, which comes UKB.initial_process_app_brk_size
which is SVC_FRAME_SIZE
which is 32 bytes. This is then added app_mpu_mem_start
, which ultimately chains through a few MPU things to something aligned. Frankly, it's unlikely this wouldn't work out on real ARM implementations, but that isn't a case for hidden, cross-layer dependencies.
Also, possibly of interest, removing the clippy allows doesn't generate clippy warnings anymore (on |
I'm skeptical, but, building on the idea that we need |
Yeah, I'm not certain that UKB is the right way to do it either :/. I think we have a bunch of places where we assume alignment that in practice work out on all 'normal' systems, but host-based unit testing and folks exploring 64-bit arch's, etc are starting to expose / violate these assumptions. I opened #3583 to flag/track the more substantial alignment issue. I think we should merge this as-is as it fixes a real problem in the now and has some nice buffer management cleanups. I don't think this will be the final word here, but think the 'most right' fix is a much bigger undertaking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on master:
tock$ process bare_bones
𝐀𝐩𝐩: bare_bones - [Yielded]
Events Queued: 0 Syscall Count: 1 Dropped Upcall Count: 0
Restart Count: 0
Last Syscall: Yield { which: 1, address: 0x20007000 }
Completion Code: None
Credential: None
╔═══════════╤══════════════════════════════════════════╗
║ Address │ Region Name Used | Allocated (bytes) ║
╚0x20007800═╪══════════════════════════════════════════╝
│ Grant Ptrs 120
│ Upcalls 320
│ Process 920
0x200072B0 ┼───────────────────────────────────────────
│ ▼ Grant 0
0x200072B0 ┼───────────────────────────────────────────
│ Unused
0x20007020 ┼───────────────────────────────────────────
│ ▲ Heap ? | ? S
?????????? ┼─────────────────────────────────────────── R
│ Data ? | ? A
?????????? ┼─────────────────────────────────────────── M
│ ▼ Stack ? | ?
0x20007000 ┼───────────────────────────────────────────
│ Unused
0x20007000 ┴───────────────────────────────────────────
.....
0x00041000 ┬─────────────────────────────────────────── F
│ App Flash 4016 L
0x00040050 ┼─────────────────────────────────────────── A
│ Protected 80 S
0x00040000 ┴─────────────────────────────────────────── H
with this pr:
tock$ process bare_bones
𝐀𝐩𝐩: bare_bones - [Yielded]
Events Queued: 0 Syscall Count: 1 Dropped Upcall Count: 0
Restart Count: 0
Last Syscall: Yield { which: 1, address: 0x20007000 }
Completion Code: None
Credential: None
╔═══════════╤══════════════════════════════════════════╗
║ Address │ Region Name Used | Allocated (bytes) ║
╚0x20007800═╪══════════════════════════════════════════╝
│ Grant Ptrs 120
│ Upcalls 320
│ Process 920
0x200072B0 ┼───────────────────────────────────────────
│ ▼ Grant 0
0x200072B0 ┼───────────────────────────────────────────
│ Unused
0x20007020 ┼───────────────────────────────────────────
│ ▲ Heap ? | ? S
?????????? ┼─────────────────────────────────────────── R
│ Data ? | ? A
?????????? ┼─────────────────────────────────────────── M
│ ▼ Stack ? | ?
0x20007000 ┼───────────────────────────────────────────
│ Unused
0x20007000 ┴───────────────────────────────────────────
.....
0x00041000 ┬─────────────────────────────────────────── F
│ App Flash 4016 L
0x00040050 ┼─────────────────────────────────────────── A
│ Protected 80 S
0x00040000 ┴─────────────────────────────────────────── H
tock$
Looks good to me!
Pull Request Overview
This commit addresses issue #1739 by ensuring that the kernel memory break is always well-aligned to a word-boundary, regardless of the MPU's internal alignment constraints.
It further cleans up the memory allocation code. Tracing the previous slices and pointers through the previous code was rather difficult. In an effort to simply this logic, the returned memory is now continuously sub-sliced until all individual memory sub-sections have been isolated into their own slices. This avoids indexing into larger slices, encompassing multiple logically-distinct memory allocations.
It further renames the allocation returned by
allocate_app_memory_region
to no longer have "app" in its name, as this can be mistaken for app-accessible memory. Instead, it simply calls itallocated_memory
, which is then sub-sliced intoapp_accessible_memory
andallocated_kernel_memory
slices.Visually, the memory is now sliced as follows:
Testing Strategy
Needs testing. How to best test this? Deliberately have the dummy MPU (
()
) return a non word-aligned kernel break?TODO or Help Wanted
N/A
Documentation Updated
Updated the relevant files inor no updates are required./docs
,Formatting
make prepush
.