8000 kernel/process_standard: ensure kernel_memory_break align & cleanup by lschuermann · Pull Request #3556 · tock/tock · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

kernel/process_standard: ensure kernel_memory_break align & cleanup #3556

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 1, 2023

Conversation

lschuermann
Copy link
Member
@lschuermann lschuermann commented Jul 21, 2023

Pull Request Overview

This commit addresses issue #1739 by ensuring that the kernel memory break is always well-aligned to a word-boundary, regardless of the MPU's internal alignment constraints.

It further cleans up the memory allocation code. Tracing the previous slices and pointers through the previous code was rather difficult. In an effort to simply this logic, the returned memory is now continuously sub-sliced until all individual memory sub-sections have been isolated into their own slices. This avoids indexing into larger slices, encompassing multiple logically-distinct memory allocations.

It further renames the allocation returned by allocate_app_memory_region to no longer have "app" in its name, as this can be mistaken for app-accessible memory. Instead, it simply calls it allocated_memory, which is then sub-sliced into app_accessible_memory and allocated_kernel_memory slices.

Visually, the memory is now sliced as follows:

+-----------------------------------------------------------------
| remaining_memory
+----------------------------------------------------+------------
v                                                    v
+----------------------------------------------------+
| allocated_padded_memory                            |
+--+--
8000
-----------------------------------------------+
   v                                                 v
   +-------------------------------------------------+
   | allocated_memory                                |
   +-------------------------------------------------+
   v                                                 v
   +-----------------------+-------------------------+
   | app_accessible_memory | allocated_kernel_memory |
   +-----------------------+-------------------+-----+
                                               v
                             kernel memory break
                                                \---+/
                                                    v
                                      optional padding

Testing Strategy

Needs testing. How to best test this? Deliberately have the dummy MPU (()) return a non word-aligned kernel break?

TODO or Help Wanted

N/A

Documentation Updated

  • Updated the relevant files in /docs, or no updates are required.

Formatting

  • Ran make prepush.

This commit addresses issue tock#1739 by ensuring that the kernel
memory break is always well-aligned to a word-boundary, regardless of
the MPU's internal alignment constraints.

It further cleans up the memory allocation code. Tracing the previous
slices and pointers through the previous code was rather difficult. In
an effort to simply this logic, the returned memory is now
continuously sub-sliced until all individual memory sub-sections have
been isolated into their own slices. This avoids indexing into larger
slices, encompassing multiple logically-distinct memory allocations.

It further renames the allocation returned by
`allocate_app_memory_region` to no longer have "app" in its name, as
this can be mistaken for app-accessible memory. Instead, it simply
calls it `allocated_memory`, which is then sub-sliced into
`app_accessible_memory` and `allocated_kernel_memory` slices.

Visually, the memory is now sliced as follows:

    +-----------------------------------------------------------------
    | remaining_memory
    +----------------------------------------------------+------------
    v                                                    v
    +----------------------------------------------------+
    | allocated_padded_memory                            |
    +--+-------------------------------------------------+
       v                                                 v
       +-------------------------------------------------+
       | allocated_memory                                |
       +-------------------------------------------------+
       v                                                 v
       +-----------------------+-------------------------+
       | app_accessible_memory | allocated_kernel_memory |
       +-----------------------+-------------------+-----+
                                                   v
                                 kernel memory break
                                                    \---+/
                                                        v
                                          optional padding
@lschuermann lschuermann requested a review from ppannuto July 21, 2023 15:44
Copy link
Contributor
@bradjc bradjc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you should copy your diagram in to the code.

I assume you can run the process command on the pconsole with a null app before and after to ensure all this doesn't change anything?

let initial_kernel_memory_size = grant_ptrs_offset
+ Self::CALLBACKS_OFFSET
+ Self::PROCESS_STRUCT_OFFSET
+ core::mem::size_of::<usize>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just calculate the exact needed padding?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know this ahead of time, it's up to the MPU where the end of the allocation is placed. We have three options:

  • Add an additional "allocation end" alignment constraint to allocate_app_memory_region, where we can then specify that we want the end of the region word-size aligned. This seems sub-optimal, as this is not even a kernel-invariant, but rather one of ProcessStandard.
  • Add core::mem::size_of::<usize>() - 1 to the initial kernel memory size, which is the maximum amount of padding we'll ever need. However, with our current MPU implementations, this will mean that we'll always end on an unaligned address, and always waste those 3 bytes.
  • Add core::mem::size_of::<usize>(). This is one byte more than required, but it would mean that an MPU implementation which naturally aligns regions to a word boundary places this padding also on a word boundary. We would have a slightly larger initial_kernel_memory_size than required, but wouldn't waste any of this memory, for instance when we do grant allocations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have precedent for the first option with grant: we request both the size and the alignment when creating a grant. I think it's reasonable to say that alignment is a fundamental operation for allocation.

But I'm not sure I feel strongly about changing the interface for this purpose.

Copy link
Member Author
@lschuermann lschuermann Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all for adding alignment to the interface, but what feels weird is to add an alignment requirement towards the end of a region. Usually the MPU performs alignment inserting padding in the front. In this case, we'd need the MPU to place the kernel region towards the end of the allocated section, then move it further towards the front to achieve alignment, while not colliding with the app's memory, and then passing back the allocated start address (which isn't even subject to that alignment) to the caller.

Apparently, even our current interface has sufficient complexity to implement it incorrectly (#3544), and adding all of the above constraints adds so many more edge-cases.

I don't like either, but for now I'd be leaning towards allocating another word of memory, which can be used by grants at runtime anyways.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok yes I agree. Given that we are allocating a larger region, it makes sense that the process standard impl just ensures it can both give the process enough accessible memory and make its grant bookkeeping aligned.

ppannuto
ppannuto previously approved these changes Jul 26, 2023
Copy link
Member
@ppannuto ppannuto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is better than what we have currently, but doing a more thorough read through of the whole process creation I think TheRightWay(TM) would move more intelligence and work into the userspace_kernel_boundary.

Ultimately, alignment requirements derive from architecture specifications. To that end, all the little final padding, shifting, casting bits really should be in arch-specific implementations. As one example, initial_app_brk ultimately gets passed to userspace_kernel_boundary.app_brk where it's used as the initial process stack pointer. On ARM, this requires 8-byte alignment. Nothing enforces this*. On the flip side, technically ARM Cortex-M3—M7 actually allow for unaligned access (with degraded runtime performance as the HW silently does the alignment), but if some more practical architecture had looser requirements, we would bake padding we don't need into processes.

In the short term, I think we should merge this, including the current "extra word of padding for alignment if-needed" solution. In the longer run, I'd like to play with a slightly more compartmentalized and federated architecture for process creation—which, at minimum, might break the current 500-line monolith that is create() into more manageable, conceptual pieces.

I'm also skeptical that a "default implementation" for the MPU should exist, especially if the MPU is to be responsible for things like defining region sizes. Indeed, the real root cause IMO to the error reported on the original issue here is that the MPU region allocation did not follow rules for the currently executing architecture (which a default impl can't possibly do!). Maybe I'll see if it makes sense for that to go away as well.

*though it does happen to work out in our common case paths right now. It derives from min_process_memory_size, which comes UKB.initial_process_app_brk_size which is SVC_FRAME_SIZE which is 32 bytes. This is then added app_mpu_mem_start, which ultimately chains through a few MPU things to something aligned. Frankly, it's unlikely this wouldn't work out on real ARM implementations, but that isn't a case for hidden, cross-layer dependencies.

@ppannuto
Copy link
Member

Also, possibly of interest, removing the clippy allows doesn't generate clippy warnings anymore (on master, or this branch).

@bradjc
Copy link
Contributor
bradjc commented Jul 26, 2023

would move more intelligence and work into the userspace_kernel_boundary.

I'm skeptical, but, building on the idea that we need UKB.initial_process_app_brk_size might make this better.

@ppannuto
Copy link
Member

Yeah, I'm not certain that UKB is the right way to do it either :/. I think we have a bunch of places where we assume alignment that in practice work out on all 'normal' systems, but host-based unit testing and folks exploring 64-bit arch's, etc are starting to expose / violate these assumptions.

I opened #3583 to flag/track the more substantial alignment issue.

I think we should merge this as-is as it fixes a real problem in the now and has some nice buffer management cleanups. I don't think this will be the final word here, but think the 'most right' fix is a much bigger undertaking.

bradjc
bradjc previously approved these changes Jul 31, 2023
Copy link
Contributor
@bradjc bradjc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on master:

tock$ process bare_bones
𝐀𝐩𝐩: bare_bones   -   [Yielded]
 Events Queued: 0   Syscall Count: 1   Dropped Upcall Count: 0
 Restart Count: 0
 Last Syscall: Yield { which: 1, address: 0x20007000 }
 Completion Code: None
 Credential: None


 ╔═══════════╤══════════════════════════════════════════╗
 ║  Address  │ Region Name    Used | Allocated (bytes)  ║
 ╚0x20007800═╪══════════════════════════════════════════╝
             │ Grant Ptrs      120
             │ Upcalls         320
             │ Process         920
  0x200072B0 ┼───────────────────────────────────────────
             │ ▼ Grant           0
  0x200072B0 ┼───────────────────────────────────────────
             │ Unused
  0x20007020 ┼───────────────────────────────────────────
             │ ▲ Heap            ? |      ?               S
  ?????????? ┼─────────────────────────────────────────── R
             │ Data              ? |      ?               A
  ?????????? ┼─────────────────────────────────────────── M
             │ ▼ Stack           ? |      ?
  0x20007000 ┼───────────────────────────────────────────
             │ Unused
  0x20007000 ┴───────────────────────────────────────────
             .....
  0x00041000 ┬─────────────────────────────────────────── F
             │ App Flash      4016                        L
  0x00040050 ┼─────────────────────────────────────────── A
             │ Protected        80                        S
  0x00040000 ┴─────────────────────────────────────────── H

with this pr:

tock$ process bare_bones
𝐀𝐩𝐩: bare_bones   -   [Yielded]
 Events Queued: 0   Syscall Count: 1   Dropped Upcall Count: 0
 Restart Count: 0
 Last Syscall: Yield { which: 1, address: 0x20007000 }
 Completion Code: None
 Credential: None


 ╔═══════════╤══════════════════════════════════════════╗
 ║  Address  │ Region Name    Used | Allocated (bytes)  ║
 ╚0x20007800═╪══════════════════════════════════════════╝
             │ Grant Ptrs      120
             │ Upcalls         320
             │ Process         920
  0x200072B0 ┼───────────────────────────────────────────
             │ ▼ Grant           0
  0x200072B0 ┼───────────────────────────────────────────
             │ Unused
  0x20007020 ┼───────────────────────────────────────────
             │ ▲ Heap            ? |      ?               S
  ?????????? ┼─────────────────────────────────────────── R
             │ Data              ? |      ?               A
  ?????????? ┼─────────────────────────────────────────── M
             │ ▼ Stack           ? |      ?
  0x20007000 ┼───────────────────────────────────────────
             │ Unused
  0x20007000 ┴───────────────────────────────────────────
             .....
  0x00041000 ┬─────────────────────────────────────────── F
             │ App Flash      4016                        L
  0x00040050 ┼─────────────────────────────────────────── A
             │ Protected        80                        S
  0x00040000 ┴─────────────────────────────────────────── H
tock$

Looks good to me!

@bradjc bradjc added last-call Final review period for a pull request. P-Significant This is a substancial change that requires review from all core developers. labels Jul 31, 2023
@lschuermann lschuermann dismissed stale reviews from bradjc and ppannuto via a3fc554 August 1, 2023 13:21
@lschuermann
Copy link
Member Author

I just realized I never put the above diagram into the code. Sorry for invalidating your reviews (again) @bradjc @ppannuto.

@ppannuto ppannuto added this pull request to the merge queue Aug 1, 2023
Merged via the queue into tock:master with commit f843d22 Aug 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kernel last-call Final review period for a pull request. P-Significant This is a substancial change that requires review from all core developers.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0