
doc: add TRD: Dynamic Process Loading #4338


Open · wants to merge 11 commits into base: master

Conversation

@bradjc (Contributor) commented Feb 12, 2025

Pull Request Overview

This pull request complements #3941 by adding a TRD for the new dynamic process loading architecture.

The main goal of this document is to describe what exists and how the traits are distinct from the implementation.

Testing Strategy

doc

TODO or Help Wanted

n/a

Documentation Updated

  • Updated the relevant files in /docs, or no updates are required.

Formatting

  • Ran make prepush.

@brghena (Contributor) left a comment:

A couple of comments for you. I think this mostly makes sense to me and seems pretty reasonable.

```rust
trait DynamicProcessLoad {
    /// Request kernel to load the newly stored process.
    fn load(&self) -> Result<(), ErrorCode>;
```
Contributor:

It feels strange to me that load() doesn't have some ID or reference or something to signify which application binary is being loaded and tie it to the prior setup/write work.

Contributor Author:

I think it certainly could, and we have discussed that, but having multiple apps in flight at once is not something we have experimented with. We aren't sure what type of ID to use, and we didn't want to add additional complexity that we don't have a good way to test/motivate.

We decided to restrict to a single in-flight new process binary for now.
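As an illustration of that restriction, a minimal sketch of a single-slot guard; the type, fields, and error choices here are hypothetical, not the PR's implementation:

```rust
use core::cell::Cell;
use kernel::ErrorCode;

/// Hypothetical single-slot guard: with only one new process binary in
/// flight at a time, no identifier is needed to tie setup/write/load together.
pub struct SingleSlotLoader {
    in_flight: Cell<bool>,
}

impl SingleSlotLoader {
    pub fn setup(&self, _length: usize) -> Result<(), ErrorCode> {
        if self.in_flight.get() {
            // A store/load session is already active; the caller must wait
            // until it completes or is aborted.
            return Err(ErrorCode::BUSY);
        }
        self.in_flight.set(true);
        Ok(())
    }

    pub fn load(&self) -> Result<(), ErrorCode> {
        if !self.in_flight.get() {
            // No session to load; setup() was never called.
            return Err(ErrorCode::RESERVE);
        }
        // Hand the stored binary to the kernel; the slot is freed when
        // loading completes (success or failure) via callback.
        Ok(())
    }
}
```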

Member:

One thing I don't fully understand is how the "internal state" of this trait relates to the "internal state" of the DynamicBinaryStore trait. If I understand correctly, a call to load here is only valid after DynamicBinaryStoreClient::finalize_done --> Success (and the most-recently-finalized process is what is loaded)?

I don't like that there is implicit state across traits. The problem was hidden when AppLoaderCapsule exposed just the load() method (which internally would do finalize+load) — but once we separate out finalize and load, now you have the problem where UserspaceLoaderProcessA has the "update-at-midnight" policy, and sets things up a few hours early, then at 10pm UserspaceLoaderProcessB comes along and just does a quick update&load of a less-sensitive app. When midnight happens UserspaceLoaderProcessA has no way to load its new process.

If we want to avoid the questions around identifier selection, I think there needs to be one trait that encompasses binary update and loading, and an "app update session" commences with start and doesn't finish until load (or abort) is called.
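A sketch of what that single combined trait might look like; the trait and method names are illustrative, assuming the SubSliceMut-based write from the TRD:

```rust
use kernel::ErrorCode;
use kernel::utilities::leasable_buffer::SubSliceMut;

/// Hypothetical combined trait: one "app update session" runs from
/// `start()` until `load()` or `abort()`, so no state is implicitly
/// shared across two separate traits.
pub trait DynamicProcessUpdate {
    /// Begin a session, reserving storage for `length` bytes.
    fn start(&self, length: usize) -> Result<(), ErrorCode>;

    /// Write a chunk of the new binary within the active session.
    fn write(&self, buffer: SubSliceMut<'static, u8>, offset: usize) -> Result<(), ErrorCode>;

    /// Finish the session by asking the kernel to load the stored binary.
    fn load(&self) -> Result<(), ErrorCode>;

    /// Finish the session without loading, releasing reserved resources.
    fn abort(&self) -> Result<(), ErrorCode>;
}
```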

Member:

That said, I like the current conceptual split across the two traits; I think it makes sense.

Looking at the app id TRD (which you certainly know better than me...), it's clear that Application Identifier is insufficient.

I might imagine an ID defined by the address in storage the app was loaded at, but that's not fully thought out and hits the 'thorny problems' identified here :/

Contributor Author:

We've been trying to do app loading for years now. Is it ok to just have a version that only supports one app loader and one new app in progress? We can always change things.


One reason there was no finalize() is that I see load() as "I'm done with this, it's in your hands now, kernel". The kernel can do things like check credentials before running the process; calling load() is not a guarantee that the app will actually execute. So, if the kernel supports policies about when a process can start running, it is free to enforce them once load() is called.

So why have load() at all? There is still something nice about separating storing a binary from deciding to try to run new apps. load() could be thought of as "load all", just as main.rs does today.


Most boards just use a function called load_processes() that doesn't take any sort of process identifier and that hasn't been an issue.

Member:

> We've been trying to do app loading for years now. Is it ok to just have a version that only supports one app loader and one new app in progress? We can always change things.

Sure, we should just merge into one mega-trait for the time being then, and we can split out the two concepts when we figure out how to do it.


> So, if the kernel supports policies about when a process can start running it is free to enforce them once load is called.
> So why have load() at all? There is still something nice about separating storing a binary from deciding to try to run new apps.

Hmm... maybe we are talking past each other in our use of language here. There are:

  1. Get the data bits for a process into local storage.
  2. Verify a process is eligible to execute on the platform (check signatures, policy, etc)
  3. Set up relevant data structures to actually execute a process (copy data into ram, etc)
  4. Schedule a process to begin executing 

In current Tock, (1) is always done in advance; now we are adding it as "store".
(2), (3), and (4)* happen to be done monolithically by the current process standard implementation during "loading".

I had been viewing (2) as the last step of something which is responsible for "getting a new process physically onto the board"; i.e., my job as the updater is done when there are bits on disk that are eligible to run on the platform. This is finalize(). Only "static" (i.e., permanent storage) resources are consumed at the completion of this step.

I had been viewing (3) as the request for the kernel to allocate space in SRAM for this process; a slot in the processes array; etc. This is when "dynamic" resources are consumed on behalf of the process — critically, these may be scarcer. This is how I had been viewing load().

In contrast, it sounds like your load() is (2+3).

In the "critical app swaps at midnight" scenario in my head, there are enough storage resources on a platform to hold versions A and B of an app, but not enough resources to simultaneously run both versions A and B. To set up the "swap", it's necessary to have B at a state where you know it is able to be run, i.e., (2) is already done.

We haven't needed this distinction before because historically we just do (1-4) at boot for all processes. Once we have DPL, though, I think each of these is a meaningfully separate step.

*[yes, technically a process is created in Yielded with work in its queue which the kernel loop then switches to Running, but I think "not-(4)" would be creating a process in the Stopped state]
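To make the separation concrete, the four steps could be pictured as explicit states; this enum is illustrative only, not proposed API:

```rust
/// Illustrative only: the four steps above as distinct states a process
/// image moves through, rather than one monolithic "load".
enum ProcessImageState {
    /// (1) Bits are in local storage; nothing is known about them yet.
    Stored,
    /// (2) Signatures/policy checked; eligible to run. Only "static"
    /// (permanent storage) resources are consumed.
    Verified,
    /// (3) SRAM, a slot in the processes array, etc. are allocated;
    /// "dynamic" resources are now consumed. Comparable to `Stopped`.
    Loaded,
    /// (4) Scheduled to begin executing.
    Running,
}
```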


> Most boards just use a function called load_processes() that doesn't take any sort of process identifier and that hasn't been an issue.

Sure, because that probably should've been called load_all_processes(); we've never had the capacity* (or use case) to load just one process before now.

*(publicly at least; load_process does exist in kernel::process_loading, but even the async interface only allows "loading all": it kicks off discovering all processes and then loading all of them, and you only get "per-process control" when you're notified that one of the suite it's working on happens to have finished)

Contributor:

I think this issue is already addressed in the doc. §5 talks about how we utilize the SequentialProcessLoaderMachine's capability to load the process and perform the credential check. The current checker implementation uses the ProcessBinary object which we do not create until we need to load it.

For your vision of splitting things, we would need to separate the credential check from the loading, which affects how the check is performed for all processes, whether loaded at boot or dynamically. This feels like a change that can happen in the future.

Currently finalize() checks if we need to write a leading padding app to preserve the linked list, and then writes one if needed. The current implementation of finalize() acts as an "I am done storing what needs to be stored" checkpoint.

Additionally, the "critical app swaps at midnight" feature cannot currently work. We do not have a mechanism to disable apps dynamically. We want this feature, but it is currently impossible, so worrying about this issue is unnecessary. I hope to add that mechanism eventually.

The dynamic process loader implements a new method load_new_process_binary() which does not discover all processes. It only discovers the process binary located at the slice of flash allocated during setup() and operates on it. If anything, I came up with an implementation where the "identifier" of sorts was the starting address and the length of the app for the load mechanism. You can take a look at #4335.

In this implementation, there is an optional argument for the load() method which lets an application request the kernel to load a binary stored at address X with size Y, or it could request to write an application binary with size Y and then load it. The downside to this approach is that we require the external app to know the position of the new binary, and that was not something we were entirely sure we wanted, so we scrapped the idea.

We separated the traits because while we are using the traits in conjunction in this implementation, they are two separate things: storing and loading. Therefore, it makes more sense to split them. If you look at commit 99423b0, they were a single implementation. After that, a conscious decision was made to split them because in commit 287b2d6, you'll see the optional argument for the load() mechanism. We played around with it for a bit (refer to #4335 again). It is not hard to implement how load() should happen, but it is hard to decide what goes in. Maybe we could opt for a policy-based approach as well. As of writing the PR, we did not have a clear idea, so we decided to make the whole process monolithic.

Contributor Author:

We need to split step 2, like so:

  1. Get the data bits for a process into local storage.
  2a. Verify a process is eligible to execute on the platform (check signatures, policy, etc)
  2b. Verify a process is unique.
  3. Set up relevant data structures to actually execute a process (copy data into ram, etc)
  4. Schedule a process to begin executing 

2a could be done in advance, most likely. 2b really must be done in conjunction with 3, because a process may not be unique by the time 3 happens. It might be safer to make finalize() cover (1), and load() cover (2)-(4).

But I think trait separation is something that is non-obvious and harder to do later. I think if either of us were to expand dpl to more fine grained loading, or add a policy for determining if/when to load a new process, we could do it with separated traits. But if others were to take up the torch, it is not obvious we would want a new trait.

What is the minimum DynamicProcessLoad trait you would be comfortable with? Should load() take an AppId? Conceptually that would work and is probably correct. However, it is difficult to know, in general, what an app's AppId actually is, since that is up to the kernel's policy. Should load() take an opaque identifier, maybe a usize, provided by setup()? I think that could work as we could just use the starting flash address.
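The two candidate signatures being weighed here might look like the following sketch; the `AppId` stand-in and method names are hypothetical:

```rust
use kernel::ErrorCode;

// Stand-in for the application identifier concept from the AppID TRD.
type AppId = u32;

trait DynamicProcessLoadAlternatives {
    /// Variant 1: name the process by AppId. Conceptually correct, but the
    /// caller generally cannot know the AppId, since that is up to the
    /// kernel's policy.
    fn load_by_app_id(&self, id: AppId) -> Result<(), ErrorCode>;

    /// Variant 2: an opaque token returned by `setup()`, e.g. the starting
    /// flash address, identifies the in-flight binary.
    fn load_by_token(&self, token: usize) -> Result<(), ErrorCode>;
}
```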

I think adding an argument to load() is too speculative right now. Really, someone needs to think through the delayed-update case and come up with a full design.

If the only way to get this merged is to remove DynamicProcessLoad we'll do it. Do you want the TRD to remove its existence? Should it be there in the architecture, but not in the implementation? What is the shortest path forward?

Member:

I'm going to preface this with "I am honestly unsure of the best (read: most reasonable/realistic) course of action".

What I'm realizing is that many of my concerns about process loading are not unique to dynamic process loading. Unfortunately, trying to think deeply about a TRD and the right design for dynamic loading raised some thoughts on how we do loading generally which aren't really DPL's problem.

I think the ideal scenario would be to draft a TRD which is about process loading generally, and then one on dynamic loading. Actually, I think dynamic loading is revealing some imperfect assumptions which came from static loading. Maybe the ideal scenario is expanding the scope of this TRD to "process loading", where dynamic is one of the options for how it is done.

Now, that's a much bigger lift, and I don't think any of that should be in the way of the working DPL implementation that we now have (indeed, we have a [static] process loading implementation without any design doc currently). I might propose moving #3941 forward ahead of this TRD. We can continue to hash out with the TRD process what an ideal design might do, and tackle some of the thornier questions like "what is the identifier for an app image". When that process is done, then we can go back and update (all) the process load implementation(s).


For the traits specifically, what makes me uncomfortable is the implicit tight coupling — i.e., it would be hard if not impossible to have the two separate traits not be implemented by the same object (because the two objects would have to synchronize the state of the "process loading session"). However, we don't have the identifier we need to cleanly decouple these right now. I think just noting that limitation in the current implementation, i.e. something like

```rust
// Conceptually, these are two separate operations. However, Tock does not currently
// have a robust identifier for app images in storage, so the current implementation
// couples storing a new binary and loading it into one atomic session. See #4338
// for work on enabling separate implementation of these traits.
pub trait DynamicProcessStoreAndLoad: DynamicBinaryStore + DynamicProcessLoad {}
```

and then have SequentialDynamicBinaryStorage implement DynamicProcessStoreAndLoad.
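One way the combined trait could be satisfied without extra code is a blanket impl (a sketch, not something in the PR):

```rust
// Any type that already implements both halves provides the combined trait
// for free; the doc comment above then records why they are currently coupled.
impl<T: DynamicBinaryStore + DynamicProcessLoad> DynamicProcessStoreAndLoad for T {}
```

Note that a blanket impl would also prevent any type from opting out of the combined trait, which may or may not be desirable here.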

Comment on lines 174 to 179
There is a coupling between the `setup()` and `write()` calls. The `setup()`
call allows the implementation to allocate the needed resources to store the
process binary. Because the kernel requires that it reserves the resources
required for the new binary before storing it, this method MUST be called before
calling `write()`. An implementation is responsible for storing individual
chunks of the process binary.
Contributor:

There's also coupling between setup() and load(), right? A new setup() cannot be called until either an abort() or a (successful?) load(), I think.

Contributor:

A new setup() can be called if any of the steps fail. We release resources and make them available for a new process. However, you are right that setup() cannot be called again after write(). It can only be called after one of the following (sketched as a state machine below):

  1. A successful load()
  2. abort()
  3. Some step along the way failed and we reset the dynamic process loader
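A sketch of that lifecycle as a small state machine; state names are hypothetical, transitions as listed above:

```rust
/// Hypothetical states for the dynamic process loader, capturing when
/// setup() may be called again.
#[derive(Clone, Copy, PartialEq)]
enum LoaderState {
    /// No session active: setup() is allowed.
    Idle,
    /// setup() succeeded: resources are reserved and writes may begin.
    Setup,
    /// At least one write() happened: only a successful load(), an
    /// abort(), or an internal failure reset returns us to Idle.
    Writing,
}

fn may_setup(state: LoaderState) -> bool {
    // All three paths in the list above transition back to Idle, the
    // only state permitting a new setup().
    state == LoaderState::Idle
}
```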

Contributor:

Documenting that list would be helpful.

Comment on lines +238 to +249
```
     trait DynamicBinaryStore                trait DynamicProcessLoad
┌────────────────────────────────┐      ┌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┐
│                                │      ╎                                ╎
│                              ──┼─────►╎                                ╎
│ SequentialDynamicBinaryStorage │      ╎ SequentialProcessLoaderMachine ╎
│                                │      ╎                                ╎
└────────────────────────────────┘      └╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┘
     hil::NonvolatileStorage
```
Contributor:

I don't understand what this diagram is trying to show me? What does it mean that trait DynamicBinaryStore is on top of trait DynamicProcessLoad?

Contributor Author:

They are both exposed by SequentialDynamicBinaryStorage, which itself has a reference to SequentialProcessLoaderMachine.
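In other words, roughly this composition; generics and fields are elided, so this is a sketch of the relationship, not the actual definition:

```rust
// Stub standing in for the kernel's process loader machine type.
pub struct SequentialProcessLoaderMachine<'a>(core::marker::PhantomData<&'a ()>);

// Sketch: one object exposes both traits and delegates loading to the
// process loader machine it holds a reference to.
pub struct SequentialDynamicBinaryStorage<'a> {
    /// Used by the DynamicProcessLoad side to service load() requests.
    pub loader: &'a SequentialProcessLoaderMachine<'a>,
    // ... storage state, backed by hil::NonvolatileStorage ...
}

// impl DynamicBinaryStore for SequentialDynamicBinaryStorage<'_> { ... }
// impl DynamicProcessLoad for SequentialDynamicBinaryStorage<'_> { ... }
```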

Comment on lines +280 to +285
Second, `SequentialDynamicBinaryStorage` does not allow the calling capsule to
write a portion of the first eight bytes of the process binary (where the
`total_size` field is located). It must write the entire region, and
`SequentialDynamicBinaryStorage` ensures those first eight bytes are correct
(i.e., they match the size of the process binary and use a valid TBF header
version).
Contributor:

Why can't the calling capsule write a portion of the first eight bytes? Does that somehow stop the implementation from ensuring that the first eight bytes are correct? Couldn't the same check occur either way?

Contributor:

What happens if the first 8 bytes are NOT correct? Is the write rejected?

Contributor Author:

Just a decision to keep the implementation simpler. It certainly could be more flexible, but that additional flexibility doesn't seem worth it in practice.

Contributor:

And yes, if the first 8 bytes are NOT correct, the whole process is aborted in the current implementation.

@brghena (Contributor) commented Mar 5, 2025:

Do you need to call setup() again? Or just a new write? Documenting that would be good.

Contributor:

I think the idea is that you do not want to trust something that wants to manipulate the header, so you revoke all promises. The promise is the space allocated based on what was requested.
Two things are expected from whatever is writing the binary:

  1. Don't lie about your size (total_size should match the size declared during setup)
  2. Don't try to write out of the region promised to you (if an app requests a size of 100 bytes and is assigned an address 0xN, it is illegal for the app to attempt to write data at 0xN+100 or beyond)

In both of these cases, we treat the writer as an offender and end the contract to load a new app.
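Those two rules might be enforced with a check along these lines; parameter names and the choice of error code are made up for illustration:

```rust
use kernel::ErrorCode;

/// Hypothetical per-write check: reject (and end the session) if the writer
/// lies about its size or writes outside the promised region.
fn validate_write(
    setup_len: usize,              // size declared during setup()
    offset: usize,                 // offset of this chunk within the region
    chunk_len: usize,              // length of this chunk
    total_size_field: Option<u32>, // total_size from the TBF header, if this chunk covers it
) -> Result<(), ErrorCode> {
    // Rule 2: the chunk must stay inside the promised region.
    if offset.checked_add(chunk_len).map_or(true, |end| end > setup_len) {
        return Err(ErrorCode::INVAL);
    }
    // Rule 1: the header's total_size must match the size declared to setup().
    if let Some(total) = total_size_field {
        if total as usize != setup_len {
            return Err(ErrorCode::INVAL);
        }
    }
    Ok(())
}
```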

Member:

I can see the logic behind that policy. I do think it becomes important for §4.1 to articulate more directly exactly when a "process loading session" is aborted. It can't be, e.g., "any error", since this is agnostic to the underlying storage, and it's perfectly valid for some naive storage driver that happens to be servicing some other request to return EBUSY, in which case the write just needs to be retried later.

Contributor:

This sounds like something the storage implementation would return the error for via the asynchronous callback. We pass the operation's result with the upcall so the userspace application can retry if it wants.

The write() actions in the untrusted capsule and kernel capsule exist to check if the write operation is valid (checking header integrity, address integrity, whether the buffer is available, etc.). Once this is cleared, we pass the write to the underlying storage mechanism driver (in this case, the NonvolatileToPages driver). Once the write is handed off, we find out about the result via an async callback. Our current implementation of the nonvolatile driver only returns Ok(()), but an implementation could return other forms of errors that are passed back via the upcall. The application can then retry if it wants, and we do not abort anything.

Member:

> The write() actions in the untrusted capsule and kernel capsule exist to check if the write operation is valid (checking header integrity, address integrity, if the buffer is available, etc.).

Are these all synchronous operations? i.e., is it the case that any write that would cause the whole transaction to be cancelled will receive an error back to the Command::write synchronous call? (And do all synchronous errors abort the transaction?)

What I'm trying to get to here is "how does the caller know whether an error from a write command has aborted the transaction"?

@viswajith-g (Contributor) commented Mar 11, 2025:

Yes, the operations I mentioned are synchronous. They are part of the Result returned when an app requests a write operation. Any synchronous error results in an abort because it is usually a violation. The exception is perhaps the buffer being unavailable; I can amend the call to not abort on just ERESERVE.
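A sketch of that amended classification (a hypothetical helper; ERESERVE corresponds to ErrorCode::RESERVE):

```rust
use kernel::ErrorCode;

/// Hypothetical policy: violations abort the session; a temporarily
/// unavailable buffer is retryable and keeps the session alive.
fn is_session_aborting(err: ErrorCode) -> bool {
    match err {
        // Buffer unavailable: the app may retry later; do not abort.
        ErrorCode::RESERVE => false,
        // Anything else (bad header, out-of-bounds write, ...) is treated
        // as a violation that ends the contract to load a new app.
        _ => true,
    }
}
```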

Comment on lines 150 to 157
```rust
/// Initiate storing a new process binary with the specified length.
fn setup(&self, length: usize) -> Result<usize, ErrorCode>;

/// Store a portion of the process binary.
fn write_process_binary_data(&self, buffer: SubSliceMut<'static, u8>, offset: usize) -> Result<(), ErrorCode>;

/// Call to abort the setup/writing process.
fn abort(&self) -> Result<(), ErrorCode>;
```
Member:

I'll expand on this comment later when I have more time; leaving this as a bookmark for the discussion on explicitly aborting vs. explicitly finalizing. I think that adding a finalize instead of an abort is a better interface, as it's easier to reason about (esp. in an intermittent context) and allows more flexible underlying implementations, such as ones that don't flush on every call to write_process_binary_data or ones that want to perform some additional checks before "enabling" an app in flash.

Contributor Author:

I think this was discussed elsewhere, but abort() and finalize() seem complementary. I think adding finalize() helps decouple the two traits and I will add it to the TRD.
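A sketch of the storing trait with both endings, following the excerpt above plus the finalize() agreed on here:

```rust
use kernel::ErrorCode;
use kernel::utilities::leasable_buffer::SubSliceMut;

trait DynamicBinaryStore {
    /// Initiate storing a new process binary with the specified length.
    fn setup(&self, length: usize) -> Result<usize, ErrorCode>;

    /// Store a portion of the process binary.
    fn write_process_binary_data(&self, buffer: SubSliceMut<'static, u8>, offset: usize) -> Result<(), ErrorCode>;

    /// Signal that storing is complete so the implementation can flush and
    /// run final checks: the "I am done storing" checkpoint.
    fn finalize(&self) -> Result<(), ErrorCode>;

    /// Call to abort the setup/writing process.
    fn abort(&self) -> Result<(), ErrorCode>;
}
```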

@ppannuto (Member) left a comment:

Partial review, will read the rest asap

Comment on lines 115 to 116
to be able to store and load a process of the specified size. Success or
failure is indicated via an upcall.
Member:

Does this return any kind of handle or indicator for which process is being setup (and equivalently, which one should be written to by write), or is the design such that an AppLoader process only has one app update in flight at a time?

(Or, is there an assumption that there is only one AppLoader process, and the platform can only have one app update in flight at a time?)

Member:

Reading more, it's more clear that there's one app update in flight from a given userspace process. We do have some capsules/interfaces which are designed to be "owned" by a single process—is that the expectation here? Or can there be multiple AppLoader processes?

Member:

Okay, this came up on the call — it sounds like the answer is in the middle: multiple apps can try to use this capsule, but only one app can have an "active" load going at once.

I think the documentation for setup() needs to make this more explicit, and state that calling the syscall "reserves" (or similar verbiage) the process loading mechanism for a given app load attempt, which is then freed on a call to close/finalize/load(). If a second process tries to start a process load while another is active, it will return EBUSY.

Contributor Author:

The design is that a userspace process can only be loading one app at a time. That seems like a reasonable tradeoff.

The current implementation of the apploader capsule only allows one process to use the apploader capsule. That also seemed like a reasonable current implementation; e.g., Apple gets away with only having one app store app.

However, the syscall design does not prohibit a future implementation from being more flexible (e.g., allowing for the Google Play app store and an Amazon app store). Each app store can still only load one app at a time.

Comment on lines 118 to 121
This stores a portion of the process binary. Write is expected to be called
multiple times to store the entire process binary and there is no assumption
about the order the process binary is written in. Success or failure is
indicated via an upcall.
Member:

Suggested change
This stores a portion of the process binary. Write is expected to be called
multiple times to store the entire process binary and there is no assumption
about the order the process binary is written in. Success or failure is
indicated via an upcall.
This stores a portion of the process binary. `setup()` must have completed
successfully before `write()` can be called (otherwise the interface will
return `ERESERVE`). Write is expected to be called multiple times to
store the entire process binary and there is no assumption about the
order the process binary is written in. Success or failure is indicated via
an upcall.

multiple times to store the entire process binary and there is no assumption
about the order the process binary is written in. Success or failure is
indicated via an upcall.
3. `load()`: This indicates the entire process binary has been written and the
Member:

Should we separate the concepts of close() (or finalize(), which I think is what Leon was getting at on the call) from load()? E.g., I might imagine a system that wants to do some kind of "switch at midnight" policy getting the new app on-board and ready to go before it actually wants to load the new app. [Ideally close()/finalize() would include verifying the signature, etc., probably]

Member:

I'm still convinced that close()-ing / finalize()-ing a process binary that is being written to flash is important (e.g., to maintain some guarantees on the flash contents, such as ensuring that applications are always a well-formed linked list, and the kernel does not attempt to load a partially written application).

However, I'm fine leaving these two concepts entangled for the capsule API: presumably, there are few use-cases where applications are installing other applications into flash, and then not immediately loading them. Furthermore, a simple chip reset or power outage can restart the chip at any time, which would cause that application to be loaded, independent of whether the loader-application has explicitly requested this.

3 System Call API
=================================

The `0x10001` system call interface provides four operations:
Member:

Meta: I think what might make this clearer is changing "Success or failure is indicated via an upcall." for each of these below to something more explicit, i.e.

Setup will always be followed by an upcall:
 - `Success` means that this process now has an active process loading session ready to write data.
 - `EBUSY` means another process is currently using the process loading mechanism.
 - `ESIZE` means there is not sufficient storage available to load the requested process.

^ ideally, this should be exhaustive of the errors setup() can return

Contributor Author:

I think that level of detail should be in the syscall doc. I intentionally avoided doing that.

Member:

Thinking about this more, why is the syscall interface in the TRD at all? I went to double-check myself before posting this, and none of the other TRDs which describe kernel behavior discuss the syscall interface (indeed, only 104 and 106, which are explicitly about syscalls, discuss them).

I don't think there is anything specific to app loading in the differing trust between the Userspace Application and Apploader Capsule above; that's just the normal app/capsule trust division. The document even notes later that it's not required for binary application acquisition to happen in userspace.

In practice, I think the Apploader Capsule is basically just a passthrough capsule, exposing the underlying kernel interfaces to userspace.

I think §3 should be removed from this TRD (and moved into syscall documentation).

Contributor Author:

I think the architecture is pretty important in this case. This isn't simple to do (although hopefully the description is clear now that it exists). Choosing to drive what is, on the surface, a very low-level operation from userspace applications is not, I think, an obvious design choice.

The interfaces are mirrored, true, but we needed to get something working. I don't think they will remain mirrored (e.g. adding decompression).

Most of our TRDs are about HILs, which are different. More complex TRDs, like AppID, have to cover more (e.g. TBF headers).

bradjc and others added 3 commits March 5, 2025 14:56
Co-authored-by: Pat Pannuto <pat.pannuto@gmail.com>
Add finalize and clarify different operation connections.
Comment on lines +148 to +150
correctly upholds the Tock Threat Model guarantees. Specifically, storing a new
process binary must not cause existing process binaries to be "lost" or not
loaded on future reboots of the board.
Member:

This only covers the integrity half of the threat model. I think this also needs to either (a) elucidate thoughts on confidentiality, or, more simply, (b) just refer to the Threat Model, i.e.:

Suggested change
correctly upholds the Tock Threat Model guarantees. Specifically, storing a new
process binary must not cause existing process binaries to be "lost" or not
loaded on future reboots of the board.
correctly upholds the Tock Threat Model guarantees.

Comment on lines +227 to +230
This interface does not mandate that the kernel capsule creates a new process.
Implementations may include a policy for choosing whether to load a new process
binary. The implementation must also use the board's chosen credential checking
policy.
Member:

I'm a little confused by the phrasing here; the subsection opened by saying that this is

> an interface for loading a stored process binary into an active Tock process.

What is the point of an interface to create an active process that is not mandated to create a process?

Is it trying to say something more like this?

Suggested change
This interface does not mandate that the kernel capsule creates a new process.
Implementations may include a policy for choosing whether to load a new process
binary. The implementation must also use the board's chosen credential checking
policy.
Even if an application image was written successfully, the request to load a process
may fail. The platform application loader may include a policy for choosing whether
to load a new process binary, which may include steps such as validating the
credentials of the newly requested process.

Contributor Author:

I think some combination. It may fail, and it must respect the credential checking policy.

Member:

I agree that a combination of those two makes sense. I needed to re-read the text in the proposed TRD a couple of times to understand it; in particular, the first sentence confused me. It helps that @ppannuto's version talks about "failing" before talking about the result ("not creating / loading a process").

Comment on lines +280 to +285
Second, `SequentialDynamicBinaryStorage` does not allow the calling capsule to
write a portion of the first eight bytes of the process binary (where the
`total_size` field is located). It must write the entire region, and
`SequentialDynamicBinaryStorage` ensures those first eight bytes are correct
(i.e., they match the size of the process binary and use a valid TBF header
version).
Member:

So this means that an invalid write can (effectively, literally?) implicitly call abort()?

I think the details of that need to be explained in §4.1... would you get the abort_done callback in response to a write? Or would you have to know that there are error codes from write_done that are semantically equivalent to receiving abort_done?


Comment on lines +280 to +285
Second, `SequentialDynamicBinaryStorage` does not allow the calling capsule to
write a portion of the first eight bytes of the process binary (where the
`total_size` field is located). It must write the entire region, and
`SequentialDynamicBinaryStorage` ensures those first eight bytes are correct
(i.e., they match the size of the process binary and use a valid TBF header
version).
Member:

This all seems complicated — wouldn't it be easier to just reject that one invalid write request, but otherwise leave things unchanged?

Co-authored-by: Pat Pannuto <pat.pannuto@gmail.com>
@lschuermann lschuermann self-requested a review March 12, 2025 16:26
@bradjc (Contributor Author) commented Mar 13, 2025:

For documenting process loading, does that mean describing these two traits:

```rust
pub trait ProcessLoadingAsyncClient {
    /// A process was successfully found in flash, checked, and loaded into a
    /// `ProcessStandard` object.
    fn process_loaded(&self, result: Result<(), ProcessLoadError>);

    /// There are no more processes in flash to be loaded.
    fn process_loading_finished(&self);
}

/// Asynchronous process loading.
///
/// Machines which implement this trait perform asynchronous process loading and
/// signal completion through `ProcessLoadingAsyncClient`.
///
/// Various process loaders may exist. This includes a loader from a MCU's
/// integrated flash, or a loader from an external flash chip.
pub trait ProcessLoadingAsync<'a> {
    /// Set the client to receive callbacks about process loading and when
    /// process loading has finished.
    fn set_client(&self, client: &'a dyn ProcessLoadingAsyncClient);

    /// Set the credential checking policy for the loader.
    fn set_policy(&self, policy: &'a dyn AppIdPolicy);

    /// Start the process loading operation.
    fn start(&self);
}
```

?

I'm unsure what other questions you are thinking of that it would help to have answered in a TRD.

@lschuermann (Member) left a comment:

Did a fresh, in-depth pass over the proposal, modulo Section 5. It looks good to me generally, apart from the existing comments and the new ones I added. In particular, I share the concerns about the implicit coupling of storing and loading (through two different traits), as raised by @brghena here:

https://github.com/tock/tock/pull/4338/files#r1978601588



The `abort()` call deallocates any stored resources and resets the
implementation to handle a new `setup()` call.
`abort()` can only be called after `setup()` and before `finalize()`.
Member:

Do we want to expand on what happens when a call to finalize fails? Are the resources deallocated, or can we not provide any guarantees in that case?

