-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pre-compile each layer individually #492
Pre-compile each layer individually #492
Conversation
@jsturtevant I have tested these changes out with a branch of the Spin containerd shim with pre-compilation support and it works great. I've tested Spin apps with multiple components and static assets, as well. To me, this is looking ready to take out of draft. Thank you for expanding the implementation! |
@jsturtevant through more testing, I did find some odd behavior. After pulling an app, I can rerun the shim over and over and get expected behavior (reuses precompiled components); however, if i repull the image ( Feb 23 13:17:51 kagold-ThinkPad-X1-Carbon-6th containerd[496568]: time="2024-02-23T21:17:51.312229693Z" level=info msg="Shim successfully started, waiting for exit signal..."
Feb 23 13:17:51 kagold-ThinkPad-X1-Carbon-6th containerd[496568]: time="2024-02-23T21:17:51.317710774Z" level=info msg="found manifest with WASM OCI image format"
Feb 23 13:17:51 kagold-ThinkPad-X1-Carbon-6th containerd[496568]: time="2024-02-23T21:17:51.318314715Z" level=info msg="layer sha256:35dd7338635af15592327986c45e75efc2b258f1d92e4282ced7668d827e5741 has pre-compiled content: sha256:8d7374edd04e514d5c7b35a1b4a2a334716e1bcae596dfc0944e23b921bf6cc6 "
Feb 23 13:17:51 kagold-ThinkPad-X1-Carbon-6th containerd[496568]: time="2024-02-23T21:17:51.323980565Z" level=info msg="layer sha256:65bc286f8315746d1beecd2430e178f539fa487ebf6520099daae09a35dbce1d has pre-compiled content: sha256:dabc42e0d769727d23126b7908e0ea9dd4a93a7cd8867cb36615cbc990776ba8 "
Feb 23 13:17:51 kagold-ThinkPad-X1-Carbon-6th containerd[496568]: time="2024-02-23T21:17:51.335170453Z" level=info msg="precompiling layers for image: ghcr.io/kate-goldenring/spin-hello-and-kv-explorer:latest"
Feb 23 13:17:51 kagold-ThinkPad-X1-Carbon-6th containerd[496568]: time="2024-02-23T21:17:51.335195337Z" level=info msg="Precompiling layer with Spin Engine: "sha256:35dd7338635af15592327986c45e75efc2b258f1d92e4282ced7668d827e5741""
Feb 23 13:17:51 kagold-ThinkPad-X1-Carbon-6th containerd[496568]: time="2024-02-23T21:17:51.335225245Z" level=info msg="Precompiling layer "sha256:35dd7338635af15592327986c45e75efc2b258f1d92e4282ced7668d827e5741""
Feb 23 13:17:51 kagold-ThinkPad-X1-Carbon-6th containerd[496568]: time="2024-02-23T13:17:51.502392136-08:00" level=info msg="shim disconnected" id=two namespace=default
Feb 23 13:17:51 kagold-ThinkPad-X1-Carbon-6th containerd[496568]: time="2024-02-23T13:17:51.502458979-08:00" level=warning msg="cleaning up after shim disconnected" id=two namespace=default
Feb 23 13:17:51 kagold-ThinkPad-X1-Carbon-6th containerd[496568]: time="2024-02-23T13:17:51.502475509-08:00" level=info msg="cleaning up dead shim" namespace=default
Feb 23 13:17:51 kagold-ThinkPad-X1-Carbon-6th containerd[496568]: time="2024-02-23T13:17:51.507305203-08:00" level=error msg="copy shim log" error="read /proc/self/fd/13: file already closed" namespace=default |
@kate-goldenring did anything in the image change that caused the image sha to change? |
@jsturtevant the layers of the image are the same to fn precompile(&self, layer: &WasmLayer) -> Option<Result<Vec<u8>>> {
log::info!(
"Precompiling layer with Spin Engine: {:?}",
layer.config.digest()
);
match layer.config.media_type() {
MediaType::Other(name) => {
log::info!("Precompiling layer {:?}", layer.config.digest());
if name == "application/vnd.wasm.content.layer.v1+wasm" {
if let Some(_) = self.wasmtime_engine.detect_precompiled(&layer.layer) {
log::info!("Already precompiled");
return None
}
let component =
spin_componentize::componentize_if_necessary(&layer.layer).unwrap();
Some(self.wasmtime_engine.precompile_component(&component))
} else {
None
}
}
_ => None,
}
} |
ah, I see so we are passing the precompiled bits to the engine to recompile. Makes sense it will crash, Let me see if we can fix it up in the shim library code but being defensive here might not be a bad option. |
@jsturtevant to give more context, when evaluating whether to recompile( |
@kate-goldenring As I've been playing and thinking about the API I think a better API for this would be
with the intent to evolve it eventually to make it more flexible
This would allow for runtimes like spin to be able to compile and return layers they care about and runtimes that support components to be able to compose and compile a single layer. Thoughts? I recognize this is a change for the tests you've been running. The change is pretty minimal on the shim implementor side. The change to |
|
To @kate-goldenring's point:
Any thouhgts why this would be happening? Nothing about the image changes in between pulls. |
looking into this today |
@jsturtevant I like this idea of generalizing |
I am seeing the label on the image disappear when using |
64ebc9b
to
8b051f4
Compare
I've stored the "pre-compiled" flag on the content instead of the image, since the image is mutable (I think @cpuguy83 pointed this out previously but I didn't fully understand the implications at the time). Note that if runwasi detects that one or more of the pre-compiled components are removed from the cache it will re-request a compilation so its good to keep the check for compilation in the shim. I don't really know a valid scenario where this would happen besides a user going in and deleting it manually. I will see if we can come up with a way to handle this when we improve the api towards |
The image reference (which we read from the container object) is mutable, is what I was referring to. |
8b051f4
to
18f1e99
Compare
I was looking into the why the test failed here, I was able to reproduce it locally although flaky and identified that it is due to the way the test is importing images. In the failed test the image gets imported twice, I expected this to be a no-op but the annotations in the image are stored in a hashmap which results in a different image digest causing the flake (since the flag isn't found on the content but the individual layers are the same). This comes back to my comment in #492 (comment) which is to say that we are not passing information around if we know whether it is compiled to the shim today but can possible address it with a better API. |
73c80cc
to
f654ba8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsturtevant I just have some preliminary questions on the precompile
API
let compiled_layers = compiled_layer_result?; | ||
|
||
for (i, precompiled_layer) in compiled_layers.iter().enumerate() { | ||
let original_layer = &layers[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line assumes that all N layers are returned even if only M were precompiled (say a few layers were not wasm). We should find a way to support precompiling multiple layers but also support engine implementations that support non wasm layers (such as the Spin one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I want avoid handling the situation where multple layers are compiled to 1. I think this is going to need some bigger changes and would like the design to evolve it to a different api like process_layers
or initialize_layers
with more input for folks. If I use Result<Vec<Option<WasmLayer>>>
as you suggested, I can avoid looking up the layer here and just skip if None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#504 for tracking updates
Signed-off-by: James Sturtevant <jstur@microsoft.com>
Signed-off-by: James Sturtevant <jstur@microsoft.com>
…e can change Signed-off-by: James Sturtevant <jstur@microsoft.com>
Signed-off-by: James Sturtevant <jstur@microsoft.com>
Signed-off-by: James Sturtevant <jstur@microsoft.com>
Signed-off-by: James Sturtevant <jstur@microsoft.com>
Signed-off-by: James Sturtevant <jstur@microsoft.com>
Signed-off-by: James Sturtevant <jstur@microsoft.com>
abf1f3a
to
215f503
Compare
Signed-off-by: James Sturtevant <jstur@microsoft.com>
Co-authored-by: Kate Goldenring <kate.goldenring@gmail.com> Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jsturtevant for all the iterations of this. This really speeds up the performance of the containerd shim being able to precompile all layers in one call and preserve those precompiled layers.
I pushed one last change based on your feedback. Thanks for trying out all the changes, the feedback and being patient as I adjusted the API based on the usage |
Signed-off-by: James Sturtevant <jstur@microsoft.com>
03bc3d2
to
d2788c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two doc nits
Co-authored-by: Kate Goldenring <kate.goldenring@gmail.com> Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
I may have bumped into an issue that will turn out a blocker for the current approach — while running a Wasm component built with SpiderMonkey, the resulting precompiled component size is larger than the maximum message size in gRPC, which means saving it to the content store fails:
Is there a way where |
Is the message not getting chunked? If not, perhaps, that is the path we should pursue. |
it looks like the write failed when saving the new content. Will need to implement streaming for larger content when doing the write. Right now it writes and commits in one action: runwasi/crates/containerd-shim-wasm/src/sandbox/containerd/client.rs Lines 198 to 201 in 2171b3d
I don't think this is a blocker on this PR and I would be inclined to merge this and make those changes separately as this changeset already has quite a bit going on. |
Chunking sounds great, and I agree that we can do this in a follow-up. Thanks for the quick update! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great PR. Most of my comments are non-blocking refactoring tips.
/// The cached, precompiled layers will be reloaded on subsequent runs. | ||
/// The runtime is expected to return the same number of layers passed in, if the layer cannot be precompiled it should return `None` for that layer. | ||
/// In some edge cases it is possible that the layers may already be precompiled and None should be returned in this case. | ||
fn precompile(&self, _layers: &[WasmLayer]) -> Result<Vec<Option<Vec<u8>>>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: type alias for Vec<Option<Vec<u8>>>>
, perhaps PrecompiledWasmLayers
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit to counter the nit with: not everything returned is a precompiled layer, which is why there is an option wrapping, so it feels like a confusing name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe Vec<Option<WasmLayer>>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WasmLayer
is already a structure. The API was that at one point but was changed to just return the layer content rather than the content and config because the original config is used as the source of truth. I think we should maybe leave it as is or PrecompiledLayer
could work, though i prefer not using types to alias something as small as Vec<u8>
@@ -375,6 +347,15 @@ impl Client { | |||
}) | |||
} | |||
|
|||
fn get_image_manifest(&self, image_name: &str) -> Result<(ImageManifest, String)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn get_image_manifest(&self, image_name: &str) -> Result<(ImageManifest, String)> { | |
fn get_image_manifest_and_digest(&self, image_name: &str) -> Result<(ImageManifest, String)> { |
let manifest = self.read_content(&image_digest)?; | ||
let manifest = manifest.as_slice(); | ||
let manifest = ImageManifest::from_reader(manifest)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let manifest = self.read_content(&image_digest)?; | |
let manifest = manifest.as_slice(); | |
let manifest = ImageManifest::from_reader(manifest)?; | |
let manifest = ImageManifest::from_reader(self.read_content(&image_digest)?.as_slice())?; | |
let layers = manifest | ||
.layers() | ||
.iter() | ||
.filter(|x| is_wasm_layer(x.media_type(), T::supported_layers_types())) | ||
.map(|config| self.read_content(config.digest())) | ||
.map(|original_config| { | ||
let mut digest_to_load = original_config.digest().clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor into it's own function?
let info = self.get_info(&digest_to_load)?; | ||
if info.labels.contains_key(&precompile_id) { | ||
// Safe to unwrap here since we already checked for the label's existence | ||
digest_to_load = info.labels.get(&precompile_id).unwrap().clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let info = self.get_info(&digest_to_load)?; | |
if info.labels.contains_key(&precompile_id) { | |
// Safe to unwrap here since we already checked for the label's existence | |
digest_to_load = info.labels.get(&precompile_id).unwrap().clone(); | |
if let Some(precompiled_digest) = info.labels.get(&precompile_id) { | |
... | |
} |
match engine.precompile(&layers) { | ||
Ok(compiled_layers) => { | ||
if compiled_layers.len() != layers.len() { | ||
return Err(ShimError::FailedPrecondition( | ||
"precompile returned wrong number of layers".to_string(), | ||
)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: to imporve readability, consider
let compiled_layers = match engine.precompile(&layers) {
Ok(compiled_layers) => {
if compiled_layers.len() != layers.len() {
return Err(ShimError::FailedPrecondition(
"precompile returned wrong number of layers".to_string(),
));
}
compiled_layers
}
Err(e) => {
log::error!("precompilation failed: {}", e);
return Err(ShimError::FailedPrecondition("precompilation failed".to_string()));
}
};
}], | ||
platform, | ||
)); | ||
if layers.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move this condition check to right after layers
is decalred.
@@ -542,4 +580,370 @@ mod tests { | |||
.read_content(expected) | |||
.expect_err("content should not exist"); | |||
} | |||
|
|||
#[test] | |||
fn test_layers_when_precompile_not_supported() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding so much test cases. Love it 😍
} | ||
|
||
#[test] | ||
fn test_layers_are_precompiled_for_multiple_layers() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a case where layers are empty, or all layers contain no Wasm type?
Okay, going to merge this and figure out the comments later. |
this is to resolve my comments made in PR containerd#492 in an effort to make the code a bit more idiomatic and readable Signed-off-by: jiaxiao zhou <jiazho@microsoft.com>
this is to resolve my comments made in PR containerd#492 in an effort to make the code a bit more idiomatic and readable Signed-off-by: jiaxiao zhou <jiazho@microsoft.com>
this is to resolve my comments made in PR containerd#492 in an effort to make the code a bit more idiomatic and readable Signed-off-by: jiaxiao zhou <jiazho@microsoft.com>
After some feedback we found some runtimes require ability to individually pre-compile layers. This attempts to address that ability.
This also fixes a bug that was introduced when adding pre-compilation where the media types were being overwritten.