Switch to distroless Base image by elezar · Pull Request #1154 · NVIDIA/nvidia-container-toolkit · GitHub

Switch to distroless Base image #1154


Open: elezar wants to merge 5 commits into main from switch-to-distroless

elezar (Member) commented Jun 18, 2025

This change switches both the application image and the packaging image to the nvcr.io/nvidia/distroless/go:v3.1.9-dev distroless Go base image.

@elezar elezar requested review from tariq1890 and cdesiniotis June 18, 2025 10:07
@elezar elezar self-assigned this Jun 18, 2025
coveralls commented Jun 18, 2025

Pull Request Test Coverage Report for Build 15753213707

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 33.644%

Totals Coverage Status
Change from base Build 15744214901: 0.0%
Covered Lines: 4366
Relevant Lines: 12977

💛 - Coveralls

@elezar elezar force-pushed the switch-to-distroless branch from ea3b2ed to 67e0b1c Compare June 18, 2025 10:17
@elezar elezar changed the title Switch to distroless Switch to distroless Base image Jun 18, 2025
@elezar elezar force-pushed the switch-to-distroless branch 2 times, most recently from 27605cd to bccb7d8 Compare June 18, 2025 12:15
Signed-off-by: Evan Lezar <elezar@nvidia.com>
@elezar elezar force-pushed the switch-to-distroless branch 3 times, most recently from 6998d23 to 9b15d1d Compare June 18, 2025 13:12
@elezar elezar marked this pull request as ready for review June 18, 2025 13:37
@elezar elezar added this to the v1.18.0 milestone Jun 18, 2025
FROM nvcr.io/nvidia/cuda:12.9.0-base-ubi9
# The application stage contains the application used as a GPU Operator
# operand.
FROM nvcr.io/nvidia/distroless/go:v3.1.9-dev AS application
Contributor

Note, the shell included in the -dev tags is located at /busybox/sh. I would recommend creating a symlink at /bin/sh so that in the operator we can use #! /bin/sh as the shebang for the entrypoint script. By using /bin/sh we remain backwards compatible with older toolkit images that are not built on distroless. We have tested this with other operands, e.g. https://github.com/NVIDIA/k8s-kata-manager/blob/f58e4dad0695043a545b17e3e159e24828816a62/deployments/container/Dockerfile#L50-L51

Suggested change
- FROM nvcr.io/nvidia/distroless/go:v3.1.9-dev AS application
+ FROM nvcr.io/nvidia/distroless/go:v3.1.9-dev AS application
+ SHELL ["/busybox/sh", "-c"]
+ RUN ln -s /busybox/sh /bin/sh

cdesiniotis (Contributor) commented Jun 18, 2025

Also, should we explicitly set USER 0:0 in the Dockerfile as the default user in distroless is uid 1000? I assume the toolkit requires running as root (for restarting containerd).

Member Author

Thanks for the shell tip. Will update.

I'm not sure on the user preference. Does the GPU Operator not set the user in general? Would using the current user (USER 1000:1000) not be more "compliant"?

Member Author

I've updated the user to 0:0 below.

Contributor

> I'm not sure on the user preference. Does the GPU Operator not set the user in general? Would using the current user (USER 1000:1000) not be more "compliant"?

The GPU Operator does not explicitly set the runAsUser / runAsGroup fields when deploying DaemonSets, so we currently depend on the user / group defined in the image itself.

Contributor

I have performed a quick sanity check of the toolkit image built in this PR. Looks good.

Besides having to change the entrypoint script to be a POSIX shell script (and not bash), I also had to change https://github.com/NVIDIA/gpu-operator/blob/6324d2aca562edf46d93cbf9d2a0837ab5c12e59/assets/state-container-toolkit/0400_configmap.yaml#L34 from

exec nvidia-toolkit

to

exec nvidia-ctk-installer

I see the name of the executable has changed. This is a breaking change that will need to be made when we bump the version of the toolkit to 1.18.0 in the GPU Operator.

Contributor

I have raised NVIDIA/gpu-operator#1496 which updates our entrypoint scripts in the operator to use sh instead of bash.

Member Author

I think we can include an nvidia-toolkit symlink so that we maintain backward compatibility.

Also on:

> I've updated the user to 0:0 below.

I had to set the user before we create the /bin/sh symlink since the default user doesn't have permissions to write to /bin.
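
The ordering constraint described above can be sketched as a Dockerfile fragment. This is a minimal illustration assembled from the suggestions in this thread, not the exact contents of the PR; the comments restate what the discussion reports about the image rather than independently verified behaviour.

```dockerfile
# Sketch only: assembled from the review comments, not copied from the PR diff.
FROM nvcr.io/nvidia/distroless/go:v3.1.9-dev AS application

# Switch to root first. Per the discussion, the default distroless user
# (uid 1000) does not have permission to write to /bin.
USER 0:0

# The -dev tags ship their shell at /busybox/sh, so shell-form RUN
# instructions need to be pointed at it explicitly.
SHELL ["/busybox/sh", "-c"]

# Provide /bin/sh so entrypoint scripts can keep using "#! /bin/sh".
RUN ln -s /busybox/sh /bin/sh
```

If the USER instruction came after the RUN instruction, the symlink step would run as the default user and, according to the comment above, fail for lack of write access to /bin.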

Member Author

I have updated the image to include a /work/nvidia-toolkit -> /work/nvidia-ctk-installer symlink. This should allow compatibility with the GPU Operator.
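
One way such a compatibility symlink might be created is sketched below. Only the two /work paths come from this thread; the WORKDIR instruction and the omitted step that places the binary are assumptions made for illustration, and the actual Dockerfile in the PR may differ.

```dockerfile
# Sketch only: the /work paths are from the comment above; everything else is assumed.
FROM nvcr.io/nvidia/distroless/go:v3.1.9-dev AS application

USER 0:0
SHELL ["/busybox/sh", "-c"]

# Assumption: /work is the directory holding the installer binary.
WORKDIR /work

# (The step that copies nvidia-ctk-installer into /work is omitted here.)

# Keep the old executable name resolvable for callers that still run
# "exec nvidia-toolkit".
RUN ln -s /work/nvidia-ctk-installer /work/nvidia-toolkit
```

With the link in place, an entrypoint override that still invokes the old name should resolve to the renamed installer without changes on the GPU Operator side.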

Contributor

Thanks!

@elezar elezar force-pushed the switch-to-distroless branch from fb7573b to e8abb58 Compare June 18, 2025 21:31
Signed-off-by: Evan Lezar <elezar@nvidia.com>
@elezar elezar force-pushed the switch-to-distroless branch from e8abb58 to b94721c Compare June 18, 2025 21:39
elezar added 3 commits June 19, 2025 10:20
This change removes the NGC-DL-CONTAINER-LICENSE (since this
is not available in the distroless images) and includes the
repo's Apache LICENSE file in the image.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that a symlink from /work/nvidia-toolkit to
/work/nvidia-ctk-installer exists, so that GPU Operator versions
that override the entrypoint and assume nvidia-toolkit as the
original entrypoint continue to work.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
@elezar elezar force-pushed the switch-to-distroless branch from b94721c to 6070681 Compare June 19, 2025 08:25
3 participants