/kind bug
What steps did you take and what happened:
When deploying the kserve/huggingfaceserver:latest-gpu image in Kubernetes, the container fails to start because of a CUDA version mismatch. The error shows that the image requires CUDA >= 12.8, but the NVIDIA driver on the host node does not support that CUDA version.
Error Log:
Normal Created 115s kubelet Created container storage-initializer
Normal Started 115s kubelet Started container storage-initializer
Warning Failed 62s (x4 over 105s) kubelet Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.8, please update your driver to a newer version, or use an earlier cuda container: unknown
Warning BackOff 29s (x8 over 104s) kubelet Back-off restarting failed container kserve-container in pod dbp-9fd2968b-009d-4d9e-9d30-32b9511a1e8e-predictor-7f4b8d4dx6q5_default(4dd91701-b2ab-4441-ae75-dba42264ef38)
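The failing check can be modeled roughly as follows. This is a minimal sketch, not the libnvidia-container implementation. It assumes the image declares its minimum CUDA version through the NVIDIA_REQUIRE_CUDA environment variable (as NVIDIA's CUDA base images do; the "cuda>=12.8" string in the log has that form), and that the driver's supported CUDA version is read from the "CUDA Version" field in the nvidia-smi header on the node. The helper names `required_cuda`, `driver_cuda` and `is_compatible` are hypothetical, and the sample strings are illustrative, not taken from this cluster.

```python
import re
from typing import Tuple

# Hypothetical helpers: a simplified model of the "cuda>=X.Y" requirement check.
# Only major.minor is compared; patch components (e.g. 12.8.1) are ignored.

_REQ_RE = re.compile(r"cuda>=(\d+)\.(\d+)")
_SMI_RE = re.compile(r"CUDA Version:\s*(\d+)\.(\d+)")


def required_cuda(nvidia_require_cuda: str) -> Tuple[int, int]:
    """Extract the minimum CUDA version from an NVIDIA_REQUIRE_CUDA-style string."""
    match = _REQ_RE.search(nvidia_require_cuda)
    if match is None:
        raise ValueError(f"no cuda>= constraint in {nvidia_require_cuda!r}")
    return int(match.group(1)), int(match.group(2))


def driver_cuda(nvidia_smi_output: str) -> Tuple[int, int]:
    """Extract the highest CUDA version the driver supports from nvidia-smi output."""
    match = _SMI_RE.search(nvidia_smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field found in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def is_compatible(nvidia_require_cuda: str, nvidia_smi_output: str) -> bool:
    """True if the host driver satisfies the image's minimum CUDA requirement."""
    # Tuples compare element-wise, so (12, 4) < (12, 8) < (13, 0).
    return driver_cuda(nvidia_smi_output) >= required_cuda(nvidia_require_cuda)


if __name__ == "__main__":
    # Illustrative values: an image needing CUDA 12.8 on a node whose driver reports 12.4.
    requirement = "cuda>=12.8 brand=tesla"
    smi_header = "| Driver Version: 550.54.15   CUDA Version: 12.4 |"
    print(is_compatible(requirement, smi_header))  # False
```

Under this model, the pod can start only if the node's driver is upgraded until nvidia-smi reports CUDA 12.8 or later, or if an image built against an older CUDA release is used. This matches the two options the error message suggests.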
What's the InferenceService yaml:
[To help us debug, please run kubectl get isvc $name -n $namespace -oyaml and paste the output.]
Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
Environment:
- Istio Version:
- Knative Version:
- KServe Version:
- Kubeflow version:
- Cloud Environment: [k8s_istio/istio_dex/gcp_basic_auth/gcp_iap/aws/aws_cognito/ibm]
- Minikube/Kind version:
- Kubernetes version (use kubectl version):
- OS (e.g. from /etc/os-release):