Allow OCI for multi-node/multi-gpu by israel-hdez · Pull Request #4441 · kserve/kserve · GitHub

Allow OCI for multi-node/multi-gpu #4441


Open · wants to merge 19 commits into master from j24536-oci-for-multi-gpu
Conversation

israel-hdez (Contributor) commented:
What this PR does / why we need it:

This modifies the InferenceService validation for the multi-node/multi-GPU case to allow OCI storageUri values in addition to the already allowed PVC storage.
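As a rough illustration of the relaxed check (a sketch only — the actual KServe validation is written in Go, and the helper name here is hypothetical):

```python
# Hypothetical sketch of the relaxed storageUri validation for the
# multi-node/multi-GPU case: OCI is now accepted alongside PVC.
ALLOWED_MULTINODE_PREFIXES = ("pvc://", "oci://")

def validate_multinode_storage_uri(storage_uri: str) -> None:
    """Reject storageUri schemes other than PVC or OCI for multi-node serving."""
    if not storage_uri.startswith(ALLOWED_MULTINODE_PREFIXES):
        raise ValueError(
            f"multi-node inference only supports {ALLOWED_MULTINODE_PREFIXES}, "
            f"got: {storage_uri}"
        )

validate_multinode_storage_uri("oci://quay.io/org/model:latest")  # newly accepted
validate_multinode_storage_uri("pvc://models-claim/llama")        # already accepted
```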

Type of changes

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A
  • Test B

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Release note:


Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

@israel-hdez israel-hdez force-pushed the j24536-oci-for-multi-gpu branch 4 times, most recently from eb327b1 to c71b051 Compare May 7, 2025 22:57
@israel-hdez israel-hdez marked this pull request as ready for review May 7, 2025 22:57
@israel-hdez
Copy link
Contributor Author

@Jooho Could you please review?

@israel-hdez israel-hdez force-pushed the j24536-oci-for-multi-gpu branch from c71b051 to 882a415 Compare May 13, 2025 22:53
@israel-hdez israel-hdez requested a review from Jooho May 13, 2025 22:54
@israel-hdez israel-hdez force-pushed the j24536-oci-for-multi-gpu branch 2 times, most recently from 6787dbb to 1d97b50 Compare May 15, 2025 18:38
@Jooho (Contributor) left a comment:


/lgtm

@israel-hdez israel-hdez force-pushed the j24536-oci-for-multi-gpu branch from 1d97b50 to f268af7 Compare May 15, 2025 22:08
@israel-hdez
Copy link
Contributor Author

/rerun-all

spolti and others added 15 commits May 29, 2025 10:23
Signed-off-by: Spolti <fspolti@redhat.com>
Signed-off-by: Andres Llausas <allausas@redhat.com>
Co-authored-by: Andres Llausas <allausas@redhat.com>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: tarilabs <matteo.mortari@gmail.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Sivanantham <90966311+sivanantha321@users.noreply.github.com>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
…quests (kserve#4482)

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
…ention mechanisms (kserve#4495)

Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
…ring reconciliation (kserve#4471)

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
…4496)

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: Jin Dong <jdong183@bloomberg.net>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
This modifies the InferenceService validation of the multi-node/multi-gpu case to allow OCI in the storageUri additionally to the already allowed PVC storage.

Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
israel-hdez and others added 3 commits May 29, 2025 10:23
Allow setting the MODEL_DIR environment variable for OCI storage
protocol.

Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
@israel-hdez israel-hdez force-pushed the j24536-oci-for-multi-gpu branch from 854354a to 7ae875b Compare May 29, 2025 16:25
export MODEL_DIR_ARG=""
if [[ ! -z ${MODEL_ID} ]]
then
export MODEL_DIR_ARG="--model_dir=${MODEL_ID}"
A contributor commented on this snippet:

I'm not sure if using model_id to dynamically download models from HuggingFace is the best approach, especially for multi-node/multi-GPU setups. In such cases, the models are typically large, and downloading them on each node can take significant time.

This concern is somewhat outside the main scope of this PR, so it might make sense to address it in a separate PR.
However, since this PR already touches that part of the code, it would be great if we could improve it here :)

Also, I suggest using a generic variable like MODEL_ARG, since --model_id and --model_dir are mutually exclusive:

parser.add_argument(
    "--model_dir",
    required=False,
    default="/mnt/models",
    help="A URI pointer to the model binary",
)
parser.add_argument(
    "--model_id", required=False, default=None, help="Huggingface model id"
)

https://github.com/kserve/kserve/blob/master/python/huggingfaceserver/huggingfaceserver/__main__.py#L79C1-L87C2

Like the following

  export MODEL_ARG=""
  if [[ ! -z ${MODEL_ID} ]]
  then
    export MODEL_ARG="--model_id=${MODEL_ID}"
  fi

  if [[ ! -z ${MODEL_DIR} ]]
  then
    export MODEL_ARG="--model_dir=${MODEL_DIR}"
  fi

  python -m huggingfaceserver ${MODEL_ARG} --tensor-parallel-size=${TENSOR_PARALLEL_SIZE} --pipeline-parallel-size=${PIPELINE_PARALLEL_SIZE} "$@"
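The precedence the suggested snippet encodes (MODEL_DIR, when set, wins over MODEL_ID) can be mirrored in a small Python sketch; the function name is illustrative, not part of the PR:

```python
def build_model_arg(env: dict) -> str:
    """Mirror the shell logic: MODEL_DIR, when set, overrides MODEL_ID."""
    arg = ""
    if env.get("MODEL_ID"):
        arg = f"--model_id={env['MODEL_ID']}"
    if env.get("MODEL_DIR"):
        arg = f"--model_dir={env['MODEL_DIR']}"
    return arg

print(build_model_arg({"MODEL_ID": "facebook/opt-125m"}))
# --model_id=facebook/opt-125m
print(build_model_arg({"MODEL_ID": "facebook/opt-125m", "MODEL_DIR": "/mnt/models"}))
# --model_dir=/mnt/models
```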

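Since the two flags are mutually exclusive, another option (a sketch, not what the PR or huggingfaceserver actually does) would be to let argparse enforce the exclusivity directly, so passing both flags fails at parse time:

```python
import argparse

# Sketch: enforce --model_id / --model_dir exclusivity in argparse itself
# rather than in the launcher shell script. Illustrative only.
parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group()
group.add_argument("--model_dir", default=None,
                   help="A URI pointer to the model binary")
group.add_argument("--model_id", default=None,
                   help="Huggingface model id")

args = parser.parse_args(["--model_id", "facebook/opt-125m"])
print(args.model_id)  # facebook/opt-125m
# parser.parse_args(["--model_id", "a", "--model_dir", "b"]) would exit
# with "argument --model_dir: not allowed with argument --model_id".
```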
9 participants