Allow OCI for multi-node/multi-gpu by israel-hdez · Pull Request #4441 · kserve/kserve · GitHub

Allow OCI for multi-node/multi-gpu #4441


Open · wants to merge 19 commits into master from j24536-oci-for-multi-gpu
Conversation

israel-hdez (Contributor) commented:
What this PR does / why we need it:

This modifies the InferenceService validation for the multi-node/multi-GPU case to allow OCI storageUri values in addition to the already allowed PVC storage.
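As a rough illustration of the relaxed check (a sketch only — the actual KServe validation is written in Go, and the helper name here is hypothetical):

```python
# Hypothetical sketch of the relaxed storageUri validation for the
# multi-node/multi-GPU case: OCI is now accepted alongside PVC.
ALLOWED_MULTINODE_PREFIXES = ("pvc://", "oci://")

def validate_multinode_storage_uri(storage_uri: str) -> None:
    """Reject storageUri schemes other than PVC or OCI for multi-node serving."""
    if not storage_uri.startswith(ALLOWED_MULTINODE_PREFIXES):
        raise ValueError(
            f"multi-node inference only supports {ALLOWED_MULTINODE_PREFIXES}, "
            f"got: {storage_uri}"
        )

validate_multinode_storage_uri("oci://quay.io/org/model:latest")  # newly accepted
validate_multinode_storage_uri("pvc://models-claim/llama")        # already accepted
```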

Type of changes

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A
  • Test B

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Release note:


Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

@israel-hdez israel-hdez force-pushed the j24536-oci-for-multi-gpu branch 4 times, most recently from eb327b1 to c71b051 Compare May 7, 2025 22:57
@israel-hdez israel-hdez marked this pull request as ready for review May 7, 2025 22:57
@israel-hdez
Copy link
Contributor Author

@Jooho Could you please review?

@israel-hdez israel-hdez force-pushed the j24536-oci-for-multi-gpu branch from c71b051 to 882a415 Compare May 13, 2025 22:53
@israel-hdez israel-hdez requested a review from Jooho May 13, 2025 22:54
@israel-hdez israel-hdez force-pushed the j24536-oci-for-multi-gpu branch 2 times, most recently from 6787dbb to 1d97b50 Compare May 15, 2025 18:38
@Jooho (Contributor) left a comment:


/lgtm

@israel-hdez israel-hdez force-pushed the j24536-oci-for-multi-gpu branch from 1d97b50 to f268af7 Compare May 15, 2025 22:08
@israel-hdez
Copy link
Contributor Author

/rerun-all

spolti and others added 15 commits May 29, 2025 10:23
Signed-off-by: Spolti <fspolti@redhat.com>
Signed-off-by: Andres Llausas <allausas@redhat.com>
Co-authored-by: Andres Llausas <allausas@redhat.com>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: tarilabs <matteo.mortari@gmail.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Sivanantham <90966311+sivanantha321@users.noreply.github.com>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
…quests (kserve#4482)

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
…ention mechanisms (kserve#4495)

Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
…ring reconciliation (kserve#4471)

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
…4496)

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: Jin Dong <jdong183@bloomberg.net>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
This modifies the InferenceService validation of the multi-node/multi-gpu case to allow OCI in the storageUri additionally to the already allowed PVC storage.

Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
israel-hdez and others added 3 commits May 29, 2025 10:23
Allow setting the MODEL_DIR environment variable for OCI storage
protocol.

Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
@israel-hdez israel-hdez force-pushed the j24536-oci-for-multi-gpu branch from 854354a to 7ae875b Compare May 29, 2025 16:25
export MODEL_DIR_ARG=""
if [[ ! -z ${MODEL_ID} ]]
then
export MODEL_DIR_ARG="--model_dir=${MODEL_ID}"
A contributor commented on this snippet:

I'm not sure if using model_id to dynamically download models from HuggingFace is the best approach, especially for multi-node/multi-GPU setups. In such cases, the models are typically large, and downloading them on each node can take significant time.

This concern is somewhat outside the main scope of this PR, so it might make sense to address it in a separate PR.
However, since this PR already touches that part of the code, it would be great if we could improve it here :)

Also, I suggest using a generic variable like MODEL_ARG, since --model_id and --model_dir are mutually exclusive:

parser.add_argument(
    "--model_dir",
    required=False,
    default="/mnt/models",
    help="A URI pointer to the model binary",
)
parser.add_argument(
    "--model_id", required=False, default=None, help="Huggingface model id"
)

https://github.com/kserve/kserve/blob/master/python/huggingfaceserver/huggingfaceserver/__main__.py#L79C1-L87C2

Like the following

  export MODEL_ARG=""
  if [[ ! -z ${MODEL_ID} ]]
  then
    export MODEL_ARG="--model_id=${MODEL_ID}"
  fi

  if [[ ! -z ${MODEL_DIR} ]]
  then
    export MODEL_ARG="--model_dir=${MODEL_DIR}"
  fi

  python -m huggingfaceserver ${MODEL_ARG} --tensor-parallel-size=${TENSOR_PARALLEL_SIZE} --pipeline-parallel-size=${PIPELINE_PARALLEL_SIZE} "$@"
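The precedence the suggested snippet encodes (MODEL_DIR, when set, wins over MODEL_ID) can be mirrored in a small Python sketch; the function name is illustrative, not part of the PR:

```python
def build_model_arg(env: dict) -> str:
    """Mirror the shell logic: MODEL_DIR, when set, overrides MODEL_ID."""
    arg = ""
    if env.get("MODEL_ID"):
        arg = f"--model_id={env['MODEL_ID']}"
    if env.get("MODEL_DIR"):
        arg = f"--model_dir={env['MODEL_DIR']}"
    return arg

print(build_model_arg({"MODEL_ID": "facebook/opt-125m"}))
# --model_id=facebook/opt-125m
print(build_model_arg({"MODEL_ID": "facebook/opt-125m", "MODEL_DIR": "/mnt/models"}))
# --model_dir=/mnt/models
```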

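Since the two flags are mutually exclusive, another option (a sketch, not what the PR or huggingfaceserver actually does) would be to let argparse enforce the exclusivity directly, so passing both flags fails at parse time:

```python
import argparse

# Sketch: enforce --model_id / --model_dir exclusivity in argparse itself
# rather than in the launcher shell script. Illustrative only.
parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group()
group.add_argument("--model_dir", default=None,
                   help="A URI pointer to the model binary")
group.add_argument("--model_id", default=None,
                   help="Huggingface model id")

args = parser.parse_args(["--model_id", "facebook/opt-125m"])
print(args.model_id)  # facebook/opt-125m
# parser.parse_args(["--model_id", "a", "--model_dir", "b"]) would exit
# with "argument --model_dir: not allowed with argument --model_id".
```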
9 participants