Allow OCI for multi-node/multi-gpu #4441
base: master
Conversation
@Jooho Could you review?
/lgtm
/rerun-all
This modifies the InferenceService validation of the multi-node/multi-gpu case to allow OCI in the storageUri, in addition to the already allowed PVC storage. Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Allow setting the MODEL_DIR environment variable for the OCI storage protocol. Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
export MODEL_DIR_ARG=""
if [[ ! -z ${MODEL_ID} ]]
then
export MODEL_DIR_ARG="--model_dir=${MODEL_ID}"
I'm not sure if using model_id to dynamically download models from HuggingFace is the best approach, especially for multi-node/multi-GPU setups. In such cases, the models are typically large, and downloading them on each node can take significant time.
This concern is somewhat outside the main scope of this PR, so it might make sense to address it in a separate PR.
However, since this PR already touches that part of the code, it would be great if we could improve it here :)
Also, I suggest using a generic variable like MODEL_ARG, since --model_id and --model_dir are mutually exclusive:
parser.add_argument(
"--model_dir",
required=False,
default="/mnt/models",
help="A URI pointer to the model binary",
)
parser.add_argument(
"--model_id", required=False, default=None, help="Huggingface model id"
)
Like the following:
export MODEL_ARG=""
if [[ ! -z ${MODEL_ID} ]]
then
  export MODEL_ARG="--model_id=${MODEL_ID}"
fi
if [[ ! -z ${MODEL_DIR} ]]
then
  export MODEL_ARG="--model_dir=${MODEL_DIR}"
fi
python -m huggingfaceserver ${MODEL_ARG} --tensor-parallel-size=${TENSOR_PARALLEL_SIZE} --pipeline-parallel-size=${PIPELINE_PARALLEL_SIZE} $0 $@
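A note on the suggested structure: because the MODEL_DIR check runs after the MODEL_ID check, MODEL_DIR wins whenever both variables are set, so only one of the mutually exclusive --model_id/--model_dir flags ends up on the final command line.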
What this PR does / why we need it:
This modifies the InferenceService validation of the multi-node/multi-gpu case to allow OCI in the storageUri, in addition to the already allowed PVC storage.
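For illustration, here is a minimal, hypothetical sketch of the kind of check this relaxes. The actual validation lives in kserve's webhook code and uses different names; the function, variable names, and exact prefix list below are assumptions, not the real implementation.

// Hypothetical sketch only; names and structure are illustrative assumptions,
// not kserve's actual validation code.
package main

import (
	"fmt"
	"strings"
)

// storageUri prefixes assumed acceptable for multi-node/multi-gpu after this change
var allowedMultiNodePrefixes = []string{"pvc://", "oci://"}

func validateMultiNodeStorageURI(storageURI string) error {
	for _, p := range allowedMultiNodePrefixes {
		if strings.HasPrefix(storageURI, p) {
			return nil // PVC was already allowed; OCI is newly allowed
		}
	}
	return fmt.Errorf("multi-node/multi-gpu requires a PVC or OCI storageUri, got %q", storageURI)
}

func main() {
	for _, uri := range []string{"pvc://my-pvc/llama", "oci://quay.io/org/model:v1", "s3://bucket/model"} {
		fmt.Printf("%s -> %v\n", uri, validateMultiNodeStorageURI(uri))
	}
}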
Type of changes
Feature/Issue validation/testing:
Checklist:
Release note:
Re-running failed tests
/rerun-all - rerun all failed workflows.
/rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.