Vision-Lang & Inference (including LoRA) #1174
Conversation
configs/recipes/vision/llama3_2_vision/inference/11b_rvllm_infer.yaml
```diff
@@ -40,6 +40,7 @@ def __init__(
     enable_prefix_caching: bool = True,
     gpu_memory_utilization: float = 1.0,
     enforce_eager: bool = True,
+    max_num_seqs: int = 2,
```
Changing the default value for `max_num_seqs` may affect other models. Can we define this param as `None` instead:

```python
max_num_seqs: Optional[int] = None
```

and then do something like this in the function:

```python
if max_num_seqs is not None:
    vllm_kwargs["max_num_seqs"] = max_num_seqs
```

similarly to `max_lora_rank`?
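A minimal sketch of this pattern (the class and attribute names here are hypothetical stand-ins, not the actual oumi engine code):

```python
from typing import Any, Optional


class VLLMEngineSketch:
    """Hypothetical stand-in for the vLLM inference engine wrapper."""

    def __init__(
        self,
        enable_prefix_caching: bool = True,
        gpu_memory_utilization: float = 1.0,
        enforce_eager: bool = True,
        max_num_seqs: Optional[int] = None,
    ) -> None:
        vllm_kwargs: dict[str, Any] = {
            "enable_prefix_caching": enable_prefix_caching,
            "gpu_memory_utilization": gpu_memory_utilization,
            "enforce_eager": enforce_eager,
        }
        # Forward max_num_seqs only when explicitly set, so other models
        # keep vLLM's built-in default (mirrors the max_lora_rank handling).
        if max_num_seqs is not None:
            vllm_kwargs["max_num_seqs"] = max_num_seqs
        self.vllm_kwargs = vllm_kwargs
```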
Then override `max_num_seqs` in the Llama vLLM inference config (example: `model_kwargs={...}`).
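For illustration, a hypothetical per-model override; the actual config schema and key names in the repo may differ:

```python
# Hypothetical override for the Llama 3.2 Vision vLLM inference config;
# the key name and where this dict lives are assumed, not taken from this PR.
model_kwargs = {
    "max_num_seqs": 2,  # Cap concurrent sequences for the 11B vision model.
}
```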
Issue discussed in https://linear.app/oumi/issue/OPE-923.

Adds the minimal required changes in inference configs for Vision-Llama-based models trained with LoRA. Specifically, for meta-llama/Llama-3.2-11B-Vision:

- The native engine fully works. The LoRA-finetuned model responds: "2", vs. the original Llama: "There are two sinks in this bathroom."
- vLLM appears to not yet support MllamaForConditionalGeneration (see).

Note: SGLang LoRA inference is not addressed in this PR.

Towards OPE-681.
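As a usage sketch, inference with the new config might be invoked like this (the `-c/--config` flag is assumed, not taken from this PR; check `oumi infer --help`):

```bash
# Hypothetical invocation; flag names are assumed for illustration.
oumi infer -c configs/recipes/vision/llama3_2_vision/inference/11b_rvllm_infer.yaml
```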