Vision-Lang & Inference (including LoRA) by optas · Pull Request #1174 · oumi-ai/oumi
Vision-Lang & Inference (including LoRA) #1174

Merged: 14 commits merged into main from optas/vlm-lora-inference on Jan 23, 2025
Conversation

@optas (Contributor) commented on Jan 18, 2025

Adds the minimal required changes to inference configs for Vision-Llama-based models trained with LoRA.

Specifically, for meta-llama/Llama-3.2-11B-Vision:

  • The native engine fully works.

    • With LoRA adapters trained per Oumi's training recipe, the model's responses are notably concise, reflecting the nature of the fine-tuning dataset. E.g., for an image and the prompt "How many sinks in the bathroom?",
      the LoRA fine-tuned model responds: "2",
      vs. the original Llama: "There are two sinks in this bathroom."
      (A minimal sketch of loading such adapters directly is shown after this list.)
  • vLLM does not appear to support MllamaForConditionalGeneration yet (see the linked vLLM issue).

    • We laid some groundwork for supporting such requests in the future (see comments inside) and verified the feasibility of inference with the full meta-llama/Llama-3.2-11B-Vision model with vLLM (see the added max_num_seqs parameter).
  • Note: SGLang LoRA inference is not addressed in this PR.
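For reference, here is a minimal sketch of loading such trained adapters outside Oumi's engine, using the Hugging Face transformers and PEFT APIs directly; the adapter path and image file below are hypothetical placeholders:

import torch
from peft import PeftModel
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

MODEL_ID = "meta-llama/Llama-3.2-11B-Vision"

# Load the base vision-language model and attach the trained LoRA adapters.
base = MllamaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "path/to/lora_adapters")  # hypothetical path

processor = AutoProcessor.from_pretrained(MODEL_ID)
image = Image.open("bathroom.jpg")  # hypothetical image
# Prompt format simplified for illustration; the base (non-instruct)
# checkpoint expects an <|image|> token ahead of the text.
inputs = processor(
    images=image,
    text="<|image|>How many sinks in the bathroom?",
    return_tensors="pt",
).to(model.device)
output = model.generate(**inputs, max_new_tokens=16)
print(processor.decode(output[0], skip_special_tokens=True))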

Towards OPE-681

Description

Related issues

Fixes # (issue)

Before submitting

  • This PR only changes documentation. (You can ignore the following checks in that case.)
  • Did you read the contributor guideline (Pull Request guidelines)?
  • Did you link the issue(s) related to this PR in the section above?
  • Did you add / update tests where needed?

Reviewers

At least one review from a member of oumi-ai/oumi-staff is required.

linear bot commented on Jan 18, 2025

OPE-681

@optas requested review from nikg4, oelachqar and wizeng23 and removed the request for nikg4 on January 18, 2025 21:31
@@ -40,6 +40,7 @@ def __init__(
enable_prefix_caching: bool = True,
gpu_memory_utilization: float = 1.0,
enforce_eager: bool = True,
max_num_seqs: int = 2,
Collaborator commented:
Changing the default value of max_num_seqs may affect other models.

Can we define this param as optional, i.e., max_num_seqs: Optional[int] = None, and then do something like this in the function:

if max_num_seqs is not None:
    vllm_kwargs["max_num_seqs"] = max_num_seqs

similar to how "max_lora_rank" is handled.
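For illustration, a self-contained sketch of the suggested pattern; build_vllm_kwargs is a hypothetical helper (in the engine, this logic would live in __init__, next to the existing max_lora_rank handling):

from typing import Any, Optional

def build_vllm_kwargs(max_num_seqs: Optional[int] = None) -> dict[str, Any]:
    """Hypothetical helper showing the opt-in override pattern."""
    vllm_kwargs: dict[str, Any] = {}
    # Forward max_num_seqs to vLLM only when explicitly set, so other
    # models keep vLLM's own default batching behavior.
    if max_num_seqs is not None:
        vllm_kwargs["max_num_seqs"] = max_num_seqs
    return vllm_kwargs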

Collaborator commented:
Then override max_num_seqs in the Llama vLLM inference config (see the linked example).
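A sketch of such an override at engine-construction time; the import paths and ModelParams usage are assumptions based on this PR's context, not verified against the repo:

# Override max_num_seqs only for the memory-hungry vision model;
# other models keep vLLM's default. Import paths are assumptions.
from oumi.core.configs import ModelParams
from oumi.inference import VLLMInferenceEngine

engine = VLLMInferenceEngine(
    ModelParams(model_name="meta-llama/Llama-3.2-11B-Vision"),
    max_num_seqs=2,  # small cap keeps the 11B vision model within GPU memory
)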

@optas changed the title from Vision-Lang & LoRA Inference to Vision-Lang & Inference (including LoRA) on Jan 22, 2025
@optas marked this pull request as ready for review on January 22, 2025 19:02
@optas merged commit 50bb300 into main on Jan 23, 2025
2 checks passed
@optas deleted the optas/vlm-lora-inference branch on January 23, 2025 04:08