New `InferenceClient` endpoint type by nsarrazin · Pull Request #1813 · huggingface/chat-ui

New InferenceClient endpoint type #1813


Draft · nsarrazin wants to merge 13 commits into main
Conversation

nsarrazin (Collaborator)

This will become the default endpoint type. It already lets us enable tool calling on many models where we didn't have it before.

This PR makes sure everything works for all the prod models and switches them over to the new type; see the config sketch after the checklist below.

  • meta-llama/Llama-3.3-70B-Instruct
    • Tool: ✅
    • Normal: ✅
  • Qwen/Qwen3-235B-A22B
    • Tool: ✅
    • Reasoning: ✅
    • Normal: ✅
  • Qwen/Qwen2.5-72B-Instruct
    • Tool: ✅
    • Normal: ✅
  • CohereForAI/c4ai-command-r-plus-08-2024
    • Tool: ✅
    • Normal: ✅
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
    • Tool: ✅
    • Reasoning: ✅
    • Normal: ✅
  • nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
    • Normal: ✅
  • Qwen/QwQ-32B
    • Tool: ✅
    • Reasoning: ✅
    • Normal: ✅
  • google/gemma-3-27b-it (something is wrong with the multimodal image processing)
    • Tool: ❌
    • Image: ❌
    • Normal: ✅
  • mistralai/Mistral-Small-3.1-24B-Instruct-2503 (seems to be an endpoint config issue)
    • Tool: ❌
    • Normal: ❌
  • Qwen/Qwen2.5-VL-32B-Instruct (seems to be an inference proxy issue)
    • Tool: ❌
    • Image: ❌
    • Normal: ❌
  • microsoft/Phi-4
    • Tool: ✅
    • Normal: ✅
  • NousResearch/Hermes-3-Llama-3.1-8B
    • Tool: ✅
    • Normal: ✅
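For reference, a hedged sketch of what opting a model into the new endpoint type could look like in chat-ui's `MODELS` config; only `type` comes from this PR's diff, the model name is taken from the checklist above, and everything else is illustrative:

```ts
// Hypothetical MODELS entry (chat-ui reads MODELS as a JSON array from
// the environment). "hfinference" is the new endpoint type from this
// PR's diff; the baseURL override is an assumption based on the
// endpoint code reviewed below.
const modelEntry = {
	name: "meta-llama/Llama-3.3-70B-Instruct",
	endpoints: [
		{
			type: "hfinference",
			// Optional override; when unset, requests presumably go
			// through the default Inference Providers routing.
			// baseURL: "https://my-endpoint.example.com",
		},
	],
};
```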

nsarrazin added the enhancement (New feature or request) and models (This issue is related to model performance/reliability) labels on May 6, 2025.
```ts
}

export const endpointHfInferenceParametersSchema = z.object({
	type: z.literal("hfinference"),
```
Member:
I would name it either `inference-providers` (the name of the product) or `InferenceClient` (the name of the library class).

Otherwise it's confusing with `provider="hf-inference"`.

```ts
	endpointHfInferenceParametersSchema.parse(input);

	const client = baseURL
		? new InferenceClient(config.HF_TOKEN).endpoint(baseURL)
```
Member:
Suggested change:

```diff
-		? new InferenceClient(config.HF_TOKEN).endpoint(baseURL)
+		? new InferenceClient(config.HF_TOKEN, { endpointUrl: baseURL })
```
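For context, a minimal usage sketch of a client built this way, assuming the `@huggingface/inference` chat-completion API; the token, URL, and prompt are placeholders, not code from this PR:

```ts
import { InferenceClient } from "@huggingface/inference";

// Placeholders: the real code takes the token from config.HF_TOKEN and
// the URL from the model's endpoint config.
const client = new InferenceClient(process.env.HF_TOKEN, {
	endpointUrl: "https://my-endpoint.example.com", // hypothetical URL
});

// OpenAI-style chat completion against the configured endpoint; the
// model name is one of the prod models from the checklist above.
const res = await client.chatCompletion({
	model: "meta-llama/Llama-3.3-70B-Instruct",
	messages: [{ role: "user", content: "Hello!" }],
});
console.log(res.choices[0].message.content);
```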

```diff
@@ -380,6 +306,8 @@ const addEndpoint = (m: Awaited<ReturnType<typeof processModel>>) => ({
 			return endpoints.tgi(args);
 		case "local":
 			return endpoints.local(args);
+		case "hfinference":
```
Member:
Suggested change:

```diff
-		case "hfinference":
+		case "inference-providers":
```
