New `InferenceClient` endpoint type by nsarrazin · Pull Request #1813 · huggingface/chat-ui

New InferenceClient endpoint type #1813


Draft · nsarrazin wants to merge 13 commits into main
Conversation

nsarrazin (Collaborator)

This will become the default endpoint type. It already lets us enable tool calling on many models where we didn't have it before.

This PR makes sure everything works for all the prod models and switches them over to the new type; see the config sketch after the checklist below.

  • meta-llama/Llama-3.3-70B-Instruct
    • Tool: ✅
    • Normal: ✅
  • Qwen/Qwen3-235B-A22B
    • Tool: ✅
    • Reasoning: ✅
    • Normal: ✅
  • Qwen/Qwen2.5-72B-Instruct
    • Tool: ✅
    • Normal: ✅
  • CohereForAI/c4ai-command-r-plus-08-2024
    • Tool: ✅
    • Normal: ✅
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
    • Tool: ✅
    • Reasoning: ✅
    • Normal: ✅
  • nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
    • Normal: ✅
  • Qwen/QwQ-32B
    • Tool: ✅
    • Reasoning: ✅
    • Normal: ✅
  • google/gemma-3-27b-it (something is wrong with the multimodal image processing)
    • Tool: ❌
    • Image: ❌
    • Normal: ✅
  • mistralai/Mistral-Small-3.1-24B-Instruct-2503 (seems to be an endpoint config issue)
    • Tool: ❌
    • Normal: ❌
  • Qwen/Qwen2.5-VL-32B-Instruct (seems to be an inference proxy issue)
    • Tool: ❌
    • Image: ❌
    • Normal: ❌
  • microsoft/Phi-4
    • Tool: ✅
    • Normal: ✅
  • NousResearch/Hermes-3-Llama-3.1-8B
    • Tool: ✅
    • Normal: ✅
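For reference, a hedged sketch of what opting a model into the new endpoint type could look like in chat-ui's `MODELS` config; only `type` comes from this PR's diff, the model name is taken from the checklist above, and everything else is illustrative:

```ts
// Hypothetical MODELS entry (chat-ui reads MODELS as a JSON array from
// the environment). "hfinference" is the new endpoint type from this
// PR's diff; the baseURL override is an assumption based on the
// endpoint code reviewed below.
const modelEntry = {
	name: "meta-llama/Llama-3.3-70B-Instruct",
	endpoints: [
		{
			type: "hfinference",
			// Optional override; when unset, requests presumably go
			// through the default Inference Providers routing.
			// baseURL: "https://my-endpoint.example.com",
		},
	],
};
```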

nsarrazin added the enhancement (New feature or request) and models (This issue is related to model performance/reliability) labels on May 6, 2025.
```ts
}

export const endpointHfInferenceParametersSchema = z.object({
	type: z.literal("hfinference"),
```
Member:
I would name it either `inference-providers` (the name of the product) or `InferenceClient` (the name of the library class).

Otherwise it's confusing with `provider="hf-inference"`.

```ts
	endpointHfInferenceParametersSchema.parse(input);

	const client = baseURL
		? new InferenceClient(config.HF_TOKEN).endpoint(baseURL)
```
Member:
Suggested change:

```diff
-		? new InferenceClient(config.HF_TOKEN).endpoint(baseURL)
+		? new InferenceClient(config.HF_TOKEN, { endpointUrl: baseURL })
```
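For context, a minimal usage sketch of a client built this way, assuming the `@huggingface/inference` chat-completion API; the token, URL, and prompt are placeholders, not code from this PR:

```ts
import { InferenceClient } from "@huggingface/inference";

// Placeholders: the real code takes the token from config.HF_TOKEN and
// the URL from the model's endpoint config.
const client = new InferenceClient(process.env.HF_TOKEN, {
	endpointUrl: "https://my-endpoint.example.com", // hypothetical URL
});

// OpenAI-style chat completion against the configured endpoint; the
// model name is one of the prod models from the checklist above.
const res = await client.chatCompletion({
	model: "meta-llama/Llama-3.3-70B-Instruct",
	messages: [{ role: "user", content: "Hello!" }],
});
console.log(res.choices[0].message.content);
```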

```diff
@@ -380,6 +306,8 @@ const addEndpoint = (m: Awaited<ReturnType<typeof processModel>>) => ({
 			return endpoints.tgi(args);
 		case "local":
 			return endpoints.local(args);
+		case "hfinference":
```
Member:
Suggested change:

```diff
-		case "hfinference":
+		case "inference-providers":
```
