Fix cpu inference with gpu build by zpitroda · Pull Request #323 · mindverse/Second-Me · GitHub

Fix cpu inference with gpu build #323

Open

zpitroda wants to merge 3 commits into develop

Conversation

zpitroda
Contributor

When building and training with GPU support and then trying to start the service, I get this error:

local_llm_service.py:222 - Failed to start llama-server: free(): double free detected in tcache 2

I believe this is caused by the env["CUDA_VISIBLE_DEVICES"] = "" line in local_llm_service: hiding every GPU from a CUDA-enabled llama-server build appears to create a memory conflict, since inference currently runs only on the CPU.
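
For context, here is a minimal sketch of the kind of change this implies, assuming local_llm_service launches llama-server through subprocess; the function and argument names below are illustrative, not the repository's actual code:

```python
import os
import subprocess

# Illustrative sketch only; the real launch code lives in local_llm_service.py
# and its exact names differ.
def start_llama_server(server_bin: str, model_path: str) -> subprocess.Popen:
    env = os.environ.copy()
    # The problematic line forced CPU-only inference by hiding all GPUs:
    #     env["CUDA_VISIBLE_DEVICES"] = ""
    # Combined with a CUDA-enabled llama-server build, this appears to trigger
    # "free(): double free detected in tcache 2" at startup. Dropping the
    # override lets the CUDA build initialize its devices; CPU-only inference
    # can still be requested explicitly with --n-gpu-layers 0.
    return subprocess.Popen(
        [server_bin, "--model", model_path, "--n-gpu-layers", "0"],
        env=env,
    )
```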

Labels: None yet
Projects: None yet
2 participants