vLLM has no problem using multiprocessing as the backend to host any Phi-3-mini model. However, as soon as Ray is used as the backend, it blows up. In the Ray Dashboard you will notice that `_AsyncLLMEngine` is dead. If you look at the "System" logs tab and scroll down a bit, you will see:

```
ray.exceptions.RaySystemError: System error: No module named 'transformers_modules'
```
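For context, a minimal sketch of what the two run scripts boil down to. The arguments here are assumptions, not the repo's exact code; the point is that the only intended difference is the `distributed_executor_backend` passed to vLLM:

```python
# Minimal sketch, NOT the repo's exact scripts: run_with_mp.py and
# run_with_ray.py presumably differ only in the executor backend.
from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-3-mini-128k-instruct",  # assumed model path/name
    trust_remote_code=True,  # Phi-3 loads custom code via transformers_modules
    distributed_executor_backend="ray",  # "mp" works; "ray" triggers the error above
)
outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```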
- Download `Phi-3-mini-128k-instruct` from HuggingFace.
- Set up the conda environment (a sketch of a matching `environment.yml` follows this list):

  ```
  conda env create -f environment.yml
  ```

  NOTE: This problem was experienced with Python 3.11.9, so that's what the conda env uses.
- Activate the environment:

  ```
  conda activate vllm_phi_3_ray_problem_env
  ```

- Install Ray and vLLM (with the env active, so the packages land in it):

  ```
  pip install -r requirements.txt
  ```

- Run both scripts. Important: make sure you have activated your conda env!

  ```
  python ./run_with_mp.py
  python ./run_with_ray.py
  ```
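A sketch of what `environment.yml` might contain, inferred only from the details above (the env name from the activate step, Python 3.11.9 from the note); the actual file in the repo is authoritative:

```yaml
# Hypothetical sketch of environment.yml; the real file ships with the repo.
# Ray and vLLM are installed separately via pip install -r requirements.txt.
name: vllm_phi_3_ray_problem_env
dependencies:
  - python=3.11.9
  - pip
```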
- The Ray script contains a print statement telling you whether or not the `transformers_modules` folder is present in your home dir cache (see the sketch after this list). For me, as I don't have `HF_HOME` or `HF_MODULES_CACHE` set, it is.
- If I delete the HF modules dir in the cache, it gets re-created when I run the Ray file, and I see a `Phi-3-mini-128k-instruct` folder in it.
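For reference, a minimal sketch of a check like the one the Ray script performs (an assumed implementation, using the default cache location transformers falls back to when `HF_HOME` / `HF_MODULES_CACHE` are unset):

```python
# Sketch of the presence check described above (assumed implementation).
from pathlib import Path

# Default dynamic-modules location when HF_HOME / HF_MODULES_CACHE are unset.
modules_dir = Path.home() / ".cache" / "huggingface" / "modules" / "transformers_modules"
print(f"transformers_modules present: {modules_dir.is_dir()}")
```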