This application runs inside a Docker container.
Prerequisites:
- vLLM image (can be built from here; see the example build command below)
- HuggingFace token for the model you want to run (only needed for gated models)
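If you need to build the image yourself, a CPU-only image can be built from the vLLM source tree roughly as follows. The Dockerfile path, image tag, and shm-size value here are assumptions and may differ between vLLM releases, so check the vLLM docs for your version:

git clone https://github.com/vllm-project/vllm.git && cd vllm
docker build -f Dockerfile.cpu -t <vllm-image> --shm-size=4g .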
- Export your HF token:
export HF_TOKEN=<your HF token>
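If you have previously logged in with the Hugging Face CLI, the token is usually cached on disk and can be exported from there instead (this assumes the default huggingface_hub token path; adjust if yours differs):

export HF_TOKEN=$(cat ~/.cache/huggingface/token)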
- Run your container in detached mode, with the following variables (a filled-in example follows):
docker run -itd --entrypoint /bin/bash -v ~/.cache/huggingface:/root/.cache/huggingface --privileged=true --network host -e HF_TOKEN=$HF_TOKEN --cpus <number of CPUs> -m <memory>GB --name cpu-test <vllm-image>
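For example, with illustrative resource limits (16 CPUs, 32 GB of memory, and the vllm-cpu-env image tag are placeholder assumptions, not recommendations; size them to your host and use whatever tag you built):

docker run -itd --entrypoint /bin/bash -v ~/.cache/huggingface:/root/.cache/huggingface --privileged=true --network host -e HF_TOKEN=$HF_TOKEN --cpus 16 -m 32GB --name cpu-test vllm-cpu-env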
- Copy the client application script into the container:
docker cp streamlit-chat-app.py cpu-test:vllm/examples/
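You can confirm the script landed where the next step expects it (the vllm/examples path is relative to the container's working directory, as in the copy command above):

docker exec cpu-test ls vllm/examples/streamlit-chat-app.py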
- Start the vLLM server with your desired model and launch the chat client:
docker exec cpu-test bash -c "/opt/conda/bin/python3 -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.1-8B-Instruct --dtype half --max-model-len 4000 & pip install streamlit && streamlit run vllm/examples/streamlit-chat-app.py"
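Equivalently, the three steps bundled into that one command can be run as separate exec calls, which makes it easier to inspect the server before starting the client. This is just a sketch of the same steps (using python -m to invoke pip and streamlit from the same interpreter), not an additional requirement:

docker exec -d cpu-test /opt/conda/bin/python3 -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.1-8B-Instruct --dtype half --max-model-len 4000
docker exec cpu-test /opt/conda/bin/python3 -m pip install streamlit
docker exec cpu-test /opt/conda/bin/python3 -m streamlit run vllm/examples/streamlit-chat-app.py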
- Access your chatbot in your browser at:
http://<ip of host vm where container is running>:8501
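Before opening the UI, you can also sanity-check that the vLLM server itself is up. Assuming the server is listening on its default OpenAI-compatible port 8000 (the container uses host networking), listing the served models should return the model name you passed above:

curl http://<ip of host vm where container is running>:8000/v1/models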