vllm-usecases

1. Chat application

Steps to run a chatbot using vLLM as the backend.

The application runs inside a Docker container.

Prerequisites:

  • vLLM image (can be built from the vLLM repository)
  • Hugging Face token for the model you want to run (only needed for gated models)
  1. Export your Hugging Face token

    export HF_TOKEN=<your-token>
  2. Run your container in detached mode with the following variables

    docker run -itd --entrypoint /bin/bash -v ~/.cache/huggingface:/root/.cache/huggingface --privileged=true --network host -e HF_TOKEN=$HF_TOKEN --cpus <num-cpus> -m <memory>GB --name cpu-test <vllm-image>
  3. Copy the chat client script into the container (an illustrative sketch of such a client follows this list)

    docker cp streamlit-chat-app.py cpu-test:vllm/examples/
  4. Start the vLLM server with your desired model, then launch the chat client. The server is backgrounded with & so the subsequent commands can run; a quick smoke test for the endpoint follows this list

    docker exec cpu-test bash -c "
        /opt/conda/bin/python3 -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.1-8B-Instruct --dtype half --max-model-len 4000 &
        pip install streamlit
        streamlit run vllm/examples/streamlit-chat-app.py"
  5. Access the chatbot in your browser at the following link (8501 is Streamlit's default port)

    http://<ip of host vm where container is running>:8501 
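
Once step 4 is running, you can smoke-test the OpenAI-compatible endpoint before opening the UI. A minimal sketch, assuming the server is reachable on the host at vLLM's default port 8000 (the container uses host networking) and that the openai Python package is installed; the model name must match the --model flag:

    # smoke_test.py - quick check that the vLLM server answers a chat request.
    from openai import OpenAI

    # Host networking means the server is on localhost:8000 by default.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
        max_tokens=64,
    )
    print(response.choices[0].message.content)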

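For reference, a chat client along the lines of streamlit-chat-app.py can be as small as the sketch below. This is an illustrative stand-in, not the script shipped in this repo: it keeps the conversation in Streamlit session state and forwards the full history to the server's OpenAI-compatible endpoint (the base URL and model name are assumptions matching the steps above).

    # minimal_chat_app.py - illustrative stand-in for streamlit-chat-app.py.
    import streamlit as st
    from openai import OpenAI

    # Assumed endpoint and model, matching the server started in step 4.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    MODEL = "meta-llama/Llama-3.1-8B-Instruct"

    st.title("vLLM chatbot")

    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Replay the conversation so far.
    for msg in st.session_state.messages:
        with st.chat_message(msg["role"]):
            st.write(msg["content"])

    if prompt := st.chat_input("Ask something"):
        st.session_state.messages.append({"role": "user", "content": prompt})
        with st.chat_message("user"):
            st.write(prompt)

        # Send the whole history so the model sees the conversation context.
        reply = client.chat.completions.create(
            model=MODEL,
            messages=st.session_state.messages,
        ).choices[0].message.content

        st.session_state.messages.append({"role": "assistant", "content": reply})
        with st.chat_message("assistant"):
            st.write(reply)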