
LLM Applications

An end-to-end guide for scaling and serving LLM applications in production. This repo currently contains one such application: a retrieval-augmented generation (RAG) app for answering questions about supplied information.

Setup

API keys

We'll be using OpenAI to access ChatGPT models like gpt-3.5-turbo and gpt-4, and Anyscale Endpoints to access OSS LLMs like Llama-2-70b. Be sure to create accounts for both and have your credentials ready.
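Both services speak the same OpenAI-compatible API, so a single client can talk to either one. Below is a minimal sketch (openai<1.0 style, matching this repo's era) that uses the environment variables defined in the Variables section further down; the model name is just an example:

import os
import openai  # openai<1.0 module-level configuration

# Point the client at Anyscale Endpoints (swap in OPENAI_* to use OpenAI).
openai.api_base = os.environ["ANYSCALE_API_BASE"]
openai.api_key = os.environ["ANYSCALE_API_KEY"]

response = openai.ChatCompletion.create(
    model="meta-llama/Llama-2-70b-chat-hf",  # e.g. "gpt-4" against OpenAI
    messages=[{"role": "user", "content": "What is Ray?"}],
)
print(response["choices"][0]["message"]["content"])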

Compute

Repository

git clone https://github.com/ray-project/llm-applications.git .  # git checkout -b goku origin/goku
git config --global user.name <GITHUB-USERNAME>
git config --global user.email <EMAIL-ADDRESS>

Data

Our data is already available at /efs/shared_storage/goku/docs.ray.io/en/master/ (on Staging, us-east-1), but if you want to load it yourself, run the bash command below (change /desired/output/directory, and make sure it's on the shared storage so that it's accessible to the workers):

export EFS_DIR=/desired/output/directory
wget -e robots=off --recursive --no-clobber --page-requisites \
  --html-extension --convert-links --restrict-file-names=windows \
  --domains docs.ray.io --no-parent --accept=html \
  -P $EFS_DIR https://docs.ray.io/en/master/
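As a quick, hypothetical sanity check that the crawl worked, you can count the downloaded HTML files (EFS_DIR mirrors the path used above):

from pathlib import Path

EFS_DIR = Path("/desired/output/directory")  # same directory passed to wget
num_docs = len(list(EFS_DIR.rglob("*.html")))
print(f"{num_docs} HTML documents")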

Environment

Then set up the environment by installing the dependencies and adding the repo to your PYTHONPATH (the values for your .env file are specified in the next section):

pip install --user -r requirements.txt
export PYTHONPATH=$PYTHONPATH:$PWD
pre-commit install
pre-commit autoupdate

Variables

touch .env
# Add environment variables to .env
OPENAI_API_BASE="https://api.openai.com/v1"
OPENAI_API_KEY=""  # https://platform.openai.com/account/api-keys
ANYSCALE_API_BASE="https://api.endpoints.anyscale.com/v1"
ANYSCALE_API_KEY=""  # https://app.endpoints.anyscale.com/credentials
DB_CONNECTION_STRING="dbname=postgres user=postgres host=localhost password=postgres"
source .env
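To confirm the database is reachable with that connection string, here's a minimal sketch assuming the psycopg2 package is available:

import os
import psycopg2

# Uses the DB_CONNECTION_STRING exported from .env above.
conn = psycopg2.connect(os.environ["DB_CONNECTION_STRING"])
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()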

Steps

  1. Open rag.ipynb to interactively go through all the concepts and run experiments.
  2. Use the best configuration (in serve.py) from the notebook experiments to serve the LLM (a sketch of such a deployment follows this list):
python app/main.py
  3. Query your service:
import requests

data = {"query": "What is the default batch size for map_batches?"}
response = requests.post("http://127.0.0.1:8000/query", json=data)
print(response.text)
  4. Shut down the service:
from ray import serve

serve.shutdown()
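For orientation, here is a minimal sketch of what a Ray Serve deployment behind that /query route can look like. This is an illustrative stand-in, not the repo's actual serve.py; the deployment class and its stub answer are hypothetical:

from fastapi import FastAPI
from pydantic import BaseModel
from ray import serve

app = FastAPI()

class Query(BaseModel):
    query: str

@serve.deployment
@serve.ingress(app)
class RayAssistantDeployment:
    @app.post("/query")
    def answer(self, query: Query) -> dict:
        # A real deployment would embed the query, retrieve context
        # chunks, and call the LLM; this stub just echoes the question.
        return {"question": query.query, "answer": "<generated answer>"}

# Serves on http://127.0.0.1:8000 by default.
serve.run(RayAssistantDeployment.bind())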
