An end-to-end guide for scaling and serving LLM applications in production. This repo currently contains one such application: a retrieval-augmented generation (RAG) app for answering questions about supplied information.
We'll be using OpenAI to access ChatGPT models like `gpt-3.5-turbo` and `gpt-4`, and Anyscale Endpoints to access OSS LLMs like `Llama-2-70b`. Be sure to create accounts for both and have your credentials ready.
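Both providers expose an OpenAI-compatible API, so the same client can talk to either one by swapping the base URL and key. A minimal sketch (assuming the `openai` Python client, v1+; the env var names match the `.env` file set up below):

```python
import os

from openai import OpenAI  # assumes openai>=1.0 is installed

# Point the same client at OpenAI or Anyscale Endpoints by swapping base URL + key.
client = OpenAI(
    base_url=os.environ["OPENAI_API_BASE"],  # or ANYSCALE_API_BASE
    api_key=os.environ["OPENAI_API_KEY"],    # or ANYSCALE_API_KEY
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # e.g. "meta-llama/Llama-2-70b-chat-hf" on Anyscale Endpoints
    messages=[{"role": "user", "content": "What is Ray?"}],
)
print(response.choices[0].message.content)
```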
- Start a new Anyscale workspace on staging using a `g3.8xlarge` head node (you can also add GPU worker nodes to run the workloads faster).
- Use the `default_cluster_env_2.6.2_py39` cluster environment.
- Use the `us-east-1` region if you'd like to use the artifacts in our shared storage (source docs, vector DB dumps, etc.).
```bash
git clone https://github.com/ray-project/llm-applications.git .  # git checkout -b goku origin/goku
git config --global user.name <GITHUB-USERNAME>
git config --global user.email <EMAIL-ADDRESS>
```
Our data is already available at `/efs/shared_storage/goku/docs.ray.io/en/master/` (on Staging, `us-east-1`), but if you want to load it yourself, run a bash command like the sketch below (change `/desired/output/directory`, but make sure it's on the shared storage so that it's accessible to the workers).
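A minimal sketch of such a command, assuming `wget` recursive mirroring of the Ray docs (the flags here are illustrative, not necessarily the exact ones used to produce the shared copy):

```bash
# Must be a directory on the shared storage so workers can read it.
export EFS_DIR=/desired/output/directory

# Recursively mirror the Ray docs as static HTML into $EFS_DIR.
wget -e robots=off --recursive --no-clobber --page-requisites \
     --html-extension --convert-links --restrict-file-names=windows \
     --domains docs.ray.io --no-parent --accept=html \
     -P $EFS_DIR https://docs.ray.io/en/master/
```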
```bash
git clone https://github.com/ray-project/llm-applications.git .
```
Then set up the environment by specifying the values in your `.env` file and installing the dependencies:
```bash
pip install --user -r requirements.txt
export PYTHONPATH=$PYTHONPATH:$PWD
pre-commit install
pre-commit autoupdate
```
```bash
touch .env

# Add environment variables to .env
OPENAI_API_BASE="https://api.openai.com/v1"
OPENAI_API_KEY=""  # https://platform.openai.com/account/api-keys
ANYSCALE_API_BASE="https://api.endpoints.anyscale.com/v1"
ANYSCALE_API_KEY=""  # https://app.endpoints.anyscale.com/credentials
DB_CONNECTION_STRING="dbname=postgres user=postgres host=localhost password=postgres"

source .env
```
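To confirm the environment is wired up, a quick sanity check (a sketch, assuming `psycopg2` is installed and the Postgres instance from `DB_CONNECTION_STRING` is reachable):

```python
import os

import psycopg2  # assumes psycopg2 (or psycopg2-binary) is installed

# The API keys should be non-empty after `source .env`.
assert os.environ["OPENAI_API_KEY"], "OPENAI_API_KEY is not set"
assert os.environ["ANYSCALE_API_KEY"], "ANYSCALE_API_KEY is not set"

# The connection string should point at a reachable Postgres instance.
with psycopg2.connect(os.environ["DB_CONNECTION_STRING"]) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1;")
        print("Postgres connection OK:", cur.fetchone())
```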
- Open `rag.ipynb` to interactively go through all the concepts and run experiments.
- Use the best configuration (in `serve.py`) from the notebook experiments to serve the LLM:
```bash
python app/main.py
```
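Under the hood this starts a Ray Serve deployment. A minimal sketch of the pattern (the class, route, and placeholder logic here are hypothetical, not the repo's actual `serve.py`):

```python
from fastapi import FastAPI
from ray import serve

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class RAGDeployment:
    # Hypothetical deployment: the real serve.py wires in the best
    # configuration (chunking, embedding model, LLM) from the notebook.
    @app.post("/query")
    def query(self, request: dict) -> dict:
        answer = f"(answer for: {request['query']})"  # placeholder for RAG logic
        return {"answer": answer}

# Bind and run the deployment on the local Ray cluster (serves on port 8000).
serve.run(RAGDeployment.bind())
```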
- Query your service:
```python
import requests

# Send a question to the running service and print the JSON response.
data = {"query": "What is the default batch size for map_batches?"}
response = requests.post("http://127.0.0.1:8000/query", json=data)
print(response.text)
```
- Shut down the service:
```python
from ray import serve

# Tear down all Serve deployments on the cluster.
serve.shutdown()
```