The @upstash/rag-chat package makes it easy to develop powerful retrieval-augmented generation (RAG) chat applications with minimal setup and configuration.
Features:
- Next.js compatibility with streaming support
- Ingest entire websites, PDFs and more out of the box
- Built-in Vector store for your knowledge base
- (Optional) built-in Redis compatibility for fast chat history management
- (Optional) built-in rate limiting
- (Optional) disableRag option to use RAGChat as a plain LLM with chat history
- Install the package using your preferred package manager:
pnpm add @upstash/rag-chat
bun add @upstash/rag-chat
npm i @upstash/rag-chat
- Set up your environment variables:
UPSTASH_VECTOR_REST_URL="XXXXX"
UPSTASH_VECTOR_REST_TOKEN="XXXXX"
# if you use OpenAI compatible models
OPENAI_API_KEY="XXXXX"
# or if you use Upstash hosted models
QSTASH_TOKEN="XXXXX"
# Optional: For Redis-based chat history (default is in-memory)
UPSTASH_REDIS_REST_URL="XXXXX"
UPSTASH_REDIS_REST_TOKEN="XXXXX"
- Initialize and use RAGChat:
import { RAGChat } from "@upstash/rag-chat";
const ragChat = new RAGChat();
const response = await ragChat.chat("Tell me about machine learning");
console.log(response);
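The object returned by chat() carries the generated answer in its output field, which the Next.js examples later in this document destructure. The snippet below is a small sketch of that pattern, plus the disableRag feature from the list above (assuming it is exposed as a disableRAG option on the chat call):
// Log only the generated answer
const { output } = await ragChat.chat("Tell me about machine learning");
console.log(output);
// Assumption: disableRAG skips the vector-store lookup and answers from the model + chat history only
const { output: plainAnswer } = await ragChat.chat("Let's just chat for a bit", {
  disableRAG: true,
});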
RAGChat supports Upstash-hosted models as well as OpenAI and OpenAI-compatible models out of the box.
To use an OpenAI model, first initialize RAGChat:
import { RAGChat, openai } from "@upstash/rag-chat";
export const ragChat = new RAGChat({
model: openai("gpt-4-turbo"),
});
And set your OpenAI API key as an environment variable:
OPENAI_API_KEY=...
To use an Upstash model, first initialize RAGChat:
import { RAGChat, upstash } from "@upstash/rag-chat";
export const ragChat = new RAGChat({
model: upstash("mistralai/Mistral-7B-Instruct-v0.2"),
});
And set your QStash token as an environment variable:
QSTASH_TOKEN=...
To use a custom OpenAI-compatible provider, initialize RAGChat with the provider's API key and base URL:
import { RAGChat, custom } from "@upstash/rag-chat";
export const ragChat = new RAGChat({
model: custom("codellama/CodeLlama-70b-Instruct-hf", {
apiKey: "TOGETHER_AI_API_KEY",
baseUrl: "https://api.together.xyz/v1",
}),
});
Where do I find my Upstash API key?
- Navigate to your Upstash QStash Console.
- Scroll down to the Environment Keys section and copy the QSTASH_TOKEN to your .env file.
RAGChat provides a powerful debugging feature that allows you to see the inner workings of your RAG applications. By enabling debug mode, you can trace the entire process from user input to final response.
To activate the debugging feature, simply initialize RAGChat with the debug option set to true:
new RAGChat({ debug: true });
When debug mode is enabled, RAGChat will log detailed information about each step of the RAG process. Here's a breakdown of the debug output:
- SEND_PROMPT: Logs the initial user query.
{ "timestamp": 1722950191207, "logLevel": "INFO", "eventType": "SEND_PROMPT", "details": { "prompt": "Where is the capital of Japan?" } }
- RETRIEVE_CONTEXT: Shows the relevant context retrieved from the vector store.
{ "timestamp": 1722950191480, "logLevel": "INFO", "eventType": "RETRIEVE_CONTEXT", "details": { "context": [ { "data": "Tokyo is the Capital of Japan.", "id": "F5BWpryYkkcKLrp-GznwK" } ] }, "latency": "171ms" }
- RETRIEVE_HISTORY: Displays the chat history retrieved for context.
{ "timestamp": 1722950191727, "logLevel": "INFO", "eventType": "RETRIEVE_HISTORY", "details": { "history": [ { "content": "Where is the capital of Japan?", "role": "user", "id": "0" } ] }, "latency": "145ms" }
- FORMAT_HISTORY: Shows how the chat history is formatted for the prompt.
{ "timestamp": 1722950191828, "logLevel": "INFO", "eventType": "FORMAT_HISTORY", "details": { "formattedHistory": "USER MESSAGE: Where is the capital of Japan?" } }
- FINAL_PROMPT: Displays the complete prompt sent to the language model.
{ "timestamp": 1722950191931, "logLevel": "INFO", "eventType": "FINAL_PROMPT", "details": { "prompt": "You are a friendly AI assistant augmented with an Upstash Vector Store.\n To help you answer the questions, a context and/or chat history will be provided.\n Answer the question at the end using only the information available in the context or chat history, either one is ok.\n\n -------------\n Chat history:\n USER MESSAGE: Where is the capital of Japan?\n -------------\n Context:\n - Tokyo is the Capital of Japan.\n -------------\n\n Question: Where is the capital of Japan?\n Helpful answer:" } }
- LLM_RESPONSE: Shows the final response from the language model.
{ "timestamp": 1722950192593, "logLevel": "INFO", "eventType": "LLM_RESPONSE", "details": { "response": "According to the context, Tokyo is the capital of Japan!" }, "latency": "558ms" }
Customize your RAGChat instance with advanced options:
import { RAGChat, openai } from "@upstash/rag-chat";
// Optional: for built-in rate limiting
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";
export const ragChat = new RAGChat({
model: openai("gpt-4-turbo"),
promptFn: ({ context, question, chatHistory }) =>
`You are an AI assistant with access to an Upstash Vector Store.
Use the provided context and chat history to answer the question.
If the answer isn't available, politely inform the user.
------
Chat history:
${chatHistory}
------
Context:
${context}
------
Question: ${question}
Answer:`,
ratelimit: new Ratelimit({
redis: Redis.fromEnv(),
limiter: Ratelimit.slidingWindow(10, "10 s"),
}),
});
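Chat calls against this configured instance look the same as before. As a sketch, and assuming chat() accepts the same sessionId option that the history APIs below use, you can scope a conversation to a particular user or session:
const { output } = await ragChat.chat("What do my documents say about quantum computing?", {
  // assumption: sessionId keeps this conversation's history separate from other users
  sessionId: "user-123-session",
});
console.log(output);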
Add various types of data to your RAG application:
await ragChat.context.add({
type: "text",
data: "The speed of light is approximately 299,792,458 meters per second.",
});
// OR
await ragChat.context.add("The speed of light is approximately 299,792,458 meters per second.");
await ragChat.context.add({
type: "pdf",
fileSource: "./data/quantum_computing_basics.pdf",
// optional: only add this knowledge to a specific namespace
options: { namespace: "user-123-documents" },
});
await ragChat.context.add({
type: "html",
source: "https://en.wikipedia.org/wiki/Quantum_computing",
// optional: custom page parsing settings
config: { chunkOverlap: 50, chunkSize: 200 },
});
Remove specific documents:
await ragChat.context.delete({ id: "1", namespace: "user-123-documents" });
RAGChat provides robust functionality for interacting with and managing chat history. This allows you to maintain context, review past interactions, and customize the conversation flow.
Fetch recent messages from the chat history:
const history = await ragChat.history.getMessages({ amount: 10 });
console.log(history); // last (up to) 10 messages
You can also specify a session ID to retrieve history for a particular conversation:
const sessionHistory = await ragChat.history.getMessages({
amount: 5,
sessionId: "user-123-session",
});
Remove chat history for a specific session:
await ragChat.history.deleteMessages({ sessionId: "user-123-session" });
Inject custom messages into the chat history:
// Adding a user message
await ragChat.history.addMessage({
message: { content: "What's the weather like?", role: "user" },
});
// Adding a system message
await ragChat.history.addMessage({
message: {
content: "The AI should provide weather information.",
role: "system",
},
});
Helicone is a powerful observability platform that provides valuable insights into your LLM usage. Integrating Helicone with RAGChat is straightforward.
To enable Helicone observability in RAGChat, you simply need to pass your Helicone API key when initializing your model. Here's how to do it for Upstash-hosted, custom, and OpenAI models:
import { RAGChat, upstash } from "@upstash/rag-chat";
const ragChat = new RAGChat({
model: upstash("meta-llama/Meta-Llama-3-8B-Instruct", {
apiKey: process.env.QSTASH_TOKEN,
analytics: { name: "helicone", token: process.env.HELICONE_API_KEY },
}),
});
import { RAGChat, custom } from "@upstash/rag-chat";
const ragChat = new RAGChat({
model: custom("meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo", {
apiKey: "xxx",
baseUrl: "https://api.together.xyz",
analytics: { name: "helicone", token: process.env.HELICONE_API_KEY! },
}),
});
import { RAGChat, openai } from "@upstash/rag-chat";
const ragChat = new RAGChat({
model: openai("gpt-3.5-turbo", {
apiKey: process.env.OPENAI_API_KEY!,
analytics: { name: "helicone", token: process.env.HELICONE_API_KEY },
}),
});
RAGChat integrates with Next.js route handlers out of the box. Here's how to use it:
import { ragChat } from "@/utils/rag-chat";
import { NextResponse } from "next/server";
export const POST = async (req: Request) => {
// user message
const { message } = await req.json();
const { output } = await ragChat.chat(message);
return NextResponse.json({ output });
};
To stream the response from a route handler:
import { ragChat } from "@/utils/rag-chat";
export const POST = async (req: Request) => {
const { message } = await req.json();
const { output } = await ragChat.chat(message, { streaming: true });
return new Response(output);
};
On the frontend, you can read the streamed data like this:
"use client"
export const ChatComponent = () => {
const [response, setResponse] = useState('');
async function fetchStream() {
const response = await fetch("/api/chat", {
method: "POST",
body: JSON.stringify({ message: "Your question here" }),
});
if (!response.body) {
console.error("No response body");
return;
}
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value, { stream: true });
setResponse(prev => prev + chunk);
}
}
useEffect(() => {
fetchStream();
}, []);
return <div>{response}</div>;
}
RAGChat supports Next.js server actions natively. First, define your server action:
"use server";
import { ragChat } from "@/utils/rag-chat";
import { createServerActionStream } from "@upstash/rag-chat/nextjs";
export const serverChat = async (message: string) => {
const { output } = await ragChat.chat(message, { streaming: true });
// adapter to let us stream from server actions
return createServerActionStream(output);
};
Second, use the server action in your client component:
"use client";
import { readServerActionStream } from "@upstash/rag-chat/nextjs";
export const ChatComponent = () => {
const [response, setResponse] = useState('');
const clientChat = async () => {
const stream = await serverChat("How are you?");
for await (const chunk of readServerActionStream(stream)) {
setResponse(prev => prev + chunk);
}
};
return (
<div>
<button onClick={clientChat}>Start Chat</button>
<div>{response}</div>
</div>
);
};
RAGChat can be easily integrated with the Vercel AI SDK. First, set up your route handler:
import { aiUseChatAdapter } from "@upstash/rag-chat/nextjs";
import { ragChat } from "@/utils/rag-chat";
export async function POST(req: Request) {
const { messages } = await req.json();
const lastMessage = messages[messages.length - 1].content;
const response = await ragChat.chat(lastMessage, { streaming: true });
return aiUseChatAdapter(response);
}
Second, use the useChat hook in your frontend component:
"use client"
import { useChat } from "ai/react";
const ChatComponent = () => {
const { messages, input, handleInputChange, handleSubmit } = useChat({
api: "/api/chat",
initialInput: "What year was the construction of the Eiffel Tower completed, and what is its height?",
});
return (
<div>
<ul>
{messages.map((m) => (
<li key={m.id}>{m.content}</li>
))}
</ul>
<form onSubmit={handleSubmit}>
<input
value={input}
onChange={handleInputChange}
placeholder="Ask a question..."
/>
<button type="submit">Send</button>
</form>
</div>
);
};