A Python library that turns PostgreSQL into the retrieval engine behind robust, production-ready RAG and Agentic applications.
- 🔄 Automatically create vector embeddings from data in PostgreSQL tables as well as documents in S3. The embeddings are automatically updated as the data changes.
- 🔍 Powerful vector and semantic search with pgvector and pgvectorscale.
- 🛡️ Production-ready out-of-the-box: supports batch processing for efficient embedding generation, with built-in handling for model failures, rate limits, and latency spikes.
Works with any PostgreSQL database, including Timescale Cloud, Amazon RDS, Supabase and more.
```
pip install pgai
```
This section walks you through getting started with pgai and Ollama using Docker, and shows you the major features of pgai.
Please note that using Ollama requires a large (>4GB) download of the Docker image and model. If you don't want to download that much data, you may want to use the OpenAI quick start or VoyageAI quick start instead.
Install Docker if you don't already have it; the docker compose file below starts both PostgreSQL and Ollama for you.
- Download the docker compose file.
```
curl -O https://raw.githubusercontent.com/timescale/pgai/main/examples/docker_compose_pgai_ollama/docker-compose.yml
```
- Start the services defined in the docker compose file.
```
docker compose up -d
```
This will start Ollama and a PostgreSQL instance.
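To verify that both containers came up, you can check the compose status. The `psql` line below is optional and assumes the database service is named `db` with the default `postgres` user; check the downloaded compose file for the actual names in your setup.

```
# list the services started by the compose file
docker compose ps

# optional: open a psql shell inside the database container
# (service name "db" and user "postgres" are assumptions; check the compose file)
docker compose exec db psql -U postgres
```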
- Download the Ollama models. We'll use the `all-minilm` model for embeddings and the `tinyllama` model for reasoning.
```
docker compose exec ollama ollama pull all-minilm
docker compose exec ollama ollama pull tinyllama
```
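If you want to confirm the models were pulled successfully, you can list them inside the Ollama container:

```
docker compose exec ollama ollama list
```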
You can take a look at a simple FastAPI application to see how to set up an app that performs RAG with pgai Vectorizer.
To run the app, first download the file, then use the following command:
```
fastapi dev with_psycopg.py
```
By going to http://0.0.0.0:8000/docs you can see the API documentation and try out the endpoints provided by the app.
We'll walk you through the main parts of the code below.
- Enable pgai vectorizer on your database
During app startup, we run the following Python to install the necessary database objects in your PostgreSQL database. All the database objects are installed in the `ai` schema. Note that in production use cases, this can also be done via a database migration.
```python
pgai.install(DB_URL)
```
We also run the vectorizer worker as part of the FastAPI app lifecycle.
```python
worker = Worker(DB_URL)
task = asyncio.create_task(worker.run())
```
In this example, we run the Vectorizer worker inside the FastAPI app for simplicity. You can also run the vectorizer worker outside the FastAPI app, in a separate process or separate container. Please see the vectorizer worker documentation for more information.
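For example, here is a minimal sketch of a standalone worker process that reuses the same `Worker` class shown above. The import path and `DB_URL` handling are assumptions; see the vectorizer worker documentation for the supported ways to run it.

```python
# standalone_worker.py - illustrative sketch of running the vectorizer worker
# outside the FastAPI app, using the same Worker API shown above.
import asyncio
import os

from pgai.vectorizer import Worker  # import path is an assumption; check the pgai docs

async def main():
    db_url = os.environ["DB_URL"]  # your PostgreSQL connection string
    worker = Worker(db_url)
    await worker.run()

if __name__ == "__main__":
    asyncio.run(main())
```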
- Create the table and load the test dataset
We'll create a table named `wiki` from a few rows of the English-language `wikimedia/wikipedia` dataset.
First, we'll create the table using the `create_wiki_table` function:
Click to see the python code for `create_wiki_table`
```python
async def create_wiki_table():
    async with pool.connection() as conn:
        async with conn.cursor() as cur:
            await cur.execute("""
                CREATE TABLE IF NOT EXISTS wiki (
                    id SERIAL PRIMARY KEY,
                    url TEXT NOT NULL,
                    title TEXT NOT NULL,
                    text TEXT NOT NULL
                )
            """)
        await conn.commit()
```
Then, we'll load the data from the Hugging Face dataset using the `load_wiki_articles` function:
Click to see the python code for `load_wiki_articles`
```python
async def load_wiki_articles():
    # to keep the demo fast, we have some simple limits
    num_articles = 10
    max_text_length = 1000
    wiki_dataset = load_dataset("wikimedia/wikipedia", "20231101.en", split="train", streaming=True)
    async with pool.connection() as conn:
        async with conn.cursor() as cur:
            for article in wiki_dataset.take(num_articles):
                await cur.execute(
                    "INSERT INTO wiki (url, title, text) VALUES (%s, %s, %s)",
                    (article['url'], article['title'], article['text'][:max_text_length])
                )
        await conn.commit()
```
- Create a vectorizer for `wiki`
To enable semantic search on the `text` column of the `wiki` table, we need to create vector embeddings for that column. We use a vectorizer to automatically create these embeddings and keep them in sync with the data in the `wiki` table. We do this in the `create_vectorizer` function:
Click to see the python code for `create_vectorizer`
```python
async def create_vectorizer():
    vectorizer_statement = CreateVectorizer(
        source="wiki",
        target_table='wiki_embedding_storage',
        loading=LoadingColumnConfig(column_name='text'),
        embedding=EmbeddingOllamaConfig(model='all-minilm', dimensions=384, base_url="http://localhost:11434")
    ).to_sql()
    try:
        async with pool.connection() as conn:
            async with conn.cursor() as cur:
                await cur.execute(vectorizer_statement)
            await conn.commit()
    except Exception as e:
        if "already exists" in str(e):
            # ignore if the vectorizer already exists
            pass
        else:
            raise e
```
To learn more about vectorizers, see the vectorizer usage guide.
The `CreateVectorizer` call is a convenience SQL statement builder that helps you build the SQL statement to create the vectorizer. You can find the full reference for the SQL `create_vectorizer` call in the vectorizer API reference.
- Check the progress of the vectorizer embedding creation
In order for the system to perform batch processing when creating the embeddings, and to recover from intermittent model failures, the vectorizer worker creates embeddings asynchronously in the background. To check the progress of the vectorizer embedding creation, we can query the `vectorizer_status` view. We do this in the `vectorizer_status` function (and endpoint):
Click to see the python code for `vectorizer_status`
@app.get("/vectorizer_status") async def vectorizer_status(): async with pool.connection() as conn: async with conn.cursor(row_factory=dict_row) as cur: await cur.execute("SELECT * FROM ai.vectorizer_status") return await cur.fetchall()
You can see the progress by going to the `/vectorizer_status` endpoint. All the embeddings have been created when the `pending_items` column is 0. This should be very quick in this demo.
Click to see the curl command to query the `/vectorizer_status` endpoint
```
curl -X 'GET' \
  'http://0.0.0.0:8000/vectorizer_status' \
  -H 'accept: application/json'
```
- Search the embeddings using pgvector
We'll search the embeddings for the concept of "properties of light" even though these words are not in the text of the articles. This is possible because vector embeddings capture the semantic meaning of the text.
Semantic search is a powerful feature in its own right, but it is also a key component of Retrieval Augmented Generation (RAG).
We first define a function called `_find_relevant_chunks` to find the relevant chunks for a given query. Since we are searching by semantic meaning, a large block of text needs to be broken down into smaller chunks so that the meaning of each chunk is coherent. The vectorizer does this automatically when creating the embeddings, and when you search, you get back the chunks that are most relevant to the query.
Click to see the python code for `_find_relevant_chunks`
```python
@dataclass
class WikiSearchResult:
    id: int
    url: str
    title: str
    text: str
    chunk: str
    distance: float

async def _find_relevant_chunks(client: ollama.AsyncClient, query: str, limit: int = 2) -> List[WikiSearchResult]:
    response = await client.embed(model="all-minilm", input=query)
    embedding = np.array(response.embeddings[0])
    async with pool.connection() as conn:
        async with conn.cursor(row_factory=class_row(WikiSearchResult)) as cur:
            await cur.execute("""
                SELECT w.id, w.url, w.title, w.text, w.chunk,
                       w.embedding <=> %s as distance
                FROM wiki_embedding w
                ORDER BY distance
                LIMIT %s
            """, (embedding, limit))
            return await cur.fetchall()
```
This query selects from the `wiki_embedding` view that was created by the vectorizer. The view contains all the columns from the `wiki` table plus the `embedding` column, which holds the vector embeddings, and the `chunk` column, which holds the chunked text corresponding to each embedding. In this query, we return both the full text of the article and the chunk that is most relevant to the query (different applications may want to return only one or the other).
The `embedding <=> %s` expression uses the pgvector `<=>` operator, which computes the cosine distance between the stored embedding and the query embedding. The greater the distance, the more dissimilar the two vectors are, so we order the results by distance to get the most similar chunks.
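As a rough illustration of what the operator computes (this is not pgai or pgvector code, just the standard cosine-distance formula written out in NumPy):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    # cosine distance = 1 - cosine similarity; 0 means the vectors point the same way
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```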
This is used in the `/search` endpoint.
Click to see the python code for `/search`
@app.get("/search") async def search(query: str): client = ollama.AsyncClient(host="http://localhost:11434") results = await _find_relevant_chunks(client, query) return [asdict(result) for result in results]
Now you can search through these articles with a query to the search endpoint.
Click to see the curl command to query the `/search` endpoint
```
curl -X 'GET' \
  'http://0.0.0.0:8000/search?query=Properties%20of%20Light' \
  -H 'accept: application/json'
```
- Modify your data and have the vectorizer automatically update the embeddings
We'll create an endpoint called `insert_pgai_article` to add a row about pgai to the `wiki` table and have the vectorizer automatically update the embeddings. This simulates changes to the underlying data.
Click to see the python code for `insert_pgai_article`
```python
@app.post("/insert_pgai_article")
async def insert_pgai_article():
    async with pool.connection() as conn:
        async with conn.cursor() as cur:
            await cur.execute("""
                INSERT INTO wiki (url, title, text)
                VALUES (%s, %s, %s)
            """, (
                "https://en.wikipedia.org/wiki/Pgai",
                "pgai - Power your AI applications with PostgreSQL",
                "pgai is a tool to make developing RAG and other AI applications easier..."
            ))
        await conn.commit()
    return {"message": "Article inserted successfully"}
```
Note: This endpoint simply inserts the text data into the `wiki` table without doing anything to create the embeddings. The vectorizer automatically creates the embeddings for the new row without any intervention from you, and it will also update the embeddings if the row's data changes later.
After a few seconds, you can run a search query related to the new entry and see it returned as part of the results:
Click to see the curl command to query the `/search` endpoint
```
curl -X 'GET' \
  'http://0.0.0.0:8000/search?query=AI%20Tools' \
  -H 'accept: application/json'
```
- Perform Retrieval Augmented Generation (RAG)
In this section, we'll use RAG to have the LLM answer questions about pgai based on the wiki entry we added. The LLM was never trained on the pgai wiki entry, so it needs the data in the database to answer questions about pgai.
The `rag` endpoint looks as follows:
Click to see the python code for `/rag`
@app.get("/rag") async def rag(query: str) -> Optional[str]: # Initialize Ollama client client = ollama.AsyncClient(host="http://localhost:11434") #find and format the chunks chunks = await _find_relevant_chunks(client, query) context = "\n\n".join(f"{article.title}:\n{article.text}" for article, _ in chunks) logger.debug(f"Context: {context}") # Construct prompt with context prompt = f"""Question: {query} Please use the following context to provide an accurate response: {context} Answer:""" # Generate response using Ollama SDK response = await client.generate( model='tinyllama', prompt=prompt, stream=False ) return response['response']
You can see the RAG response by querying the `/rag` endpoint.
Click to see the curl command to query the `/rag` endpoint
```
curl -X 'GET' \
  'http://0.0.0.0:8000/rag?query=What%20is%20pgai' \
  -H 'accept: application/json'
```
Our pgai Python library lets you work with embeddings generated from your data:
- Automatically create and sync vector embeddings for your data using the vectorizer (learn more)
- Search your data using vector and semantic search (learn more)
- Implement Retrieval Augmented Generation as shown above in the Quick Start
- Perform high-performance, cost-efficient approximate nearest neighbor (ANN) search on large vector workloads with pgvectorscale, which complements pgvector.
We also offer a PostgreSQL extension that can perform LLM model calling directly from SQL. This is often useful for use cases like classification, summarization, and data enrichment on your existing data.
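As a hedged sketch of what that can look like, assuming the pgai extension is installed in your database and exposes an `ai.ollama_generate(model, prompt)` function (check the extension documentation for the exact function names and signatures), you could summarize existing rows directly from SQL:

```python
# hypothetical example: one-sentence summaries of wiki articles, generated by
# the model from within SQL via the pgai extension's model-calling functions
async def summarize_articles():
    async with pool.connection() as conn:
        async with conn.cursor() as cur:
            await cur.execute("""
                SELECT title,
                       ai.ollama_generate(
                           'tinyllama',
                           'Summarize in one sentence: ' || text
                       )->>'response' AS summary
                FROM wiki
                LIMIT 3
            """)
            return await cur.fetchall()
```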
The vectorizer is designed to be flexible and customizable. Each vectorizer defines a pipeline for creating embeddings from your data. The pipeline is defined by a series of components that are applied in sequence to the data:
- Loading: First, you define the source of the data to embed. It can be data stored directly in a column of the source table, or a URI in a column of the source table that points to a file, an S3 bucket, and so on.
- Parsing: Then, you define the way the data is parsed if it is a non-text document such as a PDF, HTML, or markdown file.
- Chunking: Next, you define the way text data is split into chunks.
- Formatting: Then, for each chunk, you define the way the data is formatted before it is sent for embedding. For example, you can add the title of the document as the first line of the chunk.
- Embedding: Finally, you specify the LLM provider, model, and the parameters to be used when generating the embeddings.
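To make the mapping concrete, here is a minimal sketch of how those stages line up with the Python statement builder used earlier in this guide. Only `CreateVectorizer`, `LoadingColumnConfig`, and `EmbeddingOllamaConfig` appear in the quick start above; the parsing, chunking, and formatting placeholders below are illustrative only, so check the vectorizer API reference for the actual configuration classes and parameters:

```python
# illustrative sketch - parsing/chunking/formatting config names are placeholders
vectorizer_sql = CreateVectorizer(
    source="wiki",                                    # source table
    target_table="wiki_embedding_storage",            # where embeddings are stored
    loading=LoadingColumnConfig(column_name="text"),  # Loading: which column to embed
    # Parsing:    only needed for non-text documents (PDF, HTML, markdown)
    # Chunking:   how text is split into coherent pieces
    # Formatting: e.g. prepend the document title to each chunk
    embedding=EmbeddingOllamaConfig(                  # Embedding: provider, model, params
        model="all-minilm",
        dimensions=384,
        base_url="http://localhost:11434",
    ),
).to_sql()
```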
Multiple embedding providers and models are supported, including Ollama, OpenAI, and Voyage AI; see the vectorizer documentation for the full list.
When you define a vectorizer, you define how an embedding is generated from your data in a declarative way (much like an index). That allows the system to manage the process of generating and updating the embeddings in the background for you. The declarative nature of the vectorizer is the "magic sauce" that allows the system to handle intermittent failures of the LLM and makes the system robust and scalable.
The approach is similar to the way that indexes work in PostgreSQL. When you create an index, you are essentially declaring that you want to be able to search for data in a certain way. The system then manages the process of updating the index as the data changes.
- Vector Databases Are the Wrong Abstraction
- pgai: Giving PostgreSQL Developers AI Engineering Superpowers
- How to Automatically Create & Update Embeddings in PostgreSQL—With One SQL Query
- [video] Auto Create and Sync Vector Embeddings in 1 Line of SQL
- Which OpenAI Embedding Model Is Best for Your RAG App With Pgvector?
- Which RAG Chunking and Formatting Strategy Is Best for Your App With Pgvector
- Parsing All the Data With Open-Source Tools: Unstructured and Pgai
We welcome contributions to pgai! See the Contributing page for more information.
pgai is still at an early stage. Now is a great time to help shape the direction of this project; we are currently deciding priorities. Have a look at the list of features we're thinking of working on. Feel free to comment, expand the list, or hop on the Discussions forum.
To get started, take a look at how to contribute and how to set up a dev/test environment.
Timescale is a PostgreSQL database company. To learn more, visit timescale.com.
Timescale Cloud is a high-performance, developer-focused cloud platform that provides PostgreSQL services for the most demanding AI, time-series, analytics, and event workloads. Timescale Cloud is ideal for production applications and provides high availability, streaming backups, upgrades over time, roles and permissions, and great security.