A Python library that turns PostgreSQL into the retrieval engine behind robust, production-ready RAG and Agentic applications.
- 🔄 Automatically create vector embeddings from data in PostgreSQL tables as well as documents in S3. The embeddings are automatically updated as the data changes.
- 🔍 Powerful vector and semantic search with pgvector and pgvectorscale.
- 🛡️ Production-ready out-of-the-box: supports batch processing for efficient embedding generation, with built-in handling for model failures, rate limits, and latency spikes.
Works with any PostgreSQL database, including Timescale Cloud, Amazon RDS, Supabase and more.
```
pip install pgai
```
This section walks you through getting started with pgai and Ollama using Docker, and shows you the major features of pgai.
Please note that using Ollama requires a large (>4GB) download of the Docker image and model. If you don't want to download that much data, you may want to use the OpenAI quick start or VoyageAI quick start instead.
Install Docker if you don't already have it; the docker compose file below starts both PostgreSQL and Ollama for you.
- Download the docker compose file.
```
curl -O https://raw.githubusercontent.com/timescale/pgai/main/examples/docker_compose_pgai_ollama/docker-compose.yml
```
- Start the services defined in the docker compose file.
```
docker compose up -d
```
This will start Ollama and a PostgreSQL instance.
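To verify that both containers came up, you can check the compose status. The `psql` line below is optional and assumes the database service is named `db` with the default `postgres` user; check the downloaded compose file for the actual names in your setup.

```
# list the services started by the compose file
docker compose ps

# optional: open a psql shell inside the database container
# (service name "db" and user "postgres" are assumptions; check the compose file)
docker compose exec db psql -U postgres
```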
- Download the Ollama models. We'll use the `all-minilm` model for embeddings and the `tinyllama` model for reasoning.
```
docker compose exec ollama ollama pull all-minilm
docker compose exec ollama ollama pull tinyllama
```
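If you want to confirm the models were pulled successfully, you can list them inside the Ollama container:

```
docker compose exec ollama ollama list
```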
You can take a look at a simple FastAPI application to see how to set up an app that performs RAG with pgai Vectorizer.
To run the app, first download the file, then use the following command:
```
fastapi dev with_psycopg.py
```
By going to http://0.0.0.0:8000/docs you can see the API documentation and try out the endpoints provided by the app.
We'll walk you through the main parts of the code below.
- Enable pgai vectorizer on your database
During app startup, we run the following Python to install the necessary database objects in your PostgreSQL database. All the database objects are installed in the `ai` schema. Note that in production use cases, this can also be done via a database migration.
```python
pgai.install(DB_URL)
```
We also run the vectorizer worker as part of the FastAPI app lifecycle.
```python
worker = Worker(DB_URL)
task = asyncio.create_task(worker.run())
```
In this example, we run the Vectorizer worker inside the FastAPI app for simplicity. You can also run the vectorizer worker outside the FastAPI app, in a separate process or separate container. Please see the vectorizer worker documentation for more information.
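For example, here is a minimal sketch of a standalone worker process that reuses the same `Worker` class shown above. The import path and `DB_URL` handling are assumptions; see the vectorizer worker documentation for the supported ways to run it.

```python
# standalone_worker.py - illustrative sketch of running the vectorizer worker
# outside the FastAPI app, using the same Worker API shown above.
import asyncio
import os

from pgai.vectorizer import Worker  # import path is an assumption; check the pgai docs

async def main():
    db_url = os.environ["DB_URL"]  # your PostgreSQL connection string
    worker = Worker(db_url)
    await worker.run()

if __name__ == "__main__":
    asyncio.run(main())
```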
- Create the table and load the test dataset
We'll create a table named `wiki` from a few rows of the English-language `wikimedia/wikipedia` dataset.
First, we'll create the table using the `create_wiki_table` function:
Click to see the python code for `create_wiki_table`
```python
async def create_wiki_table():
    async with pool.connection() as conn:
        async with conn.cursor() as cur:
            await cur.execute("""
                CREATE TABLE IF NOT EXISTS wiki (
                    id SERIAL PRIMARY KEY,
                    url TEXT NOT NULL,
                    title TEXT NOT NULL,
                    text TEXT NOT NULL
                )
            """)
        await conn.commit()
```
Then, we'll load the data from the Hugging Face dataset using the `load_wiki_articles` function:
Click to see the python code for `load_wiki_articles`
```python
async def load_wiki_articles():
    # to keep the demo fast, we have some simple limits
    num_articles = 10
    max_text_length = 1000
    wiki_dataset = load_dataset("wikimedia/wikipedia", "20231101.en", split="train", streaming=True)
    async with pool.connection() as conn:
        async with conn.cursor() as cur:
            for article in wiki_dataset.take(num_articles):
                await cur.execute(
                    "INSERT INTO wiki (url, title, text) VALUES (%s, %s, %s)",
                    (article['url'], article['title'], article['text'][:max_text_length])
                )
        await conn.commit()
```
- Create a vectorizer for `wiki`
To enable semantic search on the `text` column of the `wiki` table, we need to create vector embeddings for that column. We use a vectorizer to automatically create these embeddings and keep them in sync with the data in the `wiki` table. We do this in the `create_vectorizer` function:
Click to see the python code for `create_vectorizer`
```python
async def create_vectorizer():
    vectorizer_statement = CreateVectorizer(
        source="wiki",
        target_table='wiki_embedding_storage',
        loading=LoadingColumnConfig(column_name='text'),
        embedding=EmbeddingOllamaConfig(model='all-minilm', dimensions=384, base_url="http://localhost:11434")
    ).to_sql()
    try:
        async with pool.connection() as conn:
            async with conn.cursor() as cur:
                await cur.execute(vectorizer_statement)
            await conn.commit()
    except Exception as e:
        if "already exists" in str(e):
            # ignore if the vectorizer already exists
            pass
        else:
            raise e
```
To learn more about vectorizers, see the vectorizer usage guide.
The `CreateVectorizer` call is a convenience SQL statement builder that helps you build the SQL statement to create the vectorizer. You can find the full reference for the SQL `create_vectorizer` call in the vectorizer API reference.
- Check the progress of the vectorizer embedding creation
In order for the system to perform batch processing when creating the embeddings, and to recover from intermittent model failures, the vectorizer worker creates embeddings asynchronously in the background. To check the progress of the vectorizer embedding creation, we can query the `vectorizer_status` view. We do this in the `vectorizer_status` function (and endpoint):
Click to see the python code for `vectorizer_status`
@app.get("/vectorizer_status") async def vectorizer_status(): async with pool.connection() as conn: async with conn.cursor(row_factory=dict_row) as cur: await cur.execute("SELECT * FROM ai.vectorizer_status") return await cur.fetchall()
You can see the progress by going to the `/vectorizer_status` endpoint. All the embeddings have been created when the `pending_items` column is 0. This should be very quick in this demo.
Click to see the curl command to query the `/vectorizer_status` endpoint
```
curl -X 'GET' \
  'http://0.0.0.0:8000/vectorizer_status' \
  -H 'accept: application/json'
```
- Search the embeddings using pgvector
We'll search the embeddings for the concept of "properties of light" even though these words are not in the text of the articles. This is possible because vector embeddings capture the semantic meaning of the text.
Semantic search is a powerful feature in its own right, but it is also a key component of Retrieval Augmented Generation (RAG).
We first define a function called `_find_relevant_chunks` to find the relevant chunks for a given query. Since we are searching by semantic meaning, a large block of text needs to be broken down into smaller chunks so that the meaning of each chunk is coherent. The vectorizer does this automatically when creating the embeddings, and when you search, you get back the chunks that are most relevant to the query.
Click to see the python code for `_find_relevant_chunks`
```python
@dataclass
class WikiSearchResult:
    id: int
    url: str
    title: str
    text: str
    chunk: str
    distance: float

async def _find_relevant_chunks(client: ollama.AsyncClient, query: str, limit: int = 2) -> List[WikiSearchResult]:
    response = await client.embed(model="all-minilm", input=query)
    embedding = np.array(response.embeddings[0])
    async with pool.connection() as conn:
        async with conn.cursor(row_factory=class_row(WikiSearchResult)) as cur:
            await cur.execute("""
                SELECT w.id, w.url, w.title, w.text, w.chunk,
                       w.embedding <=> %s as distance
                FROM wiki_embedding w
                ORDER BY distance
                LIMIT %s
            """, (embedding, limit))
            return await cur.fetchall()
```
This query selects from the `wiki_embedding` view that was created by the vectorizer. The view contains all the columns from the `wiki` table plus the `embedding` column, which holds the vector embeddings, and the `chunk` column, which holds the chunked text corresponding to each embedding. In this query, we return both the full text of the article and the chunk that is most relevant to the query (different applications may want to return only one or the other).
The `embedding <=> %s` expression uses the pgvector `<=>` operator, which computes the cosine distance between the stored embedding and the query embedding. The greater the distance, the more dissimilar the two vectors are, so we order the results by distance to get the most similar chunks.
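As a rough illustration of what the operator computes (this is not pgai or pgvector code, just the standard cosine-distance formula written out in NumPy):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    # cosine distance = 1 - cosine similarity; 0 means the vectors point the same way
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```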
This is used in the `/search` endpoint.
Click to see the python code for `/search`
@app.get("/search") async def search(query: str): client = ollama.AsyncClient(host="http://localhost:11434") results = await _find_relevant_chunks(client, query) return [asdict(result) for result in results]
Now you can search through these articles with a query to the search endpoint.
Click to see the curl command to query the `/search` endpoint
```
curl -X 'GET' \
  'http://0.0.0.0:8000/search?query=Properties%20of%20Light' \
  -H 'accept: application/json'
```
- Modify your data and have the vectorizer automatically update the embeddings
We'll create an endpoint called `insert_pgai_article` to add a row about pgai to the `wiki` table and have the vectorizer automatically update the embeddings. This simulates changes to the underlying data.
Click to see the python code for `insert_pgai_article`
```python
@app.post("/insert_pgai_article")
async def insert_pgai_article():
    async with pool.connection() as conn:
        async with conn.cursor() as cur:
            await cur.execute("""
                INSERT INTO wiki (url, title, text)
                VALUES (%s, %s, %s)
            """, (
                "https://en.wikipedia.org/wiki/Pgai",
                "pgai - Power your AI applications with PostgreSQL",
                "pgai is a tool to make developing RAG and other AI applications easier..."
            ))
        await conn.commit()
    return {"message": "Article inserted successfully"}
```
Note: This endpoint simply inserts the text data into the `wiki` table without doing anything to create the embeddings. The vectorizer automatically creates the embeddings for the new row without any intervention from you, and it will also update the embeddings if the row's data changes later.
After a few seconds, you can run a search query related to the new entry and see it returned as part of the results:
Click to see the curl command to query the `/search` endpoint
```
curl -X 'GET' \
  'http://0.0.0.0:8000/search?query=AI%20Tools' \
  -H 'accept: application/json'
```
- Perform Retrieval Augmented Generation (RAG)
In this section, we'll use RAG to have the LLM answer questions about pgai based on the wiki entry we added. The LLM was never trained on the pgai wiki entry, so it needs the data in the database to answer questions about pgai.
The `rag` endpoint looks as follows:
Click to see the python code for `/rag`
@app.get("/rag") async def rag(query: str) -> Optional[str]: # Initialize Ollama client client = ollama.AsyncClient(host="http://localhost:11434") #find and format the chunks chunks = await _find_relevant_chunks(client, query) context = "\n\n".join(f"{article.title}:\n{article.text}" for article, _ in chunks) logger.debug(f"Context: {context}") # Construct prompt with context prompt = f"""Question: {query} Please use the following context to provide an accurate response: {context} Answer:""" # Generate response using Ollama SDK response = await client.generate( model='tinyllama', prompt=prompt, stream=False ) return response['response']
You can see the RAG response by querying the `/rag` endpoint.
Click to see the curl command to query the `/rag` endpoint
```
curl -X 'GET' \
  'http://0.0.0.0:8000/rag?query=What%20is%20pgai' \
  -H 'accept: application/json'
```
Our pgai Python library lets you work with embeddings generated from your data:
- Automatically create and sync vector embeddings for your data using the vectorizer (learn more)
- Search your data using vector and semantic search (learn more)
- Implement Retrieval Augmented Generation as shown above in the Quick Start
- Perform high-performance, cost-efficient approximate nearest neighbor (ANN) search on large vector workloads with pgvectorscale, which complements pgvector.
We also offer a PostgreSQL extension that can perform LLM model calling directly from SQL. This is often useful for use cases like classification, summarization, and data enrichment on your existing data.
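As a hedged sketch of what that can look like, assuming the pgai extension is installed in your database and exposes an `ai.ollama_generate(model, prompt)` function (check the extension documentation for the exact function names and signatures), you could summarize existing rows directly from SQL:

```python
# hypothetical example: one-sentence summaries of wiki articles, generated by
# the model from within SQL via the pgai extension's model-calling functions
async def summarize_articles():
    async with pool.connection() as conn:
        async with conn.cursor() as cur:
            await cur.execute("""
                SELECT title,
                       ai.ollama_generate(
                           'tinyllama',
                           'Summarize in one sentence: ' || text
                       )->>'response' AS summary
                FROM wiki
                LIMIT 3
            """)
            return await cur.fetchall()
```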
The vectorizer is designed to be flexible and customizable. Each vectorizer defines a pipeline for creating embeddings from your data. The pipeline is defined by a series of components that are applied in sequence to the data:
- Loading: First, you define the source of the data to embed. It can be data stored directly in a column of the source table, or a URI in a column of the source table that points to a file, an S3 bucket, and so on.
- Parsing: Then, you define the way the data is parsed if it is a non-text document such as a PDF, HTML, or markdown file.
- Chunking: Next, you define the way text data is split into chunks.
- Formatting: Then, for each chunk, you define the way the data is formatted before it is sent for embedding. For example, you can add the title of the document as the first line of the chunk.
- Embedding: Finally, you specify the LLM provider, model, and the parameters to be used when generating the embeddings.
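To make the mapping concrete, here is a minimal sketch of how those stages line up with the Python statement builder used earlier in this guide. Only `CreateVectorizer`, `LoadingColumnConfig`, and `EmbeddingOllamaConfig` appear in the quick start above; the parsing, chunking, and formatting placeholders below are illustrative only, so check the vectorizer API reference for the actual configuration classes and parameters:

```python
# illustrative sketch - parsing/chunking/formatting config names are placeholders
vectorizer_sql = CreateVectorizer(
    source="wiki",                                    # source table
    target_table="wiki_embedding_storage",            # where embeddings are stored
    loading=LoadingColumnConfig(column_name="text"),  # Loading: which column to embed
    # Parsing:    only needed for non-text documents (PDF, HTML, markdown)
    # Chunking:   how text is split into coherent pieces
    # Formatting: e.g. prepend the document title to each chunk
    embedding=EmbeddingOllamaConfig(                  # Embedding: provider, model, params
        model="all-minilm",
        dimensions=384,
        base_url="http://localhost:11434",
    ),
).to_sql()
```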
Multiple embedding providers and models are supported, including Ollama, OpenAI, and Voyage AI; see the vectorizer documentation for the full list.
When you define a vectorizer, you define how an embedding is generated from your data in a declarative way (much like an index). That allows the system to manage the process of generating and updating the embeddings in the background for you. The declarative nature of the vectorizer is the "magic sauce" that allows the system to handle intermittent failures of the LLM and makes the system robust and scalable.
The approach is similar to the way that indexes work in PostgreSQL. When you create an index, you are essentially declaring that you want to be able to search for data in a certain way. The system then manages the process of updating the index as the data changes.
- Vector Databases Are the Wrong Abstraction
- pgai: Giving PostgreSQL Developers AI Engineering Superpowers
- How to Automatically Create & Update Embeddings in PostgreSQL—With One SQL Query
- [video] Auto Create and Sync Vector Embeddings in 1 Line of SQL
- Which OpenAI Embedding Model Is Best for Your RAG App With Pgvector?
- Which RAG Chunking and Formatting Strategy Is Best for Your App With Pgvector
- Parsing All the Data With Open-Source Tools: Unstructured and Pgai
We welcome contributions to pgai! See the Contributing page for more information.
pgai is still at an early stage. Now is a great time to help shape the direction of this project; we are currently deciding priorities. Have a look at the list of features we're thinking of working on. Feel free to comment, expand the list, or hop on the Discussions forum.
To get started, take a look at how to contribute and how to set up a dev/test environment.
Timescale is a PostgreSQL database company. To learn more, visit timescale.com.
Timescale Cloud is a high-performance, developer-focused cloud platform that provides PostgreSQL services for the most demanding AI, time-series, analytics, and event workloads. Timescale Cloud is ideal for production applications and provides high availability, streaming backups, upgrades over time, roles and permissions, and great security.