A Python SDK that uses TiDB as a unified data store for developers building GenAI applications.
- Support various search modes: vector search, fulltext search, hybrid search
- Automatic embedding generation
- Advanced filtering capabilities
- Transaction support
- Model Context Protocol (MCP) support
Quick Start Guide: Jupyter Notebook
pip install pytidb
# To use the built-in embedding functions and rerankers:
pip install "pytidb[models]"
# To convert query results to a pandas DataFrame:
pip install pandas
Go to tidbcloud.com to create a free TiDB cluster.
import os
from pytidb import TiDBClient

db = TiDBClient.connect(
    host=os.getenv("TIDB_HOST"),
    # Fall back to 4000 (TiDB's default port) so int() never receives None.
    port=int(os.getenv("TIDB_PORT", "4000")),
    username=os.getenv("TIDB_USERNAME"),
    password=os.getenv("TIDB_PASSWORD"),
    database=os.getenv("TIDB_DATABASE"),
)
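Because the connection call above reads every setting from the environment, it can help to validate those variables before connecting. The following is a minimal sketch, not part of PyTiDB: the helper name `tidb_connection_kwargs` is hypothetical, and the choice of which variables are required is an assumption made for illustration.

```python
import os


def tidb_connection_kwargs(env=None):
    """Collect TiDB connection settings from environment-style variables.

    Hypothetical helper for illustration only; it is not provided by PyTiDB.
    """
    env = os.environ if env is None else env
    # Assumption: host, username, and database must always be set explicitly.
    required = ["TIDB_HOST", "TIDB_USERNAME", "TIDB_DATABASE"]
    missing = [name for name in required if name not in env]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return {
        "host": env["TIDB_HOST"],
        "port": int(env.get("TIDB_PORT", "4000")),  # 4000 is TiDB's default port
        "username": env["TIDB_USERNAME"],
        "password": env.get("TIDB_PASSWORD", ""),
        "database": env["TIDB_DATABASE"],
    }
```

The returned keys mirror the keyword arguments used in the connection snippet, so the result could be passed as `TiDBClient.connect(**tidb_connection_kwargs())`.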
PyTiDB automatically embeds the text field (e.g. text) and saves the vector embedding to the vector field (e.g. text_vec).
Create a table with an embedding function:
from pytidb.schema import TableModel, Field
from pytidb.embeddings import EmbeddingFunction

text_embed = EmbeddingFunction("openai/text-embedding-3-small")

class Chunk(TableModel, table=True):
    __tablename__ = "chunks"

    id: int = Field(primary_key=True)
    text: str = Field()
    text_vec: list[float] = text_embed.VectorField(
        source_field="text"
    )  # Define the vector field.
    user_id: int = Field()

table = db.create_table(schema=Chunk)
Bulk insert data:
# The text field of each row is automatically embedded into a vector
# and saved to the text_vec field.
table.bulk_insert(
    [
        Chunk(id=2, text="bar", user_id=2),
        Chunk(id=3, text="baz", user_id=3),
        Chunk(id=4, text="qux", user_id=4),
    ]
)
Vector Search
Vector search helps you find the most relevant records based on semantic similarity, so you don't need to explicitly include all the keywords in your query.
df = (
    table.search("<query>")  # The query will be embedded automatically.
    .filter({"user_id": 2})
    .limit(2)
    .to_pandas()
)
For a complete example, please go to the Vector Search demo.
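To make "semantic similarity" concrete, the sketch below ranks a few toy two-dimensional vectors by cosine distance to a query vector. It is a standalone illustration, not PyTiDB code: real embeddings come from the embedding model and have many more dimensions, and the use of cosine distance here is an assumption, since this section does not state which distance metric the library applies.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=2):
    """Return the ids of the k rows whose vectors are closest to query_vec.

    rows is a list of (id, vector) pairs.
    """
    ranked = sorted(rows, key=lambda row: cosine_distance(query_vec, row[1]))
    return [row_id for row_id, _ in ranked[:k]]


# Toy vectors standing in for stored embeddings (assumed values).
rows = [(2, [1.0, 0.0]), (3, [0.0, 1.0]), (4, [0.7, 0.7])]
nearest = top_k([0.9, 0.1], rows, k=2)
print(nearest)
```

The query vector points mostly along the first axis, so row 2 ranks first and row 4 second, even though no keyword matching is involved.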
Fulltext Search
Full-text search tokenizes the query and finds the most relevant records by matching exact keywords.
if not table.has_fts_index("text"):
    table.create_fts_index("text")  # Create a fulltext index on the text column.

df = (
    table.search("<query>", search_type="fulltext")
    .limit(2)
    .to_pandas()
)
For a complete example, please go to the Fulltext Search demo.
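The idea of tokenizing a query and matching exact keywords can be sketched in a few lines of plain Python. This is a deliberately simplified term-overlap score on made-up documents, not how the fulltext index works internally; production engines use more elaborate tokenizers and ranking functions.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def keyword_score(query, document):
    """Count how many document tokens also appear in the query."""
    query_terms = set(tokenize(query))
    return sum(1 for token in tokenize(document) if token in query_terms)


# Made-up documents for illustration.
docs = {
    1: "TiDB supports vector search",
    2: "Full-text search matches exact keywords",
    3: "Transactions keep balances consistent",
}
query = "keywords search"
scores = {doc_id: keyword_score(query, text) for doc_id, text in docs.items()}
# Keep only documents with at least one matching keyword, best match first.
matching_ids = sorted(
    (doc_id for doc_id in scores if scores[doc_id] > 0),
    key=lambda doc_id: -scores[doc_id],
)
print(matching_ids)
```

Unlike the vector example, a document that shares no literal token with the query (document 3 here) is never returned, which is the trade-off that hybrid search is meant to address.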
Hybrid Search
Hybrid search combines vector search and fulltext search to provide more accurate and relevant search results.
from pytidb.rerankers import Reranker

jinaai = Reranker(model_name="jina_ai/jina-reranker-m0")

df = (
    table.search("<query>", search_type="hybrid")
    .rerank(jinaai, "text")  # Rerank the query result with the jinaai model.
    .limit(2)
    .to_pandas()
)
For a complete example, please go to the Hybrid Search demo.
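One common way to merge a vector ranking and a fulltext ranking is reciprocal rank fusion (RRF). The sketch below shows that technique on made-up result lists. It is an illustration of the general idea only: this section does not say which fusion method PyTiDB uses, and the snippet above relies on a reranker model rather than RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists into one list using RRF.

    Each id scores 1 / (k + rank) per list it appears in; k=60 is the
    constant commonly used with this technique.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Made-up rankings: ids returned by each search mode, best first.
vector_hits = [2, 4, 3]
fulltext_hits = [4, 5, 2]
fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
print(fused)
```

Ids that appear near the top of both lists (4 and 2 here) rise above ids that only one search mode found.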
PyTiDB supports various operators for flexible filtering:
Operator | Description | Example |
---|---|---|
$eq | Equal to | {"field": {"$eq": "hello"}} |
$gt | Greater than | {"field": {"$gt": 1}} |
$gte | Greater than or equal | {"field": {"$gte": 1}} |
$lt | Less than | {"field": {"$lt": 1}} |
$lte | Less than or equal | {"field": {"$lte": 1}} |
$in | In array | {"field": {"$in": [1, 2, 3]}} |
$nin | Not in array | {"field": {"$nin": [1, 2, 3]}} |
$and | Logical AND | {"$and": [{"field1": 1}, {"field2": 2}]} |
$or | Logical OR | {"$or": [{"field1": 1}, {"field2": 2}]} |
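To show how these operators compose, here is a small pure-Python evaluator that applies a filter dictionary to in-memory rows. It is a sketch of the semantics described in the table, not PyTiDB's implementation (which would translate filters to SQL); the function name `row_matches` and the sample rows are hypothetical.

```python
import operator

# Comparison operators from the table, mapped to Python equivalents.
COMPARATORS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def row_matches(row, flt):
    """Return True if the row (a dict) satisfies every condition in flt."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(row_matches(row, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(row_matches(row, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            # Operator form, e.g. {"user_id": {"$gte": 2, "$lt": 4}}.
            for op, operand in cond.items():
                if not COMPARATORS[op](row.get(key), operand):
                    return False
        elif row.get(key) != cond:
            # Shorthand form, e.g. {"user_id": 2}, treated as equality.
            return False
    return True


rows = [{"id": 2, "user_id": 2}, {"id": 3, "user_id": 3}, {"id": 4, "user_id": 4}]
flt = {"$or": [{"user_id": {"$lt": 3}}, {"user_id": {"$in": [4, 5]}}]}
selected = [row["id"] for row in rows if row_matches(row, flt)]
print(selected)
```

The same dictionary shape is what the search examples pass to `.filter(...)`, so a filter can be checked against sample rows like this before it is used in a query.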
# Join the chunks table with a users table to select chunks by user name.
from pytidb import Session
from pytidb.sql import select

# Create a table to store user data:
class User(TableModel, table=True):
    __tablename__ = "users"

    id: int = Field(primary_key=True)
    name: str = Field(max_length=20)

# NOTE: `engine` is not defined in this snippet; it must refer to the
# database engine for the same TiDB connection.
with Session(engine) as session:
    query = (
        select(Chunk).join(User, Chunk.user_id == User.id).where(User.name == "Alice")
    )
    chunks = session.exec(query).all()

[(c.id, c.text, c.user_id) for c in chunks]
PyTiDB supports transaction management, so you can avoid race conditions and ensure data consistency.
with db.session() as session:
    initial_total_balance = db.query("SELECT SUM(balance) FROM players").scalar()

    # Transfer 10 coins from player 1 to player 2
    db.execute("UPDATE players SET balance = balance - 10 WHERE id = 1")
    db.execute("UPDATE players SET balance = balance + 10 WHERE id = 2")

    session.commit()
    # or session.rollback()

    final_total_balance = db.query("SELECT SUM(balance) FROM players").scalar()
    assert final_total_balance == initial_total_balance
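The commit-or-rollback pattern can be tried without a TiDB cluster. The sketch below reproduces the same transfer using Python's built-in `sqlite3` module as a stand-in database; the starting balances and the `transfer` helper are assumptions made for illustration, and none of this is PyTiDB API.

```python
import sqlite3

# In-memory SQLite database standing in for TiDB (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO players VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def total_balance(conn):
    return conn.execute("SELECT SUM(balance) FROM players").fetchone()[0]


def transfer(conn, src, dst, amount):
    """Move coins between players; roll back if src would go negative."""
    try:
        conn.execute(
            "UPDATE players SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        (src_balance,) = conn.execute(
            "SELECT balance FROM players WHERE id = ?", (src,)
        ).fetchone()
        if src_balance < 0:
            raise ValueError("insufficient balance")
        conn.execute(
            "UPDATE players SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.commit()  # Both updates become visible together.
        return True
    except ValueError:
        conn.rollback()  # Undo the partial debit.
        return False


before = total_balance(conn)
ok = transfer(conn, 1, 2, 10)        # succeeds and is committed
failed = transfer(conn, 1, 2, 1000)  # would overdraw, so it is rolled back
after = total_balance(conn)
balances = conn.execute("SELECT id, balance FROM players ORDER BY id").fetchall()
```

Whether a transfer commits or rolls back, the total balance is unchanged, which is the invariant the assertion in the snippet above checks.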