8000 GitHub - pingcap/pytidb: TiDB client library for Python developers
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

pingcap/pytidb

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

73 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

TiDB Python SDK

Python Package Index Downloads

A Python SDK leverages TiDB as an unified data store for developers to build GenAI applications.

  • πŸ” Support various search modes: vector search, fulltext search, hybrid search
  • πŸ”„ Automatic embedding generation
  • 🎯 Advanced filtering capabilities
  • πŸ’± Transaction support
  • πŸ”Œ Model Context Protocol (MCP) support

Quick Start Guide: Jupyter Notebook

Installation

pip install pytidb

# If you want to use built-in embedding function and rerankers.
pip install "pytidb[models]"

# If you want to convert query result to pandas DataFrame.
pip install pandas

Connect to TiDB Cloud

Go to tidbcloud.com to create a free TiDB cluster.

import os
from pytidb import TiDBClient

db = TiDBClient.connect(
    host=os.getenv("TIDB_HOST"),
    port=int(os.getenv("TIDB_PORT")),
    username=os.getenv("TIDB_USERNAME"),
    password=os.getenv("TIDB_PASSWORD"),
    database=os.getenv("TIDB_DATABASE"),
)

Highlights

πŸ€– Auto Embedding

PyTiDB automatically embeds the text field (e.g. text) and saves the vector embedding to the vector field (e.g. text_vec).

Create a table with embedding function:

from pytidb.schema import TableModel, Field
from pytidb.embeddings import EmbeddingFunction

text_embed = EmbeddingFunction("openai/text-embedding-3-small")

class Chunk(TableModel, table=True):
    __tablename__ = "chunks"

    id: int = Field(primary_key=True)
    text: str = Field()
    text_vec: list[float] = text_embed.VectorField(
        source_field="text"
    )  # πŸ‘ˆ Define the vector field.
    user_id: int = Field()

table = db.create_table(schema=Chunk)

Bulk insert data:

table.bulk_insert(
    [
        Chunk(id=2, text="bar", user_id=2),   # πŸ‘ˆ The text field will be embedded to a 
        Chunk(id=3, text="baz", user_id=3),   # vector and save to the text_vec field
        Chunk(id=4, text="qux", user_id=4),   # automatically.
    ]
)

πŸ” Search

Vector Search

Vector search help you find the most relevant records based on semantic similarity, so you don't need to explicitly include all the keywords in your query.

df = (
  table.search("<query>")  # πŸ‘ˆ The query will be embedding automatically.
    .filter({"user_id": 2})
    .limit(2)
    .to_pandas()
)

For a complete example, please go to the Vector Search demo.

Fulltext Search

Full-text search helps tokenize the query and find the most relevant records by matching exact keywords.

if not table.has_fts_index("text"):
    table.create_fts_index("text")   # πŸ‘ˆ Create a fulltext index on the text column.

df = (
  table.search("<query>", search_type="fulltext")
    .limit(2)
    .to_pandas()
)

For a complete example, please go to the Fulltext Search demo.

Hybrid Search

Hybrid search combines vector search and fulltext search to provide a more accurate and relevant search result.

from pytidb.rerankers import Reranker

jinaai = Reranker(model_name="jina_ai/jina-reranker-m0")

df = (
  table.search("<query>", search_type="hybrid")
    .rerank(jinaai, "text")  # πŸ‘ˆ Rerank the query result with the jinaai model.
    .limit(2)
    .to_pandas()
)

For a complete example, please go to the Hybrid Search demo.

Advanced Filtering

PyTiDB supports various operators for flexible filtering:

Operator Description Example
$eq Equal to {"field": {"$eq": "hello"}}
$gt Greater than {"field": {"$gt": 1}}
$gte Greater than or equal {"field": {"$gte": 1}}
$lt Less than {"field": {"$lt": 1}}
$lte Less than or equal {"field": {"$lte": 1}}
$in In array {"field": {"$in": [1, 2, 3]}}
$nin Not in array {"field": {"$nin": [1, 2, 3]}}
$and Logical AND {"$and": [{"field1": 1}, {"field2": 2}]}
$or Logical OR {"$or": [{"field1": 1}, {"field2": 2}]}

β›“ Join Structured Data and Unstructured Data

from pytidb import Session
from pytidb.sql import select

# Create a table to store user data:
class User(TableModel, table=True):
    __tablename__ = "users"

    id: int = Field(primary_key=True)
    name: str = Field(max_length=20)


with Session(engine) as session:
    query = (
        select(Chunk).join(User, Chunk.user_id == User.id).where(User.name == "Alice")
    )
    chunks = session.exec(query).all()

[(c.id, c.text, c.user_id) for c in chunks]

πŸ’±Transaction Support

PyTiDB supports transaction management, so you can avoid race conditions and ensure data consistency.

with db.session() as session:
    initial_total_balance = db.query("SELECT SUM(balance) FROM players").scalar()

    # Transfer 10 coins from player 1 to player 2
    db.execute("UPDATE players SET balance = balance - 10 WHERE id = 1")
    db.execute("UPDATE players SET balance = balance + 10 WHERE id = 2")

    session.commit()
    # or session.rollback()

    final_total_balance = db.query("SELECT SUM(balance) FROM players").scalar()
    assert final_total_balance == initial_total_balance
0