basedosdados/chatbot

Chatbot

Chatbot is a Python library designed to make it easy for Large Language Models (LLMs) to interact with your data. It is built on top of LangChain and LangGraph and provides agents and high-level assistants for natural language querying and data visualization.

Note

This library is still under active development. Expect breaking changes, incomplete features, and limited documentation.

Installation

Clone the repository and install it (you can also use poetry or uv instead of pip).

git clone https://github.com/basedosdados/chatbot.git
cd chatbot
pip install .

Assistants

SQLAssistant

The SQLAssistant allows LLMs to interact with your database so you can ask questions about it. All it needs is a LangChain Chat Model, a Context Provider and a Prompt Formatter. The context provider is responsible for providing context about your data to the SQL Agent and the prompt formatter is responsible for building a system prompt for SQL generation.

We provide a default BigQueryContextProvider for retrieving metadata directly from Google BigQuery and a default SQLPromptFormatter. You can supply your own implementation of a context provider and a prompt formatter for custom behaviour.

from langchain.chat_models import init_chat_model

from chatbot.assistants import SQLAssistant
from chatbot.contexts import BigQueryContextProvider
from chatbot.formatters import SQLPromptFormatter

model = init_chat_model("gpt-4o", temperature=0)

# you must point the GOOGLE_APPLICATION_CREDENTIALS
# env variable to your service account JSON file.
context_provider = BigQueryContextProvider(
    billing_project="your billing project",
    query_project="your query project",
)

prompt_formatter = SQLPromptFormatter()

assistant = SQLAssistant(model, context_provider, prompt_formatter)

response = assistant.invoke("hello! what can you tell me about our database?")
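The comment in the example above says that GOOGLE_APPLICATION_CREDENTIALS must point to a service account JSON file. A minimal sketch of a pre-flight check you could run before building the context provider is shown here. It uses only the standard library, and require_google_credentials is a hypothetical helper name, not part of this library.

```python
import os
from pathlib import Path


def require_google_credentials() -> Path:
    """Return the credentials file path, or raise with a clear message.

    Hypothetical helper (not part of chatbot): it only inspects the
    GOOGLE_APPLICATION_CREDENTIALS environment variable.
    """
    raw = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")

    path = Path(raw).expanduser()
    if not path.is_file():
        raise RuntimeError(f"Credentials file not found: {path}")
    return path
```

Calling it once at startup turns a missing or mistyped path into an immediate, readable error instead of a failure on the first query.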

You can optionally use a PostgresSaver checkpointer to add short-term memory to your assistant and a VectorStore for few-shot prompting during SQL query generation:

from langchain.chat_models import init_chat_model

from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector
from langgraph.checkpoint.postgres import PostgresSaver

from chatbot.assistants import SQLAssistant
from chatbot.contexts import BigQueryContextProvider
from chatbot.formatters import SQLPromptFormatter

model = init_chat_model("gpt-4o", temperature=0)

# you must point the GOOGLE_APPLICATION_CREDENTIALS
# env variable to your service account JSON file.
context_provider = BigQueryContextProvider(
    billing_project="your billing project",
    query_project="your query project",
)

# it could be any combination of
# a langchain vector store and an embedding model
vector_store = PGVector(
    connection="your connection string",
    collection_name="your collection name",
    embedding=OpenAIEmbeddings(
      model="text-embedding-3-small",
    ),
)

prompt_formatter = SQLPromptFormatter(vector_store)

DB_URI = "postgresql://postgres:postgres@localhost:5442/postgres"

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()

    assistant = SQLAssistant(
        model=model,
        context_provider=context_provider,
        prompt_formatter=prompt_formatter,
        checkpointer=checkpointer,
    )

    response = assistant.invoke(
        message="hello! what can you tell me about our database?",
        thread_id="some uuid"
    )
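In the example above, thread_id is the placeholder "some uuid". LangGraph checkpointers key saved state by thread id, so a simple approach is to generate one random id per conversation and reuse it on every turn. The sketch below assumes that is how you want to scope memory; new_thread_id is a hypothetical helper name, not part of this library.

```python
import uuid


def new_thread_id() -> str:
    """Return a random UUID4 string to identify one conversation."""
    return str(uuid.uuid4())


# Create one id per conversation and reuse it for every turn of that conversation.
support_thread = new_thread_id()
other_thread = new_thread_id()
```

Passing the same value as thread_id on later invoke calls should let the assistant see the earlier turns of that conversation, while a different id starts from an empty history.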

An async version is also available: AsyncSQLAssistant.

SQLVizAssistant

SQLVizAssistant extends SQLAssistant by not only retrieving data but also preparing it for visualization. It determines which variables should be plotted on each axis, suggests suitable chart types, and defines metadata such as titles, labels, and legends, without actually rendering the chart. It requires a LangChain Chat Model, a Context Provider, and two separate Prompt Formatters: one for SQL query generation and another for guiding data preprocessing for visualization.

We provide a default VizPromptFormatter, which is used internally by the Visualization Agent during data preprocessing.

from langchain.chat_models import init_chat_model

from chatbot.assistants import SQLVizAssistant
from chatbot.contexts import BigQueryContextProvider
from chatbot.formatters import SQLPromptFormatter, VizPromptFormatter

model = init_chat_model("gpt-4o", temperature=0)

# you must point the GOOGLE_APPLICATION_CREDENTIALS
# env variable to your service account JSON file.
context_provider = BigQueryContextProvider(
    billing_project="your billing project",
    query_project="your query project",
)

sql_prompt_formatter = SQLPromptFormatter()
viz_prompt_formatter = VizPromptFormatter()

assistant = SQLVizAssistant(
    model, context_provider, sql_prompt_formatter, viz_prompt_formatter
)

response = assistant.invoke("hello! what can you tell me about our database?")
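Since SQLVizAssistant describes a chart without rendering it, the calling application has to turn that description into a plot. This README does not document the response schema, so the sketch below is purely illustrative: ChartSpec and all of its fields are hypothetical names that mirror the metadata listed above (axes, chart type, title, labels, legend), not the library's actual return type.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChartSpec:
    """Hypothetical, illustrative container; NOT the library's response type."""

    chart_type: str
    x: str
    y: list[str]
    title: str = ""
    x_label: str = ""
    y_label: str = ""
    legend: dict[str, str] = field(default_factory=dict)

    def missing_columns(self, columns: list[str]) -> list[str]:
        """Return the columns the spec refers to that the data does not have."""
        available = set(columns)
        return [name for name in [self.x, *self.y] if name not in available]


spec = ChartSpec(
    chart_type="line",
    x="year",
    y=["population"],
    title="Population by year",
    x_label="Year",
    y_label="Inhabitants",
)
```

A check like missing_columns is one way a front end might validate a chart description against the returned data before handing it to a plotting library.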

You can also use a PostgresSaver checkpointer to add short-term memory to your assistant, and provide LangChain vector stores for few-shot prompting during both SQL generation and data preprocessing for visualization:

from langchain.chat_models import init_chat_model

from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector
from langgraph.checkpoint.postgres import PostgresSaver

from chatbot.assistants import SQLVizAssistant
from chatbot.contexts import BigQueryContextProvider
from chatbot.formatters import SQLPromptFormatter, VizPromptFormatter

model = init_chat_model("gpt-4o", temperature=0)

# you must point the GOOGLE_APPLICATION_CREDENTIALS
# env variable to your service account JSON file.
context_provider = BigQueryContextProvider(
    billing_project="your billing project",
    query_project="your query project",
)

# it could be any combination of
# a langchain vector store and an embedding model
sql_vector_store = PGVector(
    connection="your connection string",
    collection_name="your sql collection name",
    embedding=OpenAIEmbeddings(
      model="text-embedding-3-small",
    ),
)

viz_vector_store = PGVector(
    connection="your connection string",
    collection_name="your viz collection name",
    embedding=OpenAIEmbeddings(
      model="text-embedding-3-small",
    ),
)

sql_prompt_formatter = SQLPromptFormatter(sql_vector_store)
viz_prompt_formatter = VizPromptFormatter(viz_vector_store)

DB_URI = "postgresql://postgres:postgres@localhost:5442/postgres"

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()

    assistant = SQLVizAssistant(
        model=model,
        context_provider=context_provider,
        sql_prompt_formatter=sql_prompt_formatter,
        viz_prompt_formatter=viz_prompt_formatter,
        checkpointer=checkpointer,
    )

    response = assistant.invoke(
        message="hello! what can you tell me about our database?",
        thread_id="some uuid"
    )
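The DB_URI in the examples above hardcodes the database user and password. As a sketch, the same connection string can be assembled from environment variables with the credentials URL-escaped. The variable names follow the standard libpq names (PGUSER, PGPASSWORD, PGHOST, PGPORT, PGDATABASE), the defaults mirror the example URI, and build_db_uri is a hypothetical helper, not part of this library.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def build_db_uri(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a postgresql:// URI from libpq-style environment variables."""
    env = os.environ if env is None else env

    # Escape characters such as "@" or "/" that would otherwise break the URI.
    user = quote(env.get("PGUSER", "postgres"), safe="")
    password = quote(env.get("PGPASSWORD", "postgres"), safe="")
    database = quote(env.get("PGDATABASE", "postgres"), safe="")

    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5442")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

The result can be passed to PostgresSaver.from_conn_string in place of the literal string.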

An async version is also available: AsyncSQLVizAssistant.

Extensibility

Under the hood, both assistants rely on composable agents:

  • SQLAgent – Handles database metadata retrieval, query generation and execution.
  • VizAgent – Handles visualization reasoning.
  • RouterAgent – Orchestrates SQL querying and data visualization via a multi-agent workflow.

There is also a simple ReActAgent with support for custom system prompts and short-term memory, to which you can add an arbitrary set of tools.

You can directly use these agents or use them to create your own workflows.
