[Question]: In pgvector store how to add new fields to the table it creates and define relationship among my existing table #17022
Comments
To add new fields like `project_id`, you can modify the `get_data_model` function that `PGVectorStore` uses to build its table model. Here's an example of how you can add a `project_id` column:

```python
from sqlalchemy import Column, VARCHAR

# Inside the get_data_model function
if hybrid_search:

    class HybridAbstractData(base):  # type: ignore
        __abstract__ = True
        id = Column(BIGINT, primary_key=True, autoincrement=True)
        text = Column(VARCHAR, nullable=False)
        metadata_ = Column(metadata_dtype)
        node_id = Column(VARCHAR)
        project_id = Column(VARCHAR)  # Add this line for project_id
        embedding = Column(Vector(embed_dim))  # type: ignore
        text_search_tsv = Column(
            TSVector(),
            Computed(
                "to_tsvector('%s', text)" % text_search_config, persisted=True
            ),
        )

    model = type(
        class_name,
        (HybridAbstractData,),
        {"__tablename__": tablename, "__table_args__": {"schema": schema_name}},
    )
else:

    class AbstractData(base):  # type: ignore
        __abstract__ = True
        id = Column(BIGINT, primary_key=True, autoincrement=True)
        text = Column(VARCHAR, nullable=False)
        metadata_ = Column(metadata_dtype)
        node_id = Column(VARCHAR)
        project_id = Column(VARCHAR)  # Add this line for project_id
        embedding = Column(Vector(embed_dim))  # type: ignore

    model = type(
        class_name,
        (AbstractData,),
        {"__tablename__": tablename, "__table_args__": {"schema": schema_name}},
    )
```

This snippet shows how to add a `project_id` column to the generated table. Regarding relationships with other tables, the repository does not provide specific guidance on handling foreign key constraints or defining relationships; you would need to handle these manually using SQLAlchemy's `relationship` and foreign key features. |
@dosu I see we can also pass the table name when creating an index
what columns should be present in my table so i can pass this existing table and llamaindex will be able to use it by default? Can you give me the list of mandatory columns i must have so i can also do hybrid search? |
@dosu is there a way i can give llamaindex a bunch of text and ask it to just generate the embeddings, node_id, text etc. structure, so i can manually insert this data wherever i want? |
Yes, you can provide LlamaIndex with a set of text inputs and have it generate the embeddings, node_id, and text structure, which you can then manually insert elsewhere as needed [1]. |
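As a rough illustration of the record shape described above (node_id, text, embedding), here is a self-contained sketch. The `toy_embed` function is a deterministic stand-in for a real embedding model call (e.g. an OpenAI embedding), used only so the example runs without any services:

```python
import hashlib
import uuid


def toy_embed(text: str, dim: int = 4) -> list[float]:
    # Deterministic stand-in for a real embedding model call.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]


def make_records(texts: list[str]) -> list[dict]:
    # Build records mirroring the node_id / text / embedding structure
    # that would later be inserted into a pgvector-style table.
    records = []
    for text in texts:
        records.append(
            {
                "node_id": str(uuid.uuid4()),
                "text": text,
                "embedding": toy_embed(text),
            }
        )
    return records


records = make_records(["hello", "world"])
```

In a real pipeline you would swap `toy_embed` for the embedding model LlamaIndex is configured with; the surrounding structure stays the same.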
@dosu now if i want to insert this data into postgresql without using pgvector store, so that it is similar to how the PGVectorStore class stores this data, how can i do so? |
To insert data into PostgreSQL in a manner similar to how the `PGVectorStore` class stores it, you can define a matching SQLAlchemy model and insert the rows yourself:

```python
from sqlalchemy import create_engine, Column, BIGINT, VARCHAR
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base, sessionmaker
from pgvector.sqlalchemy import Vector

Base = declarative_base()

class YourDataModel(Base):
    __tablename__ = 'your_table_name'
    id = Column(BIGINT, primary_key=True, autoincrement=True)
    text = Column(VARCHAR, nullable=False)
    metadata_ = Column(JSONB)
    node_id = Column(VARCHAR)
    embedding = Column(Vector(1536))  # Assuming 1536 is your embedding dimension

# Create an engine and session
engine = create_engine('postgresql://username:password@localhost/dbname')
Session = sessionmaker(bind=engine)
session = Session()

# Assuming you have a list of nodes with text, node_id, embedding, and metadata
for node in nodes:
    data_entry = YourDataModel(
        text=node.get_content(),
        metadata_=node.metadata,
        node_id=node.node_id,
        embedding=node.embedding,
    )
    session.add(data_entry)

session.commit()
```

This code sets up a SQLAlchemy model with a schema similar to what `PGVectorStore` creates and inserts the node data directly. (Note that `JSONB` must be imported from `sqlalchemy.dialects.postgresql`, not from `sqlalchemy` itself.) |
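Before inserting, the target table and the `vector` extension must already exist in the database. A sketch of the DDL that matches the model above (table and column names are the assumed ones from that snippet, not something LlamaIndex mandates):

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS your_table_name (
    id        BIGSERIAL PRIMARY KEY,
    text      VARCHAR NOT NULL,
    metadata_ JSONB,
    node_id   VARCHAR,
    embedding VECTOR(1536)
);
```

Alternatively, `Base.metadata.create_all(engine)` will emit equivalent DDL from the SQLAlchemy model, though the extension still has to be created once per database.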
@dosu just doing node.metadata will not have all the metadata. For instance:
this is the metadata that is stored in the database when using PGVectorStore, whereas
this is what i get when i do node.metadata |
To capture and store all metadata, including nested structures and additional fields, you can use the `node_to_metadata_dict` utility with `flat_metadata=False`. Here's how:

```python
from llama_index.core.schema import BaseNode
from llama_index.core.vector_stores.utils import node_to_metadata_dict

def store_node_with_full_metadata(node: BaseNode):
    # Convert the node to a metadata dictionary
    metadata = node_to_metadata_dict(node, flat_metadata=False)
    # The metadata dictionary now contains the entire node content as a JSON string
    # You can store this metadata in your database
    # Example: db.store(metadata)
```

By setting `flat_metadata=False`, the full node content (including nested metadata) is serialized into the dictionary, so nothing is lost when you store it. |
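To see why this preserves everything, here is a loose, dependency-free sketch of the idea behind `node_to_metadata_dict`: the whole node is serialized as JSON under a reserved key (LlamaIndex uses `_node_content`), so the node can be fully reconstructed later. The helper names and the plain-dict node here are illustrative, not LlamaIndex's actual implementation:

```python
import json


def node_to_metadata(node: dict) -> dict:
    # Copy the node's own metadata, then embed the entire node
    # as a JSON string under a reserved key.
    meta = dict(node.get("metadata", {}))
    meta["_node_content"] = json.dumps(node)
    return meta


def metadata_to_node(meta: dict) -> dict:
    # Reconstruct the full node from the serialized copy.
    return json.loads(meta["_node_content"])


node = {"id_": "n1", "text": "hello world", "metadata": {"chatbot_id": "1234"}}
meta = node_to_metadata(node)
restored = metadata_to_node(meta)
```

The real utility pair in LlamaIndex (`node_to_metadata_dict` / `metadata_dict_to_node` in `llama_index.core.vector_stores.utils`) follows the same round-trip principle.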
@dosu with some modifications i was able to manually insert the data into the table. |
To query the table you populated manually, you can point `PGVectorStore` at it and issue a filtered `VectorStoreQuery`:

```python
from llama_index.vector_stores.postgres import PGVectorStore
from llama_index.core.vector_stores.types import VectorStoreQuery, MetadataFilters, ExactMatchFilter

# Define connection parameters
PARAMS = {
    "host": "localhost",
    "user": "postgres",
    "password": "mark90",
    "port": 5432,
}

# Initialize the PGVectorStore
pg = PGVectorStore.from_params(
    **PARAMS,
    database="test_db",
    table_name="data_example",
    embed_dim=2,  # Adjust the embed_dim as needed
)

# Define the metadata filter to limit the search to documents with chatbot_id of 1234
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="chatbot_id", value=1234)]
)

# Create a query with the specified filters
query = VectorStoreQuery(
    query_embedding=[0.1, 0.1],  # Example embedding vector
    similarity_top_k=10,
    filters=filters,
)

# Execute the query
result = pg.query(query)

# Process the result
for node in result.nodes:
    print(node.node_id, node.text)

# Close the connection
pg.close()
```

This code sets up a connection to the PostgreSQL database, applies a filter so that only documents with a `chatbot_id` of 1234 are considered, runs the similarity query, and prints the matching nodes. |
@dosu
Currently it returns an 'Empty Response'. How can i add filters to this query? |
To add filters to your query, use `MetadataFilter` with an explicit operator such as `FilterOperator.IN`:

```python
from llama_index.core.vector_stores.types import VectorStoreQuery, MetadataFilters, MetadataFilter, FilterOperator

# Define the metadata filter to limit the search to documents with chatbot_id of 1234
filters = MetadataFilters(
    filters=[
        MetadataFilter(
            key="chatbot_id",
            value=["1234"],
            operator=FilterOperator.IN,
        )
    ]
)

# Create a query with the specified filters
query = VectorStoreQuery(
    query_embedding=[0.1, 0.1],  # Example embedding vector
    similarity_top_k=10,
    filters=filters,
)

# Execute the query against the vector store (pg is the PGVectorStore
# initialized in the earlier snippet)
result = pg.query(query)
```

In this example, the `IN` operator matches documents whose `chatbot_id` appears in the provided list. Note that the values are compared as strings, so `"1234"` and `1234` are not interchangeable. |
i got it to work, if you pass the database name as |
There isn't a built-in configuration option or parameter in `PGVectorStore` for this. |
in the table |
To query documents with a specific `chatbot_id`, apply a filter using `MetadataFilters` with the `IN` operator and attach it to the query engine (here `index` is assumed to be a `VectorStoreIndex` built over the `PGVectorStore`):

```python
from llama_index.core.vector_stores.types import MetadataFilters, MetadataFilter, FilterOperator

# Define the metadata filter to limit the search to documents with chatbot_id of 1
filters = MetadataFilters(
    filters=[
        MetadataFilter(
            key="chatbot_id",
            value=["1"],
            operator=FilterOperator.IN,
        )
    ]
)

# Build a query engine that applies the filters during retrieval
query_engine = index.as_query_engine(similarity_top_k=10, filters=filters)

# Execute the query
response = query_engine.query("who is paul graham?")
```

Ensure that the `chatbot_id` stored in your metadata matches the type and format of the filter value; the comparison is string-based, so `"1"` and `1` are not interchangeable. |
Question
Can we add an extra column in the case of PGVectorStore in LlamaIndex? I want to define relationships with other tables I have, and also add fields like project_id so I can query the data for a particular project.