Expanded document model with image upload and retrieval #39
What
IMPORTANT: This PR contains breaking changes. Read the Miscellaneous section below.
First addition of file-based documents, with support for adding image documents to a collection. This builds on the multimodal groundwork added in #37.
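One way to picture the change: a document either carries its content inline or points at a file held in storage, and the CLIP image transformer can take either a raw image or such a document. The sketch below illustrates that shape only; `Document`, `object_name`, `is_file_based`, and `image_input_to_bytes` are hypothetical names, not Lexy's actual models or transformer API, and a plain dict stands in for cloud storage.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Union


@dataclass
class Document:
    """Hypothetical document: inline text content, or a reference to a file in storage."""

    content: str = ""
    object_name: Optional[str] = None  # hypothetical storage key for file-based documents

    @property
    def is_file_based(self) -> bool:
        return self.object_name is not None


def image_input_to_bytes(item: Union[bytes, Document], storage: Mapping[str, bytes]) -> bytes:
    """Normalize transformer input: raw image bytes pass through unchanged,
    while file-based documents are resolved by looking up their key in storage."""
    if isinstance(item, bytes):
        return item
    if isinstance(item, Document) and item.object_name is not None:
        return storage[item.object_name]
    raise TypeError("expected raw image bytes or a file-based Document")


# Usage: a plain dict stands in for the cloud bucket.
storage = {"images/cat.png": b"\x89PNG\r\n\x1a\n"}
doc = Document(object_name="images/cat.png")
assert image_input_to_bytes(doc, storage) == storage["images/cat.png"]
assert image_input_to_bytes(b"raw-bytes", storage) == b"raw-bytes"
```

Funneling both input types through one function keeps the embedding step agnostic to where the image came from.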
- `upload_documents` method
- `image.embeddings.clip` transformer to accept Document objects in addition to images
- `storage` for cloud storage logic (e.g., S3 clients)
- `config` field to the Collection model

This PR does NOT include the following, which need to be added as part of a future PR:

- `ImageDocument`, `FileDocument`, `S3Document`, etc.
- `Document.meta`
- `examples/images.ipynb`, but need to make it more detailed and create a tutorial page in the docs

Why
At Lexy, we believe that AGI should be able to see images and access files. This PR is the next step toward supporting arbitrary file-based documents. It adds support for storing and retrieving those documents, and is currently restricted to images.
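As an illustration of what "restricted to images" could mean at the storage layer, here is a minimal sketch that rejects non-image uploads by checking file signatures (magic bytes). This is an assumption for illustration: `ImageStore`, `upload`, `download`, and `sniff_image_type` are hypothetical names, a dict stands in for an S3 bucket, and Lexy's actual check may work differently (e.g., by MIME type or file extension).

```python
from typing import Dict, Optional

# Known file signatures (magic bytes) for a few common image formats.
_IMAGE_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return the image MIME type implied by the leading bytes, or None if unrecognized."""
    for signature, mime_type in _IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return mime_type
    return None


class ImageStore:
    """Hypothetical image-only store; a dict stands in for a cloud bucket."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def upload(self, name: str, data: bytes) -> str:
        """Store the bytes under `name` if they look like an image; return the detected type."""
        mime_type = sniff_image_type(data)
        if mime_type is None:
            raise ValueError(f"{name!r} is not a supported image file")
        self._objects[name] = data
        return mime_type

    def download(self, name: str) -> bytes:
        """Retrieve previously uploaded bytes; raises KeyError if the name is unknown."""
        return self._objects[name]


store = ImageStore()
png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
print(store.upload("cat.png", png))  # image/png
```

Sniffing leading bytes is cheap and does not trust the filename; note that it only recognizes the formats listed, so other image types would be rejected by this sketch.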
Test plan
- Run `pytest sdk-python` in a terminal
- Run `examples/tests.ipynb` and verify that there are no errors
- Run `examples/tutorial.ipynb` and verify that the tutorial works as expected
- Run `examples/images.ipynb` and verify that the tutorial works as expected

Miscellaneous
This PR contains some breaking changes. The easiest fix is to nuke and rebuild your containers. If you'd rather not, you can instead run the following.
Configuring AWS
You can use `aws configure` (recommended) or put `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` in your `.env` file.

You'll also need to specify an S3 bucket for file storage, either by adding `S3_BUCKET=<name-of-your-S3-bucket>` to your `.env` file or by updating the value of `s3_bucket` in `lexy/core/config.py`. Your AWS credentials should have full access to this bucket.
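Before rebuilding, it can help to sanity-check that credentials and a bucket name are discoverable. Below is a rough stdlib-only sketch, assuming the `.env` file uses plain `KEY=VALUE` lines and that `S3_BUCKET` from the environment or `.env` takes precedence over the `s3_bucket` default in `lexy/core/config.py`. `check_aws_setup`, `read_dotenv`, and `DEFAULT_S3_BUCKET` are hypothetical helpers, not part of Lexy. By default, `aws configure` writes keys to the `[default]` profile of `~/.aws/credentials`, which is what the sketch inspects.

```python
import configparser
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

# Hypothetical stand-in for the s3_bucket default in lexy/core/config.py.
DEFAULT_S3_BUCKET: Optional[str] = None


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and comments (no quoting support)."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_aws_setup(env_file: Path, credentials_file: Path, environ: Mapping[str, str]) -> Dict[str, object]:
    """Report whether AWS keys are findable and which S3 bucket would be used."""
    # Process environment wins over .env (assumption, like python-dotenv's default).
    settings = {**read_dotenv(env_file), **environ}
    has_env_keys = bool(settings.get("AWS_ACCESS_KEY_ID")) and bool(settings.get("AWS_SECRET_ACCESS_KEY"))

    # `aws configure` stores keys under the [default] profile; a missing file is ignored.
    parser = configparser.ConfigParser()
    parser.read(credentials_file)
    has_profile_keys = parser.has_option("default", "aws_access_key_id") and parser.has_option(
        "default", "aws_secret_access_key"
    )

    return {
        "credentials_found": has_env_keys or has_profile_keys,
        "s3_bucket": settings.get("S3_BUCKET") or DEFAULT_S3_BUCKET,
    }


if __name__ == "__main__":
    print(
        check_aws_setup(
            env_file=Path(".env"),
            credentials_file=Path.home() / ".aws" / "credentials",
            environ=os.environ,
        )
    )
```

If `credentials_found` is false or `s3_bucket` is empty, revisit the configuration steps above before rebuilding.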