Retrievvy

Retrievvy is a hybrid retrieval system that blends modern embedding similarity search with classic textual search methods, designed explicitly for speed, efficiency, and reliability.

🚀 Philosophy

The core philosophy behind Retrievvy is straightforward:

Speed First: Lightweight libraries and methods that ensure quick retrieval.
Reliability: Proven, classic methods combined with modern embeddings.
Minimal Footprint: Tools selected for minimal resource usage without sacrificing quality.

You can effortlessly switch to heavier embedding models, but the default setup prioritizes practical, everyday efficiency.

🧩 Basic Concepts

Bundles

A bundle is the fundamental unit of indexing in Retrievvy. Think of it as an individual document or a complete piece of content you want to index.
Each bundle is uniquely identified and processed independently.

Blocks

Every bundle is composed of multiple blocks, which represent logical partitions of the original content (e.g., pages of a PDF, sections of a document).
Blocks are the base unit for indexing and retrieval references.
Retrievvy internally combines these blocks into larger chunks for optimized indexing and retrieval, but references provided in search results always link back to specific blocks.

Example:

If you're indexing PDF documents:

Each PDF file becomes a single bundle.
Each page within the PDF becomes a block.
Search results reference these specific blocks (pages), providing precise navigation.
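
The relationship between bundles, blocks, and the larger internal chunks can be sketched in a few lines of Python. This is only an illustration: the `Bundle` and `Chunk` classes, the `chunk_bundle` helper, and the character-based size limit are hypothetical names and choices made for this example, not Retrievvy's actual internals. The point it shows is that each chunk remembers which blocks it was built from, so a hit on a chunk can still be reported as specific blocks (pages).

```python
from dataclasses import dataclass


@dataclass
class Bundle:
    id: str
    name: str
    blocks: list[str]  # e.g. one entry per PDF page


@dataclass
class Chunk:
    text: str
    block_ids: list[int]  # indices of the blocks merged into this chunk


def chunk_bundle(bundle: Bundle, max_chars: int = 60) -> list[Chunk]:
    """Greedily merge consecutive blocks into chunks of at most max_chars.

    Hypothetical strategy for illustration only; a single block longer
    than max_chars simply becomes its own chunk.
    """
    chunks: list[Chunk] = []
    text, ids = "", []
    for i, block in enumerate(bundle.blocks):
        candidate = f"{text} {block}".strip()
        if ids and len(candidate) > max_chars:
            # Current chunk is full: close it and start a new one.
            chunks.append(Chunk(text, ids))
            text, ids = block, [i]
        else:
            text, ids = candidate, ids + [i]
    if ids:
        chunks.append(Chunk(text, ids))
    return chunks


pdf = Bundle(
    id="report",
    name="report.pdf",
    blocks=["Page one text.", "Page two text.", "Page three text."],
)
chunks = chunk_bundle(pdf, max_chars=30)
for chunk in chunks:
    print(chunk.block_ids, "->", chunk.text)
```

With these inputs the first two pages fit into one chunk and the third page forms a second chunk, while both chunks keep their page indices for result references.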

Example: Indexing a Bundle via API

You can easily send bundles to Retrievvy via a straightforward HTTP request:

curl -X POST http://0.0.0.0:7300/bundle \
     -H "Content-Type: application/json" \
     -d '{
           "id": "unique_bundle_id",
           "index": "my_index",
           "source": "custom_loader",
           "name": "Example Document",
           "blocks": [
             "First block of text content.",
             "Second block of text content.",
             "Third block of text content."
           ]
         }'

This sends a bundle to Retrievvy, which then processes the content and indexes it twice: as dense embeddings and in the sparse textual index.
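
The same request can be built from Python with only the standard library. This is a minimal sketch that mirrors the curl call above; the helper name `build_bundle_request` is made up for this example, and the response format is not described here, so the code only prints the HTTP status when the request is actually sent.

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:7300"  # same address as the curl example


def build_bundle_request(bundle_id, index, name, blocks, source="custom_loader"):
    """Build a POST /bundle request with the JSON fields shown above."""
    payload = {
        "id": bundle_id,
        "index": index,
        "source": source,
        "name": name,
        "blocks": blocks,
    }
    return urllib.request.Request(
        f"{BASE_URL}/bundle",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_bundle_request(
    bundle_id="unique_bundle_id",
    index="my_index",
    name="Example Document",
    blocks=[
        "First block of text content.",
        "Second block of text content.",
        "Third block of text content.",
    ],
)

# Sending requires a running Retrievvy server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```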

Example: Searching Information via API

You can search indexed bundles using a simple HTTP GET request:

curl "http://0.0.0.0:7300/query?q=how%20to%20deploy%20a%20docker%20app&index=my_index&limit=10"

q: Your query in natural language.
index: The specific index you wish to search.
limit: The number of search results to retrieve.
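
From Python, the query URL can be assembled with the standard library so that the three parameters are escaped correctly. This is a small sketch; `build_query_url` is a hypothetical helper, and because the shape of the response is not documented here, the commented-out call just prints the raw body.

```python
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "http://0.0.0.0:7300"  # same address as the curl example


def build_query_url(q: str, index: str, limit: int = 10) -> str:
    """Return the GET /query URL for the given parameters."""
    # quote_via=quote encodes spaces as %20, matching the curl example.
    params = urlencode({"q": q, "index": index, "limit": limit}, quote_via=quote)
    return f"{BASE_URL}/query?{params}"


url = build_query_url("how to deploy a docker app", index="my_index", limit=10)
print(url)

# Sending requires a running Retrievvy server, so it is left commented out:
# with urllib.request.urlopen(url) as response:
#     print(response.read().decode("utf-8"))
```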

🛠️ What's Inside?

Hybrid Retrieval:
- Dense Embeddings: Fast, lightweight embeddings generated with FastEmbed and indexed in Qdrant.
- Sparse Textual Search: Classic, reliable BM25 via Xapian.
Adaptive Fusion Reranking:
- Statistically smart evaluation to fuse embedding and textual scores.
- Dynamic weighting, linear transformations, interaction terms, and normalization for optimized results.
Fully Async Design:
- Built with Starlette to ensure rapid, non-blocking responses.
- Embedding computations run separately, keeping the webserver highly responsive.
Optimized Data Handling:
- Database: SQLite3—perfectly suited for single-writer, multiple-reader use cases.
- Serialization & Validation: High-performance msgspec.
- NLP Efficiency: Fast and lightweight NLTK instead of heavier alternatives.
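
To make the idea of score fusion concrete, here is a deliberately simplified sketch. It is not Retrievvy's adaptive reranker: it swaps in plain min-max normalization and a fixed weight, with none of the dynamic weighting or interaction terms mentioned above. The function names, the block identifiers, and the scores are all invented for illustration. What it shows is why normalization matters: embedding similarities and BM25 scores live on different scales and must be brought onto a common one before they are combined.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: treat every result as equally good.
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def fuse(dense: dict[str, float], sparse: dict[str, float], dense_weight: float = 0.5):
    """Combine normalized dense and sparse scores with a fixed linear weight.

    A result missing from one retriever contributes 0 for that side.
    """
    d, s = min_max(dense), min_max(sparse)
    fused = {
        key: dense_weight * d.get(key, 0.0) + (1 - dense_weight) * s.get(key, 0.0)
        for key in d.keys() | s.keys()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores: cosine similarities on one side, BM25 scores on the other.
dense_scores = {"block-1": 0.82, "block-2": 0.75, "block-3": 0.40}
sparse_scores = {"block-2": 12.0, "block-3": 9.0, "block-4": 3.0}

ranking = fuse(dense_scores, sparse_scores, dense_weight=0.6)
for block_id, score in ranking:
    print(f"{block_id}: {score:.3f}")
```

In this toy input, `block-2` ranks first because both retrievers score it highly, even though `block-1` has the best embedding similarity on its own.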

📦 Tech Stack

Webserver: Starlette (Async Python)
Embeddings: FastEmbed
Vector Database: Qdrant
Text Search: Xapian BM25
Database: SQLite3
Serialization: msgspec
NLP: NLTK
Logging: loguru
Retry Handling: tenacity
Tokenization: tiktoken
Deployment: Uvicorn
Additional Tools: chonkie, numpy, yake

🧑‍💻 Quick Start

Clone and launch quickly using Docker:

git clone https://github.com/arvesx/retrievvy.git
cd retrievvy
docker compose up --build

That's it—you're up and running!

🌟 Contribute

Contributions are always welcome. Please contact me before submitting a PR to ensure alignment and efficiency.
