A contextual RAG app built with Streamlit. It uses Retrieval-Augmented Generation (RAG) to answer questions based on a given context, and integrates LangChain, OpenAI, ChromaDB, and Athina AI for document retrieval, contextual compression, and response evaluation.
- Document Loading and Indexing: Automatically loads and indexes documents from a CSV file.
- RAG Workflow: Combines retrieval and generation to answer queries.
- Evaluation: Uses Athina AI for evaluating context relevancy.
- Streamlit Interface: User-friendly interface with text input and evaluation options.
- Secure API Keys: Handles sensitive information securely using secrets.toml.
contextual_rag_app/
├── app.py # Main Streamlit application
├── context.csv # Data file for RAG system
├── requirements.txt # Python dependencies
└── .streamlit/
    └── secrets.toml # Secure storage for API keys
- Python 3.8+
- API keys for OpenAI and Athina AI
- Clone the repository:
  git clone https://github.com/your-username/contextual_rag_app.git
  cd contextual_rag_app
- Create the context.csv file:
  - Format the file as required by the LangChain CSVLoader (e.g., columns like content and metadata).
- Add your API keys to .streamlit/secrets.toml:
  OPENAI_API_KEY = "your_openai_api_key"
  ATHINA_API_KEY = "your_athina_api_key"
- Install dependencies:
  pip install -r requirements.txt
- Run the Streamlit app:
  streamlit run app.py
- Open the app in your browser (usually at http://localhost:8501).
- Interact with the app:
  - Enter a query in the text input field.
  - View the generated response and the retrieved context.
  - Click "Run Evaluation" to evaluate the response with Athina AI.
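For the context.csv step, a small sample file can help confirm the format before running the app. The sketch below uses only the Python standard library; the file name sample_context.csv, the two sample rows, and both helper functions are hypothetical. The "column: value" rendering is an assumption about how a CSV loader typically turns each row into one text document, not a statement of what app.py does.

```python
import csv

# Hypothetical sample rows; the column names follow the content/metadata
# example from the setup step and are an assumption, not a requirement.
ROWS = [
    {"content": "Paris is the capital of France.", "metadata": "geography"},
    {"content": "The Seine flows through Paris.", "metadata": "geography"},
]


def write_context_csv(path, rows):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["content", "metadata"])
        writer.writeheader()
        writer.writerows(rows)


def rows_as_text(path):
    """Render each CSV row as 'column: value' lines, one text block per row."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            "\n".join(f"{key}: {value}" for key, value in row.items())
            for row in csv.DictReader(f)
        ]


write_context_csv("sample_context.csv", ROWS)
docs = rows_as_text("sample_context.csv")
print(docs[0])
```

Rename the file to context.csv once the rows look right.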
- Streamlit: Interactive web application framework.
- LangChain: For RAG and document retrieval.
- OpenAI: Language model for response generation.
- ChromaDB: Vector store for efficient document retrieval.
- Athina AI: Evaluation of context relevancy.
- Pandas: Data manipulation and evaluation results display.
- app.py:
  - Main application code for the Streamlit app.
  - Includes indexing, retrieval, and RAG workflow setup.
- context.csv:
  - The input file containing documents to be indexed.
- requirements.txt:
  - List of Python libraries and versions required to run the app.
- .streamlit/secrets.toml:
  - Stores API keys outside the source code. Do not commit this file to a public repository.
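As a sketch of how the keys in .streamlit/secrets.toml might reach the libraries that expect them as environment variables, the helper below copies named keys from any mapping into an environment and fails early if one is missing. The function name export_api_keys and its parameters are hypothetical and are not taken from app.py.

```python
import os

# Key names as they appear in .streamlit/secrets.toml.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ATHINA_API_KEY")


def export_api_keys(secrets, names=REQUIRED_KEYS, environ=None):
    """Copy the named keys from a mapping into environment variables.

    Raises KeyError naming every key that is missing or empty.
    """
    environ = os.environ if environ is None else environ
    missing = [n for n in names if n not in secrets or not secrets[n]]
    if missing:
        raise KeyError("Missing API keys: " + ", ".join(missing))
    for name in names:
        environ[name] = str(secrets[name])
    return list(names)


# Demonstration with a plain dict standing in for the real secrets store.
fake_secrets = {
    "OPENAI_API_KEY": "your_openai_api_key",
    "ATHINA_API_KEY": "your_athina_api_key",
}
env = {}
exported = export_api_keys(fake_secrets, environ=env)
print(exported)
```

Inside a Streamlit app, the same helper could presumably be called as `export_api_keys(st.secrets)` after `import streamlit as st`, since `st.secrets` behaves like a read-only mapping.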
- Document Indexing:
  - Documents from context.csv are split and stored as embeddings in ChromaDB.
- Retrieval-Augmented Generation (RAG):
  - Retrieves the most relevant documents for a query.
  - Uses OpenAI's language model to generate a response based on the retrieved context.
- Evaluation:
  - Prepares data for evaluation.
  - Uses Athina AI to assess the relevancy of the retrieved context.
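The indexing and retrieval steps above can be illustrated with a toy, dependency-free sketch. It is not the app's implementation: word-count vectors stand in for OpenAI embeddings, a Python list stands in for ChromaDB, and every function name and parameter here is hypothetical. The final prompt is what would be sent to the language model in the generation step.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=80, overlap=20):
    """Split text into fixed-size character chunks that overlap."""
    step = chunk_size - overlap
    stop = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, stop, step)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents):
    """Return (chunk, vector) pairs; stands in for the vector store."""
    return [(c, embed(c)) for d in documents for c in split_text(d) if c.strip()]


def retrieve(index, query, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, context_chunks):
    """Assemble the prompt that the language model would receive."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


documents = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Nile is a river in Africa.",
]
index = build_index(documents)
question = "What is the capital of France?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Keeping retrieval separate from prompt assembly means the retrieved chunks can be shown to the user and passed to an evaluator unchanged, which matches the evaluation step described above.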
- User Query: "What is the capital of France?"
- Generated Response: "The capital of France is Paris."
- Retrieved Context:
  - Relevant documents from context.csv are displayed.
- Evaluation Results:
  - Context relevancy scores displayed in a table.
- Security: Ensure .streamlit/secrets.toml is included in .gitignore to prevent exposing API keys.
- Costs: Using OpenAI and Athina AI may incur charges. Monitor API usage carefully.
- Error Handling: Add robust error handling for production-level deployments.
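One common piece of the error handling mentioned above is retrying a failed API call with exponential backoff. The sketch below is a generic pattern, not code from this project; call_with_retries and its parameters are hypothetical, and a real deployment would likely narrow retry_on to the specific network or rate-limit exceptions raised by the client library in use.

```python
import time


def call_with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep,
                      retry_on=(Exception,)):
    """Call func(); on failure wait base_delay * 2**n seconds and try again.

    Re-raises the last exception once all attempts are used.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demonstration: a function that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky, sleep=delays.append)
print(result, delays)
```

Passing the sleep function as a parameter keeps the helper easy to test without real delays.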
This project is licensed under the MIT License.
Contributions are welcome! Please fork the repository and submit a pull request.
For questions or feedback, feel free to reach out:
- GitHub: YourUsername
- Email: your-email@example.com