Domain-Specific Q&A Agent: The RAG Killer?

This project showcases a simpler, more practical alternative to traditional RAG systems - demonstrating how modern search APIs combined with large context windows can eliminate the complexity of Retrieval-Augmented Generation for many documentation Q&A use cases.

As we enter 2025, there's growing evidence that search-first approaches are becoming more cost-effective and simpler than traditional RAG. With models like Gemini 2.5 Flash offering 5M token context windows at competitive prices, many developers are discovering: "Why build complex RAG pipelines when you can just search and load relevant content into context?"

This project provides a hands-on example of this approach - showcasing intelligent search with domain restrictions and organizational guardrails.

Perfect for organizations wanting to create internal knowledge assistants that stay within approved documentation boundaries without the overhead of traditional RAG infrastructure.

🚀 Key Features

🎯 Smart Tool Selection: Automatically chooses between fast search and comprehensive scraping based on query needs
🔍 Domain-Restricted Search: Only searches approved organizational documentation websites
🧠 Web Scraping Fallback: Comprehensive page scraping when search results are insufficient
📝 Intelligent Summarization: Optional AI-powered result summarization reduces token usage by 60-80%
💰 Cost-Competitive: At $0.005-$0 8000 .075 per query, often cheaper than traditional RAG systems
⚡ Performance Optimized: Fast search for 90% of queries, deep scraping only when needed
🛡️ Data Security: No sensitive data sent to vector databases or training systems
📊 Transparent Sources: Every answer includes clear source attribution from official documentation
🔧 Easy Configuration: Simple CSV file controls which knowledge sources are accessible
💬 Conversation Memory: Maintains context across multiple questions in a session
🎮 Production Ready: FastAPI backend with proper error handling and logging

🚀 Quick Start

Setting Up Your Knowledge Sources

To configure which websites your agent can search, edit the sites_data.csv file. This CSV defines your agent's knowledge boundaries and domains:

domain,site,description
AI Agent Frameworks,github.com/openai/swarm,OpenAI Swarm documentation for lightweight multi-agent orchestration
AI Operations,docs.agentops.ai,AgentOps documentation for testing debugging and deploying AI agents and LLM apps
AI Data Frameworks,docs.llamaindex.ai,LlamaIndex documentation for building LLM-powered agents over your data

CSV Structure:

domain: The subject area or topic (e.g., "AI Agents", "Web Development", "Machine Learning")
site: The actual website domain to search (e.g., "docs.langchain.com", "docs.python.org")
description: A clear explanation of what the site contains and when to use it

Pro Tip: The description is crucial - it's what the agent uses to decide whether a particular site will be helpful for answering a user's question. Be specific about what topics and types of information each site covers.

Obtaining API Keys

Getting a Tavily API Key:

Go to tavily.com and sign up for a free account
Navigate to your dashboard or API section
Find your API key in the dashboard
Tavily offers a generous free tier with thousands of searches per month

Getting a Google API Key:

Visit ai.google.dev (Google AI Studio)
Sign in with your Google account
Click "Get API Key" or navigate to the API keys section
Create a new project if needed
Generate your API key
Google's Gemini API includes a substantial free tier

After obtaining both keys, add them to your .env file:

TAVILY_API_KEY=your_tavily_key_here
GOOGLE_API_KEY=your_google_key_here

Security Note: Keep these keys secure and never commit them to public repositories. Both services offer excellent free tiers suitable for development and small-scale production use.

Option 1: Using Make (Recommended)

# Clone the repository
git clone https://github.com/javiramos1/qagent.git
cd qagent

# Setup environment and install dependencies
make install

# Copy and configure environment variables
cp .env.example .env
# Edit .env with your API keys

# Run the application
make run

Option 2: Using Docker

# Clone the repository
git clone https://github.com/javiramos1/qagent.git
cd qagent

# Copy and configure environment variables
cp .env.example .env
# Edit .env with your API keys

# Run with Docker Compose
make docker-run

🔧 Configuration

Required Environment Variables

GOOGLE_API_KEY=your_google_api_key_here    # Get from Google Cloud Console
TAVILY_API_KEY=your_tavily_api_key_here    # Get from Tavily.com

Optional Environment Variables

# Search Configuration
MAX_RESULTS=10                    # Maximum search results per query
SEARCH_DEPTH=basic              # Search depth: basic or advanced
MAX_CONTENT_SIZE=100000         # Maximum content size per result
MAX_SCRAPE_LENGTH=10000          # Maximum content length for web scraping (characters)
ENABLE_SEARCH_SUMMARIZATION=false  # Enable AI summarization of search results (reduces tokens 60-80%)

# LLM Configuration
LLM_TEMPERATURE=0.1             # Response creativity (0.0-1.0)
LLM_MAX_TOKENS=10000           # Maximum response length

# Timeout Configuration
REQUEST_TIMEOUT=30              # Request timeout in seconds
LLM_TIMEOUT=60                 # LLM response timeout in seconds

# Web Scraping Configuration
USER_AGENT=QAgent/1.0 (Educational Search-First Q&A Agent)  # Identifies your requests (prevents warnings)

📊 Why Search-First Beats RAG in 2025

Cost Reality Check

Our analysis reveals that search-first approaches are now cost-competitive or even cheaper than traditional RAG systems:

# Fair comparison: Same model (Gemini 2.0 Flash), same token usage

# Search-First Approach (this project)
search_cost = $0.075                    # 1M tokens input + 1K output
# No additional infrastructure needed

# Traditional RAG Approach  
rag_llm_cost = $0.075                   # Same LLM costs as search-first
rag_overhead = $0.002                   # Embeddings + vector DB queries
rag_infrastructure = $0.001             # Hosting, maintenance, pipelines
total_rag_cost = $0.078                 # 4% MORE expensive than search-first!

# Ultra-affordable option
gemini_lite_cost = $0.005               # 128K context with Gemini 2.0 Flash-Lite

Key Findings

Gemini 2.0 Flash-Lite: $0.005 per query - 15x cheaper than RAG
Gemini 2.0 Flash: $0.075 per query - same cost as RAG but no infrastructure
Search-first eliminates: Vector databases, embeddings, chunking, maintenance overhead
Always fresh: No stale embeddings or index updates needed

Latest Model Context Windows (2025)

Model	Context Window	Token Pricing	Best For
Gemini 2.0 Flash-Lite	128K tokens	$0.0375/1M input	Most Q&A scenarios
Gemini 2.0 Flash	1M tokens	$0.075/1M input	Complex documentation
Gemini 2.5 Flash Preview	1M tokens	$0.15/1M input	Reasoning-heavy tasks
Gemini 2.5 Pro	5M tokens	$1.25/1M input	Enterprise analysis
Traditional RAG	Variable	$0.077/query	Legacy systems only

Architecture Comparison

Search-First Architecture (This Project):

graph TD
    A[User Query] --> B[Search API]
    B --> C[Relevant Results]
    C --> D[LLM with Context]
    D --> E[Response]
    
    style B fill:#ccffcc
    style D fill:#cceeff

Traditional RAG Architecture:

graph TD
    A[User Query] --> B[Embedding Model]
    B --> C[Vector Database]
    C --> D[Similarity Search]
    D --> E[Chunk Retrieval]
    E --> F[Context Assembly]
    F --> G[LLM Processing]
    G --> H[Response]
    
    I[Document Ingestion] --> J[Chunking]
    J --> K[Embedding Generation]
    K --> L[Vector Storage]
    L --> C
    
    style C fill:#ffcccc
    style J fill:#ffcccc
    style K fill:#ffcccc

Performance Advantages

Recent research (2024-2025) shows that search-first approaches often outperform RAG:

No "lost in the middle" issues - Search returns most relevant content first
Better context relevance - Search algorithms optimize for query relevance
Faster iteration - No embedding regeneration when documents change
Simpler debugging - Easy to 8000 see what content was retrieved and why

2025 Strategy Recommendations

🥇 Primary Approach: Search-First (This Project)

✅ Public documentation - Use search APIs with large context windows
✅ Internal wikis - Search across approved domains with guardrails
✅ Cost optimization - 15x cheaper with Gemini 2.0 Flash-Lite
✅ Simplicity - No vector databases or embedding maintenance
✅ Always current - Real-time search results

🥈 Fallback: Hybrid RAG-Search

🔄 Private enterprise data with strict access controls
🔄 Fine-grained permissions on document chunks
🔄 Offline scenarios where search APIs aren't available

🥉 Legacy: Traditional RAG

⚠️ Specialized use cases requiring complex document relationships
⚠️ Ultra-high volume (>100K queries/day) where infrastructure costs amortize

The Verdict: Search-first approaches have fundamentally changed the game in 2025. This project demonstrates: Search + Large Context > RAG for most organizational knowledge systems. 🚀

🏗️ System Architecture

The system uses a search-first approach with intelligent fallback to web scraping for comprehensive information retrieval:

graph TD
    A[User Query] --> B[LangChain Agent]
    B --> C{Analyze Query}
    C --> D[Select Relevant Sites]
    D --> E[Tavily Search API]
    E --> F{Search Results Sufficient?}
    
    F -->|Yes| G[Generate Response]
    F -->|No| H[Web Scraping Tool]
    H --> J[Extract Page Content]
    J --> K[Combine Search + Scraped Data]
    K --> G
    
    G --> L[Response with Sources]
    
    M[sites_data.csv] --> D
    N[Domain Restrictions] --> E
    N --> H
    
    style A fill:#e1f5fe
    style B fill:#f3e5f5
    style E fill:#e8f5e8
    style H fill:#fff3e0
    style G fill:#e0f2f1
    style L fill:#f1f8e9
    
    classDef searchPath stroke:#4caf50,stroke-width:3px
    classDef scrapePath stroke:#ff9800,stroke-width:3px
    classDef decision stroke:#2196f3,stroke-width:3px
    
    class E,F,G searchPath
    class H,J,K scrapePath
    class C,F decision

Core Components

Domain-Restricted Agent: LangChain agent that only searches approved knowledge sources
Tavily Search Integration: Fast, targeted search within specific documentation websites
Web Scraping Tool: Chromium-based scraping for comprehensive page content extraction
Site Restrictions: CSV-configured domains ensure searches stay within organizational boundaries
Cost Control: Intelligent tool selection minimizes expensive operations

Two-Tier Information Retrieval

Primary: Fast Search - Uses Tavily API to quickly search within approved documentation websites
Fallback: Deep Scraping - When search results are insufficient, automatically scrapes entire pages for comprehensive content

Agent Decision Logic

The agent follows a smart escalation strategy:

Analyze Query: Determine relevant documentation sites based on technologies mentioned
Search First: Use fast Tavily search within selected domains
Evaluate Results: Assess if search provides sufficient information
Scrape if Needed: Only scrape entire pages when search results are incomplete
Comprehensive Response: Combine information from both sources for detailed answers

Model Selection: Gemini Flash Over "Thinking" Models

This system strategically uses Gemini 2.0 Flash (non-thinking model) instead of reasoning-heavy models like o3:

Aspect	Gemini Flash (Non-Thinking)	o3-style (Thinking Models)
Cost	$0.075/1M tokens	$15-60/1M tokens (200-800x more)
Speed	2-5 seconds	15-60 seconds
Token Usage	Minimal overhead	Heavy reasoning chains
Suitability	Perfect for tool-based workflows	Overkill for structured tasks

ReAct Framework Replaces Internal Reasoning:

Human Query → Agent Thinks → Selects Tool → Executes → Observes → Responds
     ↑              ↑            ↑           ↑         ↑         ↑
   Input      ReAct Logic   Tool Selection  Search   Results   Answer

Key Advantages:

Cost-Effective Reasoning: ReAct provides structured thinking at 1/200th the cost
Transparent Logic: Every reasoning step is visible and debuggable
Tool-Optimized: Designed specifically for search + scraping workflows
Faster Responses: No internal chain-of-thought overhead
Easier Boundaries: Explicit tool constraints prevent hallucination

📡 API Reference

The agent provides intelligent two-tier information retrieval through a simple REST API:

Session Management: The API uses secure HTTP cookies to maintain separate conversation memory for each user. When you make your first request, a unique session ID (UUID) is automatically generated and stored in a secure cookie. Each session ID creates its own agent instance with isolated memory, so your conversation history never mixes with other users - even if they're using the API simultaneously.

Available Endpoints

POST /chat - Send a question to the agent (automatically uses search + scraping as needed)
POST /reset - Reset conversation memory
GET /health - Detailed health check with system status

Chat Endpoint Example

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "How do I create a LangChain agent with custom tools?"}'

Example Response:

{
  "status": "success",
  "response": "Based on the LangChain documentation, here's how to create a custom agent..."
}

⚡ Search Result Summarization

Enable intelligent search result summarization to reduce token usage and improve performance:

# Enable summarization in your .env file
ENABLE_SEARCH_SUMMARIZATION=true

Performance Benefits

✅ 60-80% token reduction while preserving key information
✅ 2-3x faster processing with smaller contexts
✅ Lower costs especially for high-volume deployments
✅ Better focus on query-relevant information
✅ Automatic fallback if summarization fails

When to Enable

High-volume scenarios (>1000 queries/day)
Cost-sensitive deployments requiring maximum efficiency
Long documentation pages with lots of boilerplate content
Latency-critical applications where speed matters most

Technical Details

Uses Gemini 2.0 Flash-Lite for ultra-fast, cheap summarization ($0.0375/1M tokens)
Preserves technical details, code examples, and source URLs
Intelligent prompt focuses on query relevance
Graceful degradation if summarization fails

This design choice makes the system practical for production deployment while maintaining high answer quality through structured tool usage rather than expensive internal reasoning.

🔒 How Site Restrictions Work

This project demonstrates organizational AI safety through multiple layers:

Tavily Integration

# In search_tool.py
# The agent selects which website domains to search based on the user's query
search_params = {
    "query": query,
    "include_domains": [site_info["site"] for site_info in sites_info],  # e.g., ["docs.langchain.com"]
    "max_results": max_results,
    "search_depth": search_depth
}

Agent Enforcement

Agent must use search tool for every question
Questions outside configured knowledge sources trigger rejection responses
Clear user guidance about available knowledge areas

Configuration Details

Topic Domains (CSV 'domain' column): Used for categorization and user communication
Website Domains (CSV 'site' column): Used for actual search restrictions in Tavily API

Benefits for Organizations

✅ No data leakage - searches only approved documentation websites
✅ No hallucination - responses based only on real documentation
✅ Audit trail - all searches are logged and traceable
✅ Easy updates - modify sites_data.csv to change knowledge scope
✅ Cost control - limited search scope reduces API usage

🏢 Organizational Use Cases

Internal Documentation Assistant

Employee onboarding guides and company handbooks
HR policy documentation and benefits information
Technical documentation and API references
Process and procedure manuals
Intranet search solutions - Direct search across internal sites

Customer Support Knowledge Base

Product documentation and user guides
FAQ resources and troubleshooting guides
API documentation and developer resources
Release notes and changelog information

Enterprise Knowledge Management

Departmental wikis - Search across team-specific documentation
Project documentation - Access to project specs, requirements, and status updates
Compliance and regulatory - Search through policy documents and guidelines
Training materials - Access to learning resources and certification guides

Compliance and Safety

Regulatory documentation and compliance frameworks
Safety procedures and emergency protocols
Audit requirements and reporting guidelines
Legal documentation and contract templates

Key Advantage: All these use cases can be implemented with simple search approaches rather than complex RAG pipelines.

Enterprise Search Integration: Elasticsearch Alternative

For internal documentation where Tavily API access is limited, adapt the system to use Elasticsearch.

Enterprise Deployment Benefits:

✅ Complete data control - All searches stay within corporate network
✅ Security compliance - No external API calls for sensitive documents
✅ Unified search - Same agent interface for internal and external docs
✅ Permission integration - Leverage existing Elasticsearch security
✅ Cost predictability - No per-query API costs for internal searches

Migration Path: Start with Tavily for public documentation, add Elasticsearch for internal content as needed.

🎯 Educational Goals

This project demonstrates how organizations can:

✅ Implement AI Guardrails - Prevent unauthorized knowledge access
✅ Create Safe AI Assistants - Domain-restricted organizational tools
✅ Use Search-First Architecture - Simpler alternative to RAG systems
✅ Build LangChain Agents - Structured chat agents with tools and constraints
✅ Deploy Production AI - FastAPI, Docker, and monitoring
✅ Manage AI Knowledge Scope - Configuration-driven domain control
✅ Ensure Response Reliability - Force tool usage to prevent hallucination

🛠️ Development

Available Make Commands

make help          # Show all available commands
make install       # Setup virtual environment and dependencies
make run           # Run the application locally
make test          # Run tests
make clean         # Clean up temporary files
make docker-build  # Build Docker image
make docker-run    # Run with docker-compose
make docker-stop   # Stop docker-compose services
make format        # Format code with black
make lint          # Run linting checks

Development Workflow

Setup Development Environment

make install
make dev-install  # Install development dependencies

Make Changes

# Edit code
make format      # Format code
make lint        # Check code quality

Test Changes

make test        # Run tests
make run         # Test locally

Docker Testing

make docker-build
make docker-run
make docker-logs   # View logs

Project Structure

qagent/
├── main.py                 # FastAPI application entry point
├── qa_agent.py            # Core Q&A agent implementation
├── search_tool.py         # Tavily search tool implementation
├── scraping_tool.py       # Web scraping tool implementation
├── sites_data.csv         # Domain configuration
├── requirements.txt       # Python dependencies
├── Dockerfile            # Docker configuration
├── docker-compose.yml    # Docker Compose setup
├── Makefile             # Development commands
├── .env.example         # Environment variables template
├── .gitignore          # Git ignore rules
└── README.md           # This file

🔧 Troubleshooting

Common Issues

API Key Errors
- Ensure .env file exists with valid API keys
- Check API key permissions and quotas
Import Errors
- Activate virtual environment: source qagent_venv/bin/activate
- Install dependencies: make install
Docker Issues
- Ensure Docker is running
- Check port 8000 is available
- View logs: make docker-logs
Search Not Working
- Verify domain configuration in sites_data.csv
- Check Tavily API key and quota

Getting Help

Check the FastAPI documentation
Review LangChain documentation
Examine the logs for error details

🏆 Conclusion

This project showcases the intelligent dual-tool approach that's reshaping AI knowledge systems in 2025. By combining fast search with smart scraping, we've created a system that's:

Simpler than RAG: No vector databases, embeddings, or chunking complexity
Cheaper than RAG: 15x more cost-effective with Gemini 2.0 Flash-Lite
More reliable: Official documentation sources with complete transparency
Always current: Real-time search without stale embedding issues
Production-ready: Built-in guardrails and organizational safety controls

Key Competitive Advantages

Quick Search: Instant results for 90% of queries via Tavily API
Deep Scraping: Comprehensive extraction when search isn't enough
Complete Transparency: Every answer traced to official documentation
Zero Hallucination: Forced tool usage prevents made-up responses
Organizational Control: CSV-driven knowledge boundaries

Perfect for: Internal knowledge assistants, customer support bots, technical documentation systems, and any scenario requiring reliable, traceable AI responses within defined knowledge boundaries.

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! This project follows the Apache 2.0 license terms:

✅ Fork and experiment with the codebase
✅ Submit pull requests for improvements
✅ Use in commercial projects (with proper attribution)
✅ Create derivative works while maintaining license compliance
✅ Educational use encouraged for learning search-first AI development

Please ensure any contributions maintain the educational focus and include proper documentation.

🙏 Acknowledgments

LangChain - Framework for building applications with large language models
Google Gemini - Advanced language model capabilities with affordable pricing
Tavily - Web search API with domain restriction capabilities
FastAPI - Modern, fast web framework for building APIs

Note: This is an educational project demonstrating search-first AI assistant development as a simpler alternative to traditional RAG systems. Feel free to adapt and extend for your organizational needs while respecting the Apache 2.0 license terms.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
.env.example		.env.example
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
docker-compose.yml		docker-compose.yml
main.py		main.py
qa_agent.py		qa_agent.py
requirements.txt		requirements.txt
scraping_tool.py		scraping_tool.py
search_tool.py		search_tool.py
sites_data.csv		sites_data.csv

License

ptzagk/qagent

Folders and files

Latest commit

History

Repository files navigation