A powerful deep search agent that uses BAML functions to perform intelligent web searches and generate comprehensive answers to questions.
For more details about our project, please visit our blog post.
- Intelligent web search using Tavily and SerpAPI search providers
- Web scraping and content extraction with multiple providers (Firecrawl, Browser, BS4, Tavily)
- Multi-step reasoning and reflection
- Configurable LLM models for different tasks
- Asynchronous operation for better performance
- Comprehensive answer generation with references
- Support for customizable pipelines and reasoning methods for deep search
Demo video: demo.mp4
- Python 3.7+ (required for local development)
- Docker and Docker Compose (required for containerized deployment)
- Node.js and npm (required for local frontend development)
git clone https://github.com/Intelligent-Internet/ii-researcher.git
cd ii-researcher
pip install -e .
# API Keys
export OPENAI_API_KEY="your-openai-api-key"
export TAVILY_API_KEY="your-tavily-api-key" # required when SEARCH_PROVIDER is set to "tavily"
export SERPAPI_API_KEY="your-serpapi-api-key" # required when SEARCH_PROVIDER is set to "serpapi"
export FIRECRAWL_API_KEY="your-firecrawl-api-key" # required when SCRAPER_PROVIDER is set to "firecrawl"
# API Endpoints
export OPENAI_BASE_URL="http://localhost:4000"
# Compress Configuration
export COMPRESS_EMBEDDING_MODEL="text-embedding-3-large"
export COMPRESS_SIMILARITY_THRESHOLD="0.3"
export COMPRESS_MAX_OUTPUT_WORDS="4096"
export COMPRESS_MAX_INPUT_WORDS="32000"
# Search and Scraping Configuration
export SEARCH_PROVIDER="serpapi" # Options: 'serpapi' | 'tavily'
export SCRAPER_PROVIDER="firecrawl" # Options: 'firecrawl' | 'bs' | 'browser' | 'tavily_extract'
# Timeouts and Performance Settings
export SEARCH_PROCESS_TIMEOUT="300" # in seconds
export SEARCH_QUERY_TIMEOUT="20" # in seconds
export SCRAPE_URL_TIMEOUT="30" # in seconds
export STEP_SLEEP="100" # in milliseconds
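As a rough sketch of how these settings fit together, SEARCH_PROVIDER can be thought of as selecting the search backend while SEARCH_QUERY_TIMEOUT bounds each query. The helpers below are hypothetical stand-ins for the real Tavily/SerpAPI clients, not the project's actual modules:

import asyncio
import os

# Hypothetical stand-ins for the real Tavily / SerpAPI clients.
async def tavily_search(query):
    return [{"provider": "tavily", "query": query}]

async def serpapi_search(query):
    return [{"provider": "serpapi", "query": query}]

PROVIDERS = {"tavily": tavily_search, "serpapi": serpapi_search}

async def timed_search(query):
    provider = PROVIDERS[os.getenv("SEARCH_PROVIDER", "serpapi")]
    timeout = float(os.getenv("SEARCH_QUERY_TIMEOUT", "20"))  # seconds
    # Give up on a provider call that exceeds the configured timeout.
    return await asyncio.wait_for(provider(query), timeout=timeout)

print(asyncio.run(timed_search("example query")))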
Set these environment variables when using LLM-based compression (optional, for better compression performance):
export USE_LLM_COMPRESSOR="TRUE"
export FAST_LLM="gemini-lite" # The model used for context compression
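A plausible reading of the COMPRESS_* settings is an embedding-similarity filter: passages whose similarity to the question falls below COMPRESS_SIMILARITY_THRESHOLD are dropped, and the kept text is capped at COMPRESS_MAX_OUTPUT_WORDS. The sketch below is illustrative only, not the project's actual compressor; embed() is a hypothetical callable standing in for the configured embedding model:

import os

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def compress(question_vec, passages, embed):
    # embed: hypothetical callable mapping text -> embedding vector,
    # e.g. COMPRESS_EMBEDDING_MODEL served through the LiteLLM proxy.
    threshold = float(os.getenv("COMPRESS_SIMILARITY_THRESHOLD", "0.3"))
    max_words = int(os.getenv("COMPRESS_MAX_OUTPUT_WORDS", "4096"))
    kept, words = [], 0
    for passage in passages:
        if cosine(question_vec, embed(passage)) < threshold:
            continue  # not similar enough to the question
        words += len(passage.split())
        if words > max_words:
            break  # stop once the output word budget is spent
        kept.append(passage)
    return "\n\n".join(kept)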
Set these environment variables when running in Pipeline mode:
# Model Configuration
export STRATEGIC_LLM="gpt-4o" # The model used to choose the next action
export SMART_LLM="gpt-4o" # The model used for other tasks in the pipeline
Set these environment variables when running in Reasoning mode:
export R_MODEL=r1 # The model used for reasoning
export R_TEMPERATURE=0.2 # Temperature for the reasoning model
export R_REPORT_MODEL=gpt-4o # The model used for writing the report
export R_PRESENCE_PENALTY=0 # presence_penalty for the reasoning model
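The R_* variables correspond to standard chat-completion parameters. A minimal sketch of how they might be forwarded to a model served through the LiteLLM proxy (illustrative only; the actual wiring lives in the project's reasoning code):

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("OPENAI_BASE_URL", "http://localhost:4000"),
    api_key=os.getenv("OPENAI_API_KEY"),
)
response = client.chat.completions.create(
    model=os.getenv("R_MODEL", "r1"),
    temperature=float(os.getenv("R_TEMPERATURE", "0.2")),
    presence_penalty=float(os.getenv("R_PRESENCE_PENALTY", "0")),
    messages=[{"role": "user", "content": "Sketch a research plan for the question."}],
)
print(response.choices[0].message.content)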
# Install LiteLLM
pip install litellm
# Create litellm_config.yaml file
cat > litellm_config.yaml << EOL
model_list:
  - model_name: text-embedding-3-large
    litellm_params:
      model: text-embedding-3-large
      api_key: ${OPENAI_API_KEY}
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: ${OPENAI_API_KEY}
  - model_name: o1-mini
    litellm_params:
      model: o1-mini
      api_key: ${OPENAI_API_KEY}
  - model_name: r1
    litellm_params:
      model: deepseek-reasoner
      api_key: ${OPENAI_API_KEY}
litellm_settings:
  drop_params: true
EOL
# Start LiteLLM server
litellm --config litellm_config.yaml
The LiteLLM server will run on http://localhost:4000 by default.
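Because the proxy speaks the OpenAI API, you can sanity-check it with any OpenAI-compatible client, for example:

from openai import OpenAI

# Any key works unless the proxy is started with a master key.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-anything")
print([m.id for m in client.models.list()])  # should list gpt-4o, o1-mini, r1, ...
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "ping"}],
)
print(reply.choices[0].message.content)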
Alternatively, if you serve the models through OpenRouter, the config might look like this:
cat > litellm_config.yaml << EOL
model_list:
  - model_name: text-embedding-3-large
    litellm_params:
      model: text-embedding-3-large
      api_key: ${OPENAI_API_KEY}
  - model_name: "gpt-4o"
    litellm_params:
      model: "openai/chatgpt-4o-latest"
      api_base: "https://openrouter.ai/api/v1"
      api_key: "your_openrouter_api_key_here"
  - model_name: "r1"
    litellm_params:
      model: "deepseek/deepseek-r1"
      api_base: "https://openrouter.ai/api/v1"
      api_key: "your_openrouter_api_key_here"
  - model_name: "gemini-lite"
    litellm_params:
      model: "google/gemini-2.0-flash-lite-001"
      api_base: "https://openrouter.ai/api/v1"
      api_key: "your_openrouter_api_key_here"
litellm_settings:
  drop_params: true
EOL
Run the deep search agent with your question:
There are two modes:
- Pipeline Mode: This mode is suitable for general questions and tasks.
python cli.py --question "your question here"
- Reasoning Mode: This mode is suitable for complex questions and tasks.
python cli.py --question "your question here" --use-reasoning --stream
- Install and Run Backend API (required if you want to serve the frontend):
# Start the API server
python api.py
The API server will run on http://localhost:8000
- Set up env for Frontend:
Create a .env file in the frontend directory with the following content:
NEXT_PUBLIC_API_URL=http://localhost:8000
- Install and Run Frontend:
# Navigate to frontend directory
cd frontend
# Install dependencies
npm install
# Start the development server
npm run dev
The frontend will be available at http://localhost:3000
- Important: Make sure you have set up all environment variables from step 3 before proceeding.
- Start the services using Docker Compose:
# Build and start all services
docker compose up --build -d
The following services will be started:
- frontend: Next.js frontend application
- api: FastAPI backend service
- litellm: LiteLLM proxy server
The services will be available at:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- LiteLLM Server: http://localhost:4000
- View logs:
# View all logs
docker compose logs -f
# View specific service logs
docker compose logs -f frontend
docker compose logs -f api
docker compose logs -f litellm
- Stop the services:
docker compose down
To run the Qwen/QwQ-32B model using SGLang, use the following command:
python3 -m sglang.launch_server --model-path Qwen/QwQ-32B --host 0.0.0.0 --port 30000 --tp 8 --context-length 131072
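SGLang exposes an OpenAI-compatible endpoint under /v1 on the chosen port, so the served model can be queried directly (or registered as another model_name in litellm_config.yaml). A quick check against the port used above:

from openai import OpenAI

# SGLang serves an OpenAI-compatible API; the key is unused by default.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="none")
reply = client.chat.completions.create(
    model="Qwen/QwQ-32B",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(reply.choices[0].message.content)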
II-Researcher is inspired by and built with the support of the open-source community:
- LiteLLM - Used for efficient AI model integration.
- node-DeepResearch - Prompt inspiration
- gpt-researcher - Prompt inspiration, web scraper tool
- baml - Structured outputs