8000 Web Search Quick/Deep router with Tavily/Jina DeepSearch by Onnson · Pull Request #177 · ai-christianson/RA.Aid · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Web Search Quick/Deep router with Tavily/Jina DeepSearch #177

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Onnson
Copy link
@Onnson Onnson commented Apr 1, 2025

Added Jina DeepSearch for Deep Web Research

Summary

This PR adds a search router to choose between the Tavily-based web search implementation and Jina DeepSearch to improve the quality, reliability, and control of web research performed by RA-Aid.

Motivation & Benefits

  • Improved search quality: Leverages Jina's advanced capabilities for iterative reasoning and source validation
  • Greater control: Provides fine-grained parameters for reasoning effort and domain filtering
  • Better information validity: Enhances cross-source verification for more reliable research output

Summary by CodeRabbit

  • New Features

    • Introduced multi-agent communication with structured query prompts for enhanced coordination.
    • Improved web research capabilities powered by a new search API that delivers more accurate results.
    • Added support for Jina DeepSearch API for enhanced web search functionality.
    • Enhanced environment variable validation with new checks for required keys.
    • Updated command-line arguments to set default values for provider and model.
    • Added new tools for quick and selective web searches.
  • Chores

    • Updated dependency requirements to specify Python version and added requests library.
    • Enhanced tool management infrastructure for streamlined metadata extraction and overall consistency.
    • Added new constants for expected tool sets in tests to improve testing accuracy.
    • Updated .gitignore to exclude specific directories from version control.

Copy link
coderabbitai bot commented Apr 1, 2025

Walkthrough

This update introduces several enhancements across configuration, core functionality, tools, and tests. The changes include updating the Git ignore and dependency configurations, adding a new multi-agent prompt framework, and revamping environment variable validation to use the Jina API key. The web search tool has been transitioned from Tavily to Jina with the addition of a dedicated Jina search client. Tests have been updated accordingly to reflect these modifications, ensuring that environment validations and tool configurations align with the new implementations.

Changes

File(s) Change Summary
.gitignore
pyproject.toml
Added new ignore rule for agent_docs/; introduced Python version dependency (^3.9) and a requests library dependency (^2.31.0) in Poetry configuration.
ra_aid/env.py Added a new data class WebResearchValidationResult and functions validate_web_research, check_web_research_env, and check_env for environment variable validation, switching from TAVILY_API_KEY to JINA_API_KEY.
ra_aid/prompts/__init__.py
ra_aid/prompts/multi_agent_prompts.py
ra_aid/prompts/web_research_prompts.py
Imported and defined multi-agent prompt entities and JSON schemas; updated web research prompt instructions to specify Jina DeepSearch functionalities.
ra_aid/tool_configs.py
ra_aid/tools/__init__.py
ra_aid/tools/web_search_jina.py
ra_aid/tools/web_search_tavily.py
Replaced references to web_search_tavily with web_search_jina; added tool metadata classes and an extraction function; introduced JinaDeepSearchClient and corresponding tool function; removed the Tavily implementation.
tests/ra_aid/test_env.py
tests/ra_aid/test_tool_configs.py
tests/ra_aid/test_web_search_jina.py
Updated tests for environment variable validations (using JINA_API_KEY) and added new test constants for tool configurations; created an empty test file for web_search_jina.

Sequence Diagram(s)

sequenceDiagram
    participant U as User
    participant WS as web_search_jina()
    participant JC as JinaDeepSearchClient
    participant API as Jina DeepSearch API
    U->>WS: Call web_search_jina(query, parameters)
    WS->>JC: Initialize client (with API key)
    JC->>API: Send search request
    API-->>JC: Return search results
    JC-->>WS: Process response
    WS-->>U: Return search result data
Loading
sequenceDiagram
    participant App as Application
    participant Env as check_env()
    participant Validator as validate_web_research()
    App->>Env: Trigger environment check
    Env->>Validator: Validate presence of JINA_API_KEY
    Validator-->>Env: Return validation result (valid/missing info)
    Env-->>App: Provide overall environment status
Loading

Suggested reviewers

  • ai-christianson

Poem

Hoppy coder, I leap with glee,
Through fields of change in code so free.
With Jina keys and prompts anew,
My garden of tools now fresh as dew.
A carrot snack for tests so bright,
I hop ahead, coding day and night.
Enjoy these changes—our code takes flight!

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🔭 Outside diff range comments (2)
tests/ra_aid/test_tool_configs.py (2)

76-84: ⚠️ Potential issue

Test assertion needs to be updated.

The test is still expecting web_search_tavily in the expected_names list, but the implementation now uses web_search_jina.

    expected_names = [
        "emit_expert_context",
        "ask_expert",
-       "web_search_tavily",
+       "web_search_jina",
        "emit_research_notes",
        "task_completed",
    ]

92-95: ⚠️ Potential issue

Update test assertion to match new implementation.

The assertion still checks for web_search_tavily when verifying tool names without expert enabled, but the implementation has changed to use web_search_jina.

    assert sorted(tool_names_no_expert) == sorted(
-       ["web_search_tavily", "emit_research_notes", "task_completed"]
+       ["web_search_jina", "emit_research_notes", "task_completed"]
    )
🧹 Nitpick comments (4)
ra_aid/env.py (1)

113-120: Minor duplication with validate_web_research.
The logic repeats steps already found in validate_web_research. Consider using that function under the hood to ensure consistency and reduce duplication.

 def check_web_research_env() -> List[str]:
-    web_research_missing = []
-    key = "JINA_API_KEY"
-    if not os.environ.get(key):
-        web_research_missing.append(f"{key} environment variable is not set")
-    return web_research_missing
+    result = validate_web_research()
+    return result.missing_vars
ra_aid/tools/web_search_jina.py (1)

1-11: Remove unused import Union.
The Union import at line 2 is never utilized. Removing it helps maintain a clean codebase.

-from typing import Dict, Optional, List, Union
+from typing import Dict, Optional, List
🧰 Tools
🪛 Ruff (0.8.2)

2-2: typing.Union imported but unused

Remove unused import: typing.Union

(F401)

ra_aid/tools/__init__.py (2)

160-272: Docstring parsing logic works but could be more robust.
The extract_tool_metadata function adequately extracts arguments and descriptions from structured docstrings, but it may be fragile if the format deviates (e.g., multiline argument annotations, advanced type annotations like Union[Type1, Type2], or custom docstring styles). Also, note that the nested “if” statements that set tool_type might override each other if the function name includes multiple matching keywords (for example, “code_complete”). Switching those matched conditions to a chain of if...elif...elif would clarify the final tool type.

Example improvement for lines 236-239:

-if "code" in name_lower or "modification" in name_lower:
-    tool_type = ToolType.CODE_MODIFICATION
-if "complete" in name_lower:
-    tool_type = ToolType.CODE_COMPLETION
+if "code" in name_lower or "modification" in name_lower:
+    tool_type = ToolType.CODE_MODIFICATION
+elif "complete" in name_lower:
+    tool_type = ToolType.CODE_COMPLETION

274-284: Optional example usage block.
The commented-out example usage is helpful for quick testing or demonstration, but consider moving it to a dedicated example file or test suite to avoid clutter in production code.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4dd8e6 and ef376e1.

📒 Files selected for processing (13)
  • .gitignore (1 hunks)
  • pyproject.toml (1 hunks)
  • ra_aid/env.py (2 hunks)
  • ra_aid/prompts/__init__.py (2 hunks)
  • ra_aid/prompts/multi_agent_prompts.py (1 hunks)
  • ra_aid/prompts/web_research_prompts.py (5 hunks)
  • ra_aid/tool_configs.py (6 hunks)
  • ra_aid/tools/__init__.py (2 hunks)
  • ra_aid/tools/web_search_jina.py (1 hunks)
  • ra_aid/tools/web_search_tavily.py (0 hunks)
  • tests/ra_aid/test_env.py (13 hunks)
  • tests/ra_aid/test_tool_configs.py (1 hunks)
  • tests/ra_aid/test_web_search_jina.py (1 hunks)
💤 Files with no reviewable changes (1)
  • ra_aid/tools/web_search_tavily.py
🧰 Additional context used
🧬 Code Definitions (3)
ra_aid/env.py (1)
ra_aid/provider_strategy.py (2)
  • ProviderFactory (393-419)
  • ValidationResult (11-15)
ra_aid/tools/__init__.py (5)
ra_aid/tools/expert.py (1)
  • ask_expert (159-323)
ra_aid/tools/research.py (3)
  • existing_project_detected (14-48)
  • monorepo_detected (52-89)
  • ui_detected (93-126)
ra_aid/tools/shell.py (1)
  • run_shell_command (40-148)
ra_aid/tools/web_search_jina.py (1)
  • web_search_jina (94-187)
ra_aid/tools/memory.py (7)
  • deregister_related_files (609-632)
  • emit_key_facts (116-186)
  • emit_key_snippet (192-295)
  • emit_related_files (408-540)
  • emit_research_notes (47-112)
  • plan_implementation_completed (365-394)
  • task_completed (333-361)
tests/ra_aid/test_env.py (1)
ra_aid/env.py (3)
  • validate_environment (217-283)
  • check_env (122-157)
  • check_web_research_env (113-119)
🪛 Ruff (0.8.2)
ra_aid/tools/web_search_jina.py

2-2: typing.Union imported but unused

Remove unused import: typing.Union

(F401)

🔇 Additional comments (35)
.gitignore (1)

23-25: LGTM!

Well-documented addition to ignore the agent memory bank directory.

ra_aid/prompts/__init__.py (2)

68-73: LGTM!

The multi-agent prompt imports are well-structured and follow the established pattern in this file.


123-126: LGTM!

Properly exposing the new multi-agent prompt constants in the __all__ list.

ra_aid/tool_configs.py (6)

27-27: Import modification looks correct.

The import change from web_search_tavily to web_search_jina aligns with the PR objective to replace Tavily with Jina DeepSearch for web research.


196-196: Properly updated research tools.

Correctly replaced web_search_tavily with web_search_jina in the research tools list.


274-274: Planning tools updated correctly.

Web search tool has been properly updated to use Jina DeepSearch in the planning tools configuration.


312-312: Implementation tools correctly updated.

Added web_search_jina to the implementation tools, maintaining functionality while migrating from Tavily to Jina DeepSearch.


335-335: Web research tools initialization updated correctly.

Changed the initialization of web research tools to use web_search_jina instead of web_search_tavily.


365-365: Chat tools properly updated.

Successfully migrated the chat tools to use Jina DeepSearch instead of Tavily.

tests/ra_aid/test_env.py (6)

7-7: Import updated to include new environment check functions.

Correctly updated the imports to include the new check_env and check_web_research_env functions.


34-34: Environment variable updated correctly.

Changed from checking for TAVILY_API_KEY to JINA_API_KEY in the clean_env fixture.


57-57: Tests updated to check for Jina API key.

All assertions have been properly updated to check for JINA_API_KEY instead of TAVILY_API_KEY.

Also applies to: 75-75, 99-99, 117-117, 128-128, 152-152, 165-165, 184-184, 195-195, 216-216, 239-239


244-260: Added comprehensive test for web research environment check.

This test thoroughly validates the check_web_research_env function, testing both when the API key is missing and when it's present. It also properly cleans up after the test.


262-311: Well-structured test for comprehensive environment checking.

This test effectively validates the check_env function across multiple scenarios:

  1. When no environment variables are set
  2. When only required variables are set
  3. When all variables are set

It also properly saves and restores the original environment state.


314-332: Added test for web research validation with Jina.

This test verifies that the web research functionality correctly validates the presence of the Jina API key. It tests both when the key is missing and when it's present.

ra_aid/prompts/web_research_prompts.py (4)

2-2: Updated documentation to reference Jina DeepSearch.

Correctly updated the module docstring to mention that web research is powered by Jina DeepSearch.


12-12: Updated prompt sections to highlight DeepSearch capabilities.

Each research-related prompt section has been updated to emphasize the specific capabilities of Jina DeepSearch, such as iterative reasoning, information validation, implementation detail verification, and providing up-to-date information.

Also applies to: 22-22, 32-32, 42-42


46-47: Updated assistant description to mention Jina DeepSearch.

The virtual assistant description now correctly references Jina DeepSearch as the tool used for finding, validating, and synthesizing information.


54-105: Added comprehensive web research behavior guide.

This substantial enhancement provides detailed guidance on using Jina DeepSearch's capabilities, structured across six key areas:

  1. Search Strategy
  2. Quality Control
  3. Response Generation
  4. Domain Expertise
  5. Research Triggers
  6. Output Format

This comprehensive guide will significantly improve the assistant's ability to leverage Jina DeepSearch for high-quality web research.

ra_aid/env.py (4)

5-6: Clean import usage.
These imports—Any, List, and Tuple—are all referenced elsewhere in the file, so their inclusion is valid. No concerns to report here.


11-16: Well-structured data class.
Defining WebResearchValidationResult as a dataclass helps improve readability and maintainability for environment validation results.


18-31: Environment validation for JINA_API_KEY looks good.
This function clearly checks for the presence of the JINA_API_KEY and cleanly returns a validation result, complying with the new Jina-based approach.


122-158: Comprehensive environment checks for required and optional variables.
The function effectively centralizes environment checks, including JINA_API_KEY. No issues or performance concerns.

ra_aid/prompts/multi_agent_prompts.py (4)

1-6: Informative module docstring.
The high-level overview is clear and sets a good context for multi-agent communication.


8-59: Robust JSON schema for requests.
The MULTI_AGENT_REQUEST_SCHEMA is detailed, covering all necessary requirements for multi-agent queries, including context, validity checks, and desired outputs.


60-162: Comprehensive prompt documentation.
MULTI_AGENT_QUERY_HANDLER_PROMPT thoroughly explains creation and processing of multi-agent queries, ensuring clarity and guiding consistent usage.


164-203: Well-defined response schema.
MULTI_AGENT_IMPLEMENTATION_SCHEMA neatly defines the structure for multi-agent responses, including error handling and partial completions.

ra_aid/tools/web_search_jina.py (3)

17-24: Environment-handling logic is clear.
The constructor ensures a suitable error is raised when JINA_API_KEY is missing, adhering to best practices.


26-91: Efficient search method for streaming responses.
The search method correctly handles streaming and non-streaming conditions, raising HTTP errors as needed and returning parsed JSON.


93-187: Tool function integrates well with logging and error handling.
Recording trajectory data before and after the search is a solid approach for traceability. Exception handling is consistent with existing patterns.

ra_aid/tools/__init__.py (5)

1-7: Nice module-level docstring and imports.
The docstring cleanly introduces the purpose of the module, and the imports use standard Python libraries (typing, dataclasses, enum, inspect) required for the new functionality. No issues found here.


8-45: Imports of tool functions look consistent.
These imports consolidate the various tool functions into a single module. The commented notes on lines 14, 15, 17, and 44 about function names and file references are helpful for maintainability. Ensure that each function reference is up-to-date if you rename or relocate any tool in the future to avoid broken imports.


46-79: Centralized tool registry for easy discovery and usage.
The AVAILABLE_TOOLS dictionary properly aggregates tool functions by name, increasing discoverability and consistency across the codebase. This approach is good for dynamic invocation and introspection of available tools. However, ensure that the dictionary remains in sync with your __all__ and docstrings, especially if function names are refactored or removed.


81-117: Explicit export list ensures clarity.
Defining __all__ is helpful to control what symbols are publicly exported. This matches your AVAILABLE_TOOLS dictionary fairly closely, which is good. Watch out for duplication in the event of further refactors—if something changes in AVAILABLE_TOOLS, remember to check __all__ as well.


120-158: Effective setup of enumeration and dataclasses.
The ToolType enum provides a clear categorization of tool functionalities, while ToolArgument and ToolMetadata dataclasses succinctly capture tool details. This is a clean design that improves maintainability.

Comment on lines +89 to +91
[tool.poetry.dependencies]
python = "^3.9"
requests = "^2.31.0" # For Jina DeepSearch API
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Review Poetry dependency configuration with existing project setup

The new Poetry dependency section specifies Python "^3.9" while the project's main configuration requires ">=3.10" (line 11). This creates potential confusion about the minimum supported Python version. Additionally, "requests" is already listed as a dependency on line 54 (without version constraints), creating a redundant declaration.

Consider either:

  1. Removing the Poetry section if you're using Hatch as your build system
  2. Consolidating the dependencies to ensure version requirements are consistent
-[tool.poetry.dependencies]
-python = "^3.9"
-requests = "^2.31.0"  # For Jina DeepSearch API

If you need to specify a version for the requests library, update line 54 instead:

-    "requests",
+    "requests>=2.31.0",  # For Jina DeepSearch API

Committable suggestion skipped: line range outside the PR's diff.

@@ -0,0 +1 @@

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Empty test file needs implementation

This test file is empty but should contain tests for the new Jina DeepSearch functionality. Since this PR is replacing Tavily with Jina, comprehensive test coverage is essential to ensure the new implementation works correctly.

Please implement test cases for the Jina DeepSearch client including:

  • Unit tests for successful API calls
  • Error handling tests
  • Mocked responses to avoid actual API calls during testing
  • Verification that search results are properly processed

I can help generate test examples if needed.

Comment on lines +116 to +137
EXPECTED_RESEARCH_TOOLS = {
"web_search_jina",
"emit_research_notes",
"task_completed"
}

EXPECTED_PLANNING_TOOLS = {
"web_search_jina",
"create_plan",
"task_completed"
}

EXPECTED_IMPLEMENTATION_TOOLS = {
"web_search_jina",
"emit_code_block",
"task_completed"
}

EXPECTED_CHAT_TOOLS = {
"web_search_jina",
"task_completed"
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Added constants for expected tools but they're not used.

These new constants define the expected tools for different categories, including web_search_jina, but they aren't used in the existing tests. The tests still check for web_search_tavily and will fail.

Consider updating the existing tests to use these new constants, or update the tests directly as suggested in the previous comments.

…with Jina DeepSearch for web research.\n- Create new web_search_jina tool and client.\n- Update prompts, tool configurations, environment variables, and tests related to web search.\n- Add requests dependency for Jina API.\n- Remove old web_search_tavily implementation and references.
@Onnson Onnson force-pushed the feat/jina-deepsearch-integration branch from ef376e1 to b2d6a12 Compare April 1, 2025 21:40
Copy link
Author
@Onnson Onnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed missing dependencies

Copy link
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
ra_aid/tools/__init__.py (2)

3-3: Remove unused Union import.

According to the static analysis hints, typing.Union appears to be imported but not used. Please remove the unused import to keep the code clean and avoid confusion.

-from typing import List, Dict, Any, Optional, Union
+from typing import List, Dict, Any, Optional

14-14: Clarify the in-line comment.

The comment # Assuming this is the intended function name, not read_file_tool is redundant and mirrors the same name. Consider removing or updating it to provide clearer context.

-from .read_file import read_file_tool # Assuming this is the intended function name, not read_file_tool
+from .read_file import read_file_tool
ra_aid/tools/web_search_jina.py (2)

2-2: Remove unused Union import.

The Union type from typing is imported but never used. You can safely remove it to keep the imports concise.

-from typing import Dict, Optional, List, Union
+from typing import Dict, Optional, List
🧰 Tools
🪛 Ruff (0.8.2)

2-2: typing.Union imported but unused

Remove unused import: typing.Union

(F401)


79-90: Consider handling partial line chunks in streaming.

While the current approach to parsing streaming lines is workable, any incomplete JSON fragments split across lines would be ignored. If partial lines are common in Jina’s stream, consider buffering them to minimize potential data loss.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef376e1 and b2d6a12.

📒 Files selected for processing (13)
  • .gitignore (1 hunks)
  • pyproject.toml (1 hunks)
  • ra_aid/env.py (2 hunks)
  • ra_aid/prompts/__init__.py (2 hunks)
  • ra_aid/prompts/multi_agent_prompts.py (1 hunks)
  • ra_aid/prompts/web_research_prompts.py (5 hunks)
  • ra_aid/tool_configs.py (6 hunks)
  • ra_aid/tools/__init__.py (2 hunks)
  • ra_aid/tools/web_search_jina.py (1 hunks)
  • ra_aid/tools/web_search_tavily.py (0 hunks)
  • tests/ra_aid/test_env.py (13 hunks)
  • tests/ra_aid/test_tool_configs.py (1 hunks)
  • tests/ra_aid/test_web_search_jina.py (1 hunks)
💤 Files with no reviewable changes (1)
  • ra_aid/tools/web_search_tavily.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • tests/ra_aid/test_web_search_jina.py
  • .gitignore
  • ra_aid/prompts/init.py
  • pyproject.toml
  • tests/ra_aid/test_tool_configs.py
🧰 Additional context used
🧬 Code Definitions (3)
tests/ra_aid/test_env.py (1)
ra_aid/env.py (3)
  • validate_environment (217-283)
  • check_env (122-157)
  • check_web_research_env (113-119)
ra_aid/env.py (1)
ra_aid/provider_strategy.py (2)
  • ProviderFactory (393-419)
  • ValidationResult (11-15)
ra_aid/tools/__init__.py (5)
ra_aid/tools/expert.py (1)
  • ask_expert (159-323)
ra_aid/tools/research.py (4)
  • existing_project_detected (14-48)
  • monorepo_detected (52-89)
  • ui_detected (93-126)
  • mark_research_complete_no_implementation_required (130-172)
ra_aid/tools/shell.py (1)
  • run_shell_command (40-148)
ra_aid/tools/web_search_jina.py (1)
  • web_search_jina (94-187)
ra_aid/tools/memory.py (7)
  • deregister_related_files (609-632)
  • emit_key_facts (116-186)
  • emit_key_snippet (192-295)
  • emit_related_files (408-540)
  • emit_research_notes (47-112)
  • plan_implementation_completed (365-394)
  • task_completed (333-361)
🪛 Ruff (0.8.2)
ra_aid/tools/web_search_jina.py

2-2: typing.Union imported but unused

Remove unused import: typing.Union

(F401)

🔇 Additional comments (27)
ra_aid/prompts/multi_agent_prompts.py (5)

1-6: Well-documented module purpose

Good job including a clear docstring that explains the purpose of this module. The documentation effectively communicates that this module establishes a protocol for multi-agent communication between RA-Aid and other agent systems.


9-58: Well-structured request schema definition

The JSON schema for multi-agent communication is comprehensive and clearly defines the structure for requests. The schema properly handles various content types and includes necessary validation rules.

A few observations:

  • Required fields are properly specified
  • The nested structure for questions and context is well-organized
  • The enum for output types provides good constraints for response formatting

This schema will help ensure consistency in the multi-agent communication protocol.


61-162: Comprehensive prompt with clear guidelines

The prompt is well-structured and provides detailed instructions for both creating and processing multi-agent requests. It covers all aspects of the communication flow including:

  • Request creation with examples
  • Processing guidelines
  • Response generation based on output types
  • Error handling procedures
  • Quality checks

This approach ensures agents will have clear guidance on how to interact within this framework.


165-203: Complete implementation schema for responses

The response implementation schema properly defines the structure for agent responses with:

  • Clear status indicators (complete, partial, error)
  • Required response fields
  • Structured error reporting

This schema will help ensure consistency in how agent responses are formatted and validated.


1-203: Excellent multi-agent framework implementation

This new file establishes a robust framework for multi-agent communication that will support the transition from Tavily to Jina Deep 8000 Search. While not directly mentioning the search providers, this structured communication protocol will help manage complex interactions between agents, which is particularly valuable for web research tasks that require iterative reasoning and source validation (key benefits mentioned in the PR objectives).

The schemas and prompts are well-designed and should facilitate the enhanced control and fine-grained parameters that Jina DeepSearch offers compared to Tavily.

ra_aid/tool_configs.py (6)

27-27: Tool import change from tavily to jina looks good.

This change correctly updates the import statement to reflect the new web search provider.


196-196: Consistent replacement of web search tool in RESEARCH_TOOLS.

The reference in the RESEARCH_TOOLS list has been properly updated.


274-274: Updated planning tools to use Jina search.

The get_planning_tools function has been properly updated to use the new search implementation.


312-312: Implementation tools now use Jina for web search.

This change correctly updates the implementation tools to use the new search capability.


335-335: Web research tools updated to use Jina.

The get_web_research_tools function has been correctly modified to use Jina DeepSearch.


365-365: Chat tools now use Jina for search functionality.

The get_chat_tools function has been correctly updated to use the new search implementation.

tests/ra_aid/test_env.py (6)

7-7: Import statement updated to include new environment check functions.

The import has been properly updated to include the new functions required for testing.


34-34: Updated clean_env fixture to use JINA_API_KEY.

The test environment setup has been properly modified to use Jina instead of Tavily.


57-57: All test assertions updated to check for JINA_API_KEY.

All test assertions have been consistently updated to check for the Jina API key instead of Tavily.

Also applies to: 75-75, 99-99, 117-117, 128-128, 152-152, 165-165, 184-184, 195-195, 216-216, 239-239


244-260: Good test coverage for check_web_research_env function.

This new test thoroughly verifies the behavior of the check_web_research_env function with both missing and present API keys.


262-312: Comprehensive test for check_env function.

This test covers all scenarios for the check_env function:

  • No environment variables set
  • Only required variables set
  • All variables set

The test also properly handles setup and cleanup of environment variables.


314-333: Good validation test for Jina web research environment.

This test thoroughly verifies the validation logic for the Jina API key in the web research context.

ra_aid/env.py (5)

5-6: Updated imports for new data class and typing.

The import statements have been properly updated to support the new functionality.


11-15: Well-structured data class for web research validation.

The WebResearchValidationResult data class follows the pattern of existing ValidationResult class and is well-implemented.


18-31: Clean implementation of validate_web_research function.

This function properly validates the Jina API key and returns a structured result. The implementation is consistent with other validation functions in the module.


113-119: Clear implementation of check_web_research_env function.

This utility function correctly checks for the presence of the Jina API key and returns appropriate messages.


122-157: Comprehensive environment variable validation.

The check_env function provides a centralized place for validating all environment variables, with clear separation between required and optional variables.

The function is well-documented and follows good practices by:

  • Checking required variables first
  • Handling optional variables separately
  • Including web research variables
  • Providing clear return values
ra_aid/prompts/web_research_prompts.py (3)

2-2: Updated docstring to mention Jina DeepSearch.

The docstring now correctly indicates that the prompts are powered by Jina DeepSearch.


12-12: Updated prompt sections with DeepSearch capabilities.

All web research prompt sections have been updated to reference specific DeepSearch capabilities:

  • Research section highlights iterative reasoning
  • Planning section emphasizes information validation
  • Implementation section focuses on verification from trusted sources
  • Chat section highlights up-to-date, verified information

These updates properly align with the capabilities of the new search provider.

Also applies to: 22-22, 32-32, 42-42


46-106: Comprehensive update to web research prompt.

The web research prompt has been significantly enhanced with detailed sections on Jina DeepSearch capabilities:

  1. Search Strategy - Highlights iterative reasoning and source validation
  2. Quality Control - Emphasizes verification from reliable sources
  3. Response Generation - Focuses on synthesizing information clearly
  4. Domain Expertise - Prioritizes official documentation and trusted sources
  5. Research Triggers - Defines scenarios for initiating web research
  6. Output Format - Specifies structure for research responses

This comprehensive update provides clear guidance for leveraging DeepSearch effectively and aligns with the PR objectives of improving search quality, control, and information validity.

ra_aid/tools/__init__.py (1)

8-112: Looks good overall.

The structure of AVAILABLE_TOOLS, the new enums/classes (ToolType, ToolArgument, ToolMetadata), and the extract_tool_metadata function appear well-implemented and maintainable. The docstring parsing logic is clean and straightforward.

ra_aid/tools/web_search_jina.py (1)

93-187: Implementation is solid.

The web_search_jina tool integrates well with the new Jina DeepSearch client, logs searches and errors appropriately, and neatly returns structured results. No further issues are apparent.

@ariel-frischer
Copy link
Collaborator

Why remove Tavily completely? Can you refactor to use Jina instead of tavily with a flag like --jina? I've never tried Jina it may very well be much better then Tavily but I prefer to use Tavily for now for stability and because I actually have paid version of the thing. Also a good reason to keep both is if one free (plan) gets used up you can switch to the other one.

Copy link
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
ra_aid/tools/web_search_jina.py (6)

2-2: Remove unused import.

You import typing.Union but never use it. Remove this unused import to align with best practices and avoid lint warnings.

Apply this diff:

-from typing import Dict, Optional, List, Union
+from typing import Dict, Optional, List
🧰 Tools
🪛 Ruff (0.8.2)

2-2: typing.Union imported but unused

Remove unused import: typing.Union

(F401)


79-91: Centralize API key handling.

You're re-checking os.environ.get("JINA_API_KEY") inside this function, even though the JinaDeepSearchClient constructor does the same. Consider consolidating the API key logic in a single place (e.g., the client) to reduce duplication and potential inconsistencies.

-    api_key = os.environ.get("JINA_API_KEY")
-    if not api_key:
-        ...
-    ...
-    headers = {

8000
-        "Content-Type": "application/json",
-        "Authorization": f"Bearer {api_key}"
-    }
+    client = JinaDeepSearchClient()
+    api_key = client.api_key
+    headers = {
+        "Content-Type": "application/json",
+        "Authorization": f"Bearer {client.api_key}"
+    }

107-109: Comply with Python style guidelines.

Multiple statements on one line can be less readable. Please place the block body on separate lines as recommended by PEP 8.

-if good_domains: data["good_domains"] = good_domains
-if bad_domains: data["bad_domains"] = bad_domains
-if only_domains: data["only_domains"] = only_domains
+if good_domains:
+    data["good_domains"] = good_domains
+if bad_domains:
+    data["bad_domains"] = bad_domains
+if only_domains:
+    data["only_domains"] = only_domains
🧰 Tools
🪛 Ruff (0.8.2)

107-107: Multiple statements on one line (colon)

(E701)


108-108: Multiple statements on one line (colon)

(E701)


109-109: Multiple statements on one line (colon)

(E701)


132-132: Avoid single-line control flow statements.

Same issue: prefer moving continue to a new line for clarity.

-if json_str == "[DONE]": continue
+if json_str == "[DONE]":
+    continue
🧰 Tools
🪛 Ruff (0.8.2)

132-132: Multiple statements on one line (colon)

(E701)


121-155: Refine streaming logic for robust error handling.

While the iterative reading and decoding of streamed lines is appropriate, consider adding a recovery or retry mechanism in case the stream is interrupted. This could help preserve partial results or attempt reconnection before finalizing the tool’s output, especially for long-running or high-effort queries.

🧰 Tools
🪛 Ruff (0.8.2)

132-132: Multiple statements on one line (colon)

(E701)


156-214: Improve logging detail for debugging.

You handle exceptions by printing error messages to stderr and returning partial results. Consider expanding the debug logs to capture more details about the request and partial content for real-time troubleshooting. If security permits, you could log request IDs or short correlation tokens returned by Jina to help trace requests.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b2d6a12 and acc7ce4.

📒 Files selected for processing (3)
  • ra_aid/__main__.py (1 hunks)
  • ra_aid/provider_strategy.py (1 hunks)
  • ra_aid/tools/web_search_jina.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
ra_aid/tools/web_search_jina.py (1)
ra_aid/provider_strategy.py (1)
  • create (400-422)
🪛 Ruff (0.8.2)
ra_aid/tools/web_search_jina.py

2-2: typing.Union imported but unused

Remove unused import: typing.Union

(F401)


107-107: Multiple statements on one line (colon)

(E701)


108-108: Multiple statements on one line (colon)

(E701)


109-109: Multiple statements on one line (colon)

(E701)


132-132: Multiple statements on one line (colon)

(E701)

🔇 Additional comments (4)
ra_aid/provider_strategy.py (2)

227-240: Improved OpenRouter API key validation with fallback mechanism.

The changes enhance the environment variable validation by properly separating expert and base keys into distinct variables, checking both simultaneously, and implementing a more robust fallback mechanism. The updated error message clearly communicates that both keys must be missing for the error to occur, and the comments provide helpful context about the fallback logic.


241-241: Clear code delineation with helpful comment.

The comment on line 241 clearly separates the expert validation logic from the non-expert validation logic, improving code readability.

ra_aid/__main__.py (2)

301-306: Confirm default model availability.

The default model was changed to "gemini-2.0-flash". Double-check that this model is recognized by the system and well-tested. If the model name is unsupported or experimental, clarify that in the help text.

Do you want me to open an issue to confirm that "gemini-2.0-flash" is valid in all scenarios?


297-297:

✅ Verification successful

Verify provider default.

You updated the default provider to "gemini". Ensure "gemini" is included in VALID_PROVIDERS and that any environment or dependencies needed for this provider are properly configured.


🏁 Script executed:

#!/bin/bash
# Verify that gemini is included in VALID_PROVIDERS
rg "VALID_PROVIDERS" -A 10

Length of output: 6542


Provider default "gemini" verified successfully

  • The VALID_PROVIDERS list in ra_aid/config.py includes "gemini".
  • All references (e.g., in ra_aid/__main__.py and other relevant modules) correctly use the updated default.
  • There are no outstanding dependency or configuration concerns related to this provider.

@Onnson
Copy link
Author
Onnson commented Apr 1, 2025

honestly that's probably right. i actually like Tavily a lot, but i needed a deep research implementation and it was quicker to replace. maybe i'll add tavily back and call it "web-search" or "quick-search"?

@Onnson Onnson force-pushed the feat/jina-deepsearch-integration branch from acc7ce4 to 9f7024d Compare April 1, 2025 22:47
Copy link
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
ra_aid/tools/web_search_jina.py (2)

2-2: Remove unused Union import.
It appears Union is never referenced, so it can be removed to keep imports concise.

- from typing import Dict, Optional, List, Union
+ from typing import Dict, Optional, List
🧰 Tools
🪛 Ruff (0.8.2)

2-2: typing.Union imported but unused

Remove unused import: typing.Union

(F401)


107-110: Refactor domain filter checks for clarity.
PEP 8 recommends placing statements on a new line instead of inline with the if. This minor adjustment makes code more readable.

- if good_domains: data["good_domains"] = good_domains
+ if good_domains:
+     data["good_domains"] = good_domains
- if bad_domains: data["bad_domains"] = bad_domains
+ if bad_domains:
+     data["bad_domains"] = bad_domains
- if only_domains: data["only_domains"] = only_domains
+ if only_domains:
+     data["only_domains"] = only_domains
🧰 Tools
🪛 Ruff (0.8.2)

107-107: Multiple statements on one line (colon)

(E701)


108-108: Multiple statements on one line (colon)

(E701)


109-109: Multiple statements on one line (colon)

(E701)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between acc7ce4 and 9f7024d.

📒 Files selected for processing (3)
  • ra_aid/__main__.py (1 hunks)
  • ra_aid/provider_strategy.py (1 hunks)
  • ra_aid/tools/web_search_jina.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • ra_aid/provider_strategy.py
  • ra_aid/main.py
🧰 Additional context used
🪛 Ruff (0.8.2)
ra_aid/tools/web_search_jina.py

2-2: typing.Union imported but unused

Remove unused import: typing.Union

(F401)


107-107: Multiple statements on one line (colon)

(E701)


108-108: Multiple statements on one line (colon)

(E701)


109-109: Multiple statements on one line (colon)

(E701)


132-132: Multiple statements on one line (colon)

(E701)

🔇 Additional comments (4)
ra_aid/tools/web_search_jina.py (4)

19-31: LGTM!
The dedicated JinaDeepSearchClient class fosters clearer separation of concerns and ensures robust environment handling.


33-106: Kudos on the streaming approach.
Using live streaming output for the search results improves user experience by providing real-time feedback.


132-132: False-positive from static analysis.
The slicing syntax here does not constitute multiple statements on one line.

🧰 Tools
🪛 Ruff (0.8.2)

132-132: Multiple statements on one line (colon)

(E701)


111-201: Robust exception handling and logging.
Comprehensive error handling updates the trajectory and avoids crashes, ensuring stable system behavior under failure conditions.

🧰 Tools
🪛 Ruff (0.8.2)

132-132: Multiple statements on one line (colon)

(E701)

@ariel-frischer
Copy link
Collaborator

honestly that's probably right. i actually like Tavily a lot, but i needed a deep research implementation and it was quicker to replace. maybe i'll add tavily back and call it "web-search" or "quick-search"?

Yea, thats fine or "quick-web-search" to be more specific or "basic-web-search". Also I think we should not introduce poetry dependency we are mainly using uv. Will mark on pyproject.toml.

- Implemented Gemini Pro 1.5 Flash as the default free LLM.

- Added initial documentation for multi-agent JSON communication protocol.

- Re-integrated Tavily for quick web searches.

- Created a select_web_search router tool to choose between Tavily (quick) and Jina (deep) based on prompt analysis.

- Updated environment validation and agent prompts to support the search router.
@Onnson
Copy link
Author
Onnson commented Apr 2, 2025

@ariel-frischer Done!

Copy link
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (13)
ra_aid/tools/quick_web_search.py (1)

51-52: Check for missing API key environment variable

The code doesn't verify if the TAVILY_API_KEY environment variable exists before attempting to use it, which could lead to cryptic KeyError exceptions.

 try:
-    client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
+    tavily_api_key = os.environ.get("TAVILY_API_KEY")
+    if not tavily_api_key:
+        raise ValueError("TAVILY_API_KEY environment variable is not set")
+    client = TavilyClient(api_key=tavily_api_key)
ra_aid/tool_configs.py (1)

27-29: Unused direct imports

The imports for web_search_jina and quick_web_search are not directly used in this file as they're accessed through select_web_search. Consider removing them for cleaner code.

 from ra_aid.tools import (
     # ... other imports
-    web_search_jina,
-    quick_web_search,
     select_web_search,
 )
🧰 Tools
🪛 Ruff (0.8.2)

27-27: ra_aid.tools.web_search_jina imported but unused

Remove unused import

(F401)


28-28: ra_aid.tools.quick_web_search imported but unused

Remove unused import

(F401)

ra_aid/tools/select_web_search.py (1)

34-43: Consider standardizing error return formats

Currently, the quick search path returns a JSON-stringified error object, while the deep search path returns a plain dictionary with an error key. This inconsistency could make error handling more difficult for consuming code.

 except Exception as e:
     error_msg = f"Error calling quick_web_search internally: {e}"
     console.print(Panel(error_msg, title="Router Error", border_style="red"))
-    # Format error as a JSON string to somewhat match expected string return type
-    return json.dumps({"error": error_msg}) 
+    # Return error as a string to match expected return type
+    return f"Error performing quick web search: {error_msg}"
ra_aid/tools/web_search_jina.py (5)

2-2: Remove unused import Union.

The import typing.Union is never referenced in this file. Removing it will align with linting guidelines and keep the import list tidy.

-from typing import Dict, Optional, List, Union
+from typing import Dict, Optional, List
🧰 Tools
🪛 Ruff (0.8.2)

2-2: typing.Union imported but unused

Remove unused import: typing.Union

(F401)


111-113: Separate conditional statements onto new lines.

Multiple statements on a single line do not comply with PEP8 and can reduce readability. Consider placing the assignment statement on its own line:

-if good_domains: data["good_domains"] = good_domains
-if bad_domains: data["bad_domains"] = bad_domains
-if only_domains: data["only_domains"] = only_domains
+if good_domains:
+    data["good_domains"] = good_domains
+if bad_domains:
+    data["bad_domains"] = bad_domains
+if only_domains:
+    data["only_domains"] = only_domains
🧰 Tools
🪛 Ruff (0.8.2)

111-111: Multiple statements on one line (colon)

(E701)


112-112: Multiple statements on one line (colon)

(E701)


113-113: Multiple statements on one line (colon)

(E701)


136-136: Separate the inline conditional on line 136 for clarity.

Keeping the continue on a new line ensures consistency with PEP8 and increases code clarity:

-if json_str == "[DONE]": continue
+if json_str == "[DONE]":
+    continue
🧰 Tools
🪛 Ruff (0.8.2)

136-136: Multiple statements on one line (colon)

(E701)


82-82: Consolidate or remove the duplicate Console() instantiation.

You redefine console = Console() at lines 17 and 82. Consider reusing the same console object or removing the extra instantiation to avoid confusion.

Also applies to: 17-17


33-227: Function length and complexity.

web_search_jina is a large function handling multiple responsibilities: environment validation, HTTP streaming, display logic, and trajectory logging. Consider splitting these into smaller helper functions or methods in a class. This can improve readability, reusability, and maintainability.

🧰 Tools
🪛 Ruff (0.8.2)

111-111: Multiple statements on one line (colon)

(E701)


112-112: Multiple statements on one line (colon)

(E701)


113-113: Multiple statements on one line (colon)

(E701)


136-136: Multiple statements on one line (colon)

(E701)

ra_aid/env.py (2)

181-181: Remove unused variable at_least_one_web_key.

The local variable at_least_one_web_key is never used. Consider removing the assignment to keep the code clean and to avoid confusion.

-        elif var == "JINA_API_KEY" or var == "TAVILY_API_KEY":
-            at_least_one_web_key = True
🧰 Tools
🪛 Ruff (0.8.2)

181-181: Local variable at_least_one_web_key is assigned to but never used

Remove assignment to unused variable at_least_one_web_key

(F841)


126-140: Improve clarity of partially set keys messages.

Currently, check_web_research_env appends messages like “deep search unavailable” or “quick search unavailable” if only one key is missing. This is helpful information, but consider clarifying the message to convey exactly which functionalities are impacted. This can help direct users to fix missing keys as needed.

ra_aid/tools/__init__.py (3)

46-79: Consider grouping web search tools into a dedicated section.

In AVAILABLE_TOOLS, consider grouping web_search_jina, quick_web_search, and select_web_search under a “Web Search” comment to help future contributors quickly locate and modify them.


121-139: Validate the coverage of ToolType categories.

The ToolType enum seems comprehensive. Ensure new or specialized tool functions—like those for handling environment validation or custom provider logic—are categorized consistently, or else consider a fallback type (e.g., OTHER or UTILITY).


161-273: Check docstring parsing logic for potential corner cases.

The extract_tool_metadata function is fairly robust, but consider validating more edge cases (e.g., missing colon in argument lines, multiline argument descriptions). These edge cases might cause partial or incorrect metadata extraction.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9f7024d and a8c21c2.

📒 Files selected for processing (7)
  • ra_aid/env.py (4 hunks)
  • ra_aid/prompts/chat_prompts.py (1 hunks)
  • ra_aid/tool_configs.py (6 hunks)
  • ra_aid/tools/__init__.py (2 hunks)
  • ra_aid/tools/quick_web_search.py (1 hunks)
  • ra_aid/tools/select_web_search.py (1 hunks)
  • ra_aid/tools/web_search_jina.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (5)
ra_aid/tools/select_web_search.py (2)
ra_aid/tools/quick_web_search.py (1)
  • quick_web_search (18-90)
ra_aid/tools/web_search_jina.py (1)
  • web_search_jina (34-227)
ra_aid/tools/__init__.py (6)
ra_aid/tools/expert.py (1)
  • ask_expert (159-323)
ra_aid/tools/research.py (3)
  • existing_project_detected (14-48)
  • monorepo_detected (52-89)
  • ui_detected (93-126)
ra_aid/tools/web_search_jina.py (1)
  • web_search_jina (34-227)
ra_aid/tools/quick_web_search.py (1)
  • quick_web_search (18-90)
ra_aid/tools/select_web_search.py (1)
  • select_web_search (15-57)
ra_aid/tools/memory.py (6)
  • emit_key_facts (116-186)
  • emit_key_snippet (192-295)
  • emit_related_files (408-540)
  • emit_research_notes (47-112)
  • plan_implementation_completed (365-394)
  • task_completed (333-361)
ra_aid/env.py (1)
ra_aid/provider_strategy.py (2)
  • ProviderFactory (396-422)
  • ValidationResult (11-15)
ra_aid/tool_configs.py (3)
ra_aid/tools/web_search_jina.py (1)
  • web_search_jina (34-227)
ra_aid/tools/quick_web_search.py (1)
  • quick_web_search (18-90)
ra_aid/tools/select_web_search.py (1)
  • select_web_search (15-57)
ra_aid/tools/web_search_jina.py (1)
ra_aid/provider_strategy.py (1)
  • create (400-422)
🪛 Ruff (0.8.2)
ra_aid/env.py

181-181: Local variable at_least_one_web_key is assigned to but never used

Remove assignment to unused variable at_least_one_web_key

(F841)

ra_aid/tool_configs.py

27-27: ra_aid.tools.web_search_jina imported but unused

Remove unused import

(F401)


28-28: ra_aid.tools.quick_web_search imported but unused

Remove unused import

(F401)

ra_aid/tools/web_search_jina.py

2-2: typing.Union imported but unused

Remove unused import: typing.Union

(F401)


111-111: Multiple statements on one line (colon)

(E701)


112-112: Multiple statements on one line (colon)

(E701)


113-113: Multiple statements on one line (colon)

(E701)


136-136: Multiple statements on one line (colon)

(E701)

🔇 Additional comments (5)
ra_aid/prompts/chat_prompts.py (1)

50-58: Clear and effective web search handling guidelines added

The newly added web search handling instructions provide well-structured guidelines that help the agent make appropriate decisions about search types and result handling. The immediate stop condition is particularly valuable for ensuring users receive answers without unnecessary delays.

ra_aid/tools/quick_web_search.py (1)

1-90: Implementation maintains Tavily for quick searches while utilizing Jina for deep searches

This new file implements a quick search tool using Tavily, which aligns with the PR comments where maintaining Tavily as an option was discussed. The implementation is solid, with appropriate error handling and console feedback.

Note that the PR description mentioned replacing Tavily with Jina, but this hybrid approach (using Tavily for quick searches, Jina for deep searches) actually provides better flexibility as suggested in the PR comments.

ra_aid/tool_configs.py (1)

198-199: Consistent integration of select_web_search across all tool collections

The select_web_search tool has been properly integrated into all relevant tool collections, ensuring consistent availability across different modes of operation. This approach maintains backward compatibility while providing the new functionality.

Also applies to: 276-277, 314-315, 326-333, 355-356

ra_aid/tools/select_web_search.py (1)

1-57: Well-designed router for web search functionality

The implementation effectively routes search requests to the appropriate tool based on search type, with good error handling and informative console output. The default to 'deep' search aligns with providing comprehensive results when specificity is not indicated.

ra_aid/env.py (1)

20-44: Validate approach of reporting missing variables only if both keys are invalid.

In validate_web_research, you only populate missing_vars if no valid key is present (any_valid=False). This might hide partial misconfigurations when only one key is missing. If your intent is to require only one functional key, this is correct. Otherwise, you may want to always list any missing keys.

Would you like to adjust the logic to always include missing keys (even if the other is valid) or keep it as-is?

@Onnson Onnson changed the title Replace Tavily with Jina DeepSearch for better web research Web Search Quick/Deep router with Tavily/Jina DeepSearch Apr 2, 2025
@ai-christianson
Copy link
Owner

honestly that's probably right. i actually like Tavily a lot, but i needed a deep research implementation and it was quicker to replace. maybe i'll add tavily back and call it "web-search" or "quick-search"?

Agreed we should keep Tavily in there (especially since current users are depending on it) and make Jina an additional search backend.

@Onnson
Copy link
Author
Onnson commented Apr 2, 2025

@ai-christianson i did it in a8c21c2

if (os.getenv("OPENAI_API_KEY") and not os.getenv("ANTHROPIC_API_KEY"))
else "anthropic"
),
default="gemini",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we changing this? Seems out of scope of the PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, some things i did to make it work with my config were added to the pr when they shouldn't have. will split the prs and remove unnecessary changes.

parser.add_argument(
"--model",
type=str,
default="gemini-2.0-flash",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same. Seems out of scope of PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be removed in the next pr split

return web_research_missing


def check_env() -> Tuple[bool, List[str], bool, List[str]]:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we're changing code related to vars other than just the jina/tavily env vars?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was part of the debug needed to make deepsearch work

from ra_aid.prompts.multi_agent_prompts import (
MULTI_AGENT_REQUEST_SCHEMA,
MULTI_AGENT_QUERY_HANDLER_PROMPT,
MULTI_AGENT_IMPLEMENTATION_SCHEMA,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this intended to be included in this PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, will be split into a different pr

@@ -47,6 +47,15 @@
- After receiving the user's initial input, use the given tools to fulfill their request.
- If you are uncertain about the user's requirements, run ask_human to clarify.
- If any logic or debugging checks are needed, consult the expert (if available) to get deeper analysis.
- **Web Search Handling:** When a web search is needed:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The web search prompts should be in web_research_prompts.py and only added conditionally (if web research is enabled,) otherwise we're hurting agent performance by making the prompt longer and more complex than needed when those tools are not available.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, will fix

@@ -0,0 +1,203 @@
"""
Multi-Agent Communication Protocol Prompts
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this multi agent system?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it exports a json file in a specific schema to easily collaborate between different agent systems working on the same codebase, will be split into a different pr

* Organize information logically with appropriate headings, paragraphs, and formatting
* Provide comprehensive answers without unnecessary commentary about your capabilities
* Balance depth with brevity—be thorough but efficient
You leverage Jina DeepSearch's advanced capabilities:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These jina-specific prompts should only be used if we're actually using the jina search backend, so I would extract those to their own variables then conditionally include them when agent prompts are constructed/rendered.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, will fix

if expert_enabled:
tools.append(emit_expert_context)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we changing these lines?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the deepsearch research output can be very long, so we needed to stream it as it comes in, this is specified in: https://jina.ai/deepsearch -

Streaming

Delivers events as they occur through server-sent events, including reasoning steps and final answers. We strongly recommend keeping this option enabled since DeepSearch requests can take significant time to complete. Disabling streaming may result in '524 timeout' errors.

# Format error as a JSON string to somewhat match expected string return type
return json.dumps({"error": error_msg})
else: # Default to 'deep'
console.print(Panel("Routing to: web_search_jina (Jina) based on search_type='deep'", style="bright_blue"))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conditional routing based on search backend only makes sense if both search backends are enabled/available. How do we handle the case where only tavily or only jina is enabled?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's fallback logic in place that uses what it can find

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this fail if only tavily is enabled and the agent specifies anything other than quick?

@ai-christianson
Copy link
Owner

Hey @Onnson was wondering if you needed me to look at anything on this one. It is a nice feature that would be good to get merged in.

@Onnson
Copy link
Author
Onnson commented Apr 9, 2025

@ai-christianson i want to dive in it wit my full attention, last week i just stumbled on this super cool repo and started messing around with it, was able to wire a good implementation of this pretty quickly so i wanted to contribute. hopefully by next week i'll be done with a different project i'm working on and can put my full attention into this, there are a quite a bit more of these nice features i want to try and add here, specifically planning in a sequential/concurrent framework to allow it to spawn multiple agents for tasks that can be completed concurrently. this would probably require some individual git worktree logic for every spawned agent so they dont run over each other's edits.

gonna be pretty cool, it's worth the wait

@mpr1255
Copy link
mpr1255 commented Apr 30, 2025

@ai-christianson i want to dive in it wit my full attention, last week i just stumbled on this super cool repo and started messing around with it, was able to wire a good implementation of this pretty quickly so i wanted to contribute. hopefully by next week i'll be done with a different project i'm working on and can put my full attention into this, there are a quite a bit more of these nice features i want to try and add here, specifically planning in a sequential/concurrent framework to allow it to spawn multiple agents for tasks that can be completed concurrently. this would probably require some individual git worktree logic for every spawned agent so they dont run over each other's edits.

gonna be pretty cool, it's worth the wait

Hey it would definitely be cool to have the option of a jina search backend as well as tavily -- please come back to this if you've got the bandwidth.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0