Web Search Quick/Deep router with Tavily/Jina DeepSearch #177
base: master
Walkthrough
This update introduces several enhancements across configuration, core functionality, tools, and tests. The changes include updating the Git ignore and dependency configurations, adding a new multi-agent prompt framework, and revamping environment variable validation to use the Jina API key. The web search tool has been transitioned from Tavily to Jina with the addition of a dedicated Jina search client. Tests have been updated accordingly to reflect these modifications, ensuring that environment validations and tool configurations align with the new implementations.
Sequence Diagram(s)
sequenceDiagram
participant U as User
participant WS as web_search_jina()
participant JC as JinaDeepSearchClient
participant API as Jina DeepSearch API
U->>WS: Call web_search_jina(query, parameters)
WS->>JC: Initialize client (with API key)
JC->>API: Send search request
API-->>JC: Return search results
JC-->>WS: Process response
WS-->>U: Return search result data
sequenceDiagram
participant App as Application
participant Env as check_env()
participant Validator as validate_web_research()
App->>Env: Trigger environment check
Env->>Validator: Validate presence of JINA_API_KEY
Validator-->>Env: Return validation result (valid/missing info)
Env-->>App: Provide overall environment status
Actionable comments posted: 3
🔭 Outside diff range comments (2)
tests/ra_aid/test_tool_configs.py (2)
76-84: ⚠️ Potential issue
Test assertion needs to be updated.
The test still expects web_search_tavily in the expected_names list, but the implementation now uses web_search_jina.
 expected_names = [
     "emit_expert_context",
     "ask_expert",
-    "web_search_tavily",
+    "web_search_jina",
     "emit_research_notes",
     "task_completed",
 ]
92-95: ⚠️ Potential issue
Update test assertion to match new implementation.
The assertion still checks for web_search_tavily when verifying tool names without expert enabled, but the implementation has changed to use web_search_jina.
 assert sorted(tool_names_no_expert) == sorted(
-    ["web_search_tavily", "emit_research_notes", "task_completed"]
+    ["web_search_jina", "emit_research_notes", "task_completed"]
 )
🧹 Nitpick comments (4)
ra_aid/env.py (1)
113-120: Minor duplication with validate_web_research.
The logic repeats steps already found in validate_web_research. Consider using that function under the hood to ensure consistency and reduce duplication.
 def check_web_research_env() -> List[str]:
-    web_research_missing = []
-    key = "JINA_API_KEY"
-    if not os.environ.get(key):
-        web_research_missing.append(f"{key} environment variable is not set")
-    return web_research_missing
+    result = validate_web_research()
+    return result.missing_vars
ra_aid/tools/web_search_jina.py (1)
1-11: Remove unused import Union.
The Union import at line 2 is never utilized. Removing it helps maintain a clean codebase.
-from typing import Dict, Optional, List, Union
+from typing import Dict, Optional, List
🧰 Tools
🪛 Ruff (0.8.2)
2-2: typing.Union imported but unused
Remove unused import: typing.Union (F401)
ra_aid/tools/__init__.py (2)
160-272: Docstring parsing logic works but could be more robust.
The extract_tool_metadata function adequately extracts arguments and descriptions from structured docstrings, but it may be fragile if the format deviates (e.g., multiline argument annotations, advanced type annotations like Union[Type1, Type2], or custom docstring styles). Also, note that the nested "if" statements that set tool_type might override each other if the function name includes multiple matching keywords (for example, "code_complete"). Switching those matched conditions to a chain of if...elif...elif would clarify the final tool type.
Example improvement for lines 236-239:
-if "code" in name_lower or "modification" in name_lower:
-    tool_type = ToolType.CODE_MODIFICATION
-if "complete" in name_lower:
-    tool_type = ToolType.CODE_COMPLETION
+if "code" in name_lower or "modification" in name_lower:
+    tool_type = ToolType.CODE_MODIFICATION
+elif "complete" in name_lower:
+    tool_type = ToolType.CODE_COMPLETION
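To make the override concrete, here is a minimal sketch comparing the two shapes. The ToolType enum below is a stand-in: only CODE_MODIFICATION and CODE_COMPLETION come from the diff above, and the OTHER default is an assumption of this sketch, so the real enum in ra_aid/tools/__init__.py may look different.

```python
from enum import Enum


class ToolType(Enum):
    # Stand-in enum: OTHER is a hypothetical default added for this sketch.
    CODE_MODIFICATION = "code_modification"
    CODE_COMPLETION = "code_completion"
    OTHER = "other"


def tool_type_separate_ifs(name: str) -> ToolType:
    """Original shape: independent ifs, so a later match overwrites an earlier one."""
    name_lower = name.lower()
    tool_type = ToolType.OTHER
    if "code" in name_lower or "modification" in name_lower:
        tool_type = ToolType.CODE_MODIFICATION
    if "complete" in name_lower:
        tool_type = ToolType.CODE_COMPLETION
    return tool_type


def tool_type_elif_chain(name: str) -> ToolType:
    """Suggested shape: the first matching branch wins."""
    name_lower = name.lower()
    tool_type = ToolType.OTHER
    if "code" in name_lower or "modification" in name_lower:
        tool_type = ToolType.CODE_MODIFICATION
    elif "complete" in name_lower:
        tool_type = ToolType.CODE_COMPLETION
    return tool_type
```

For a name such as code_complete, the separate ifs end on CODE_COMPLETION while the elif chain stops at CODE_MODIFICATION. Which of the two is intended is a design decision for the author; the chain only makes the precedence explicit.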
274-284: Optional example usage block.
The commented-out example usage is helpful for quick testing or demonstration, but consider moving it to a dedicated example file or test suite to avoid clutter in production code.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
.gitignore (1 hunks)
pyproject.toml (1 hunks)
ra_aid/env.py (2 hunks)
ra_aid/prompts/__init__.py (2 hunks)
ra_aid/prompts/multi_agent_prompts.py (1 hunks)
ra_aid/prompts/web_research_prompts.py (5 hunks)
ra_aid/tool_configs.py (6 hunks)
ra_aid/tools/__init__.py (2 hunks)
ra_aid/tools/web_search_jina.py (1 hunks)
ra_aid/tools/web_search_tavily.py (0 hunks)
tests/ra_aid/test_env.py (13 hunks)
tests/ra_aid/test_tool_configs.py (1 hunks)
tests/ra_aid/test_web_search_jina.py (1 hunks)
💤 Files with no reviewable changes (1)
- ra_aid/tools/web_search_tavily.py
🧰 Additional context used
🧬 Code Definitions (3)
ra_aid/env.py (1)
  ra_aid/provider_strategy.py (2)
    ProviderFactory (393-419)
    ValidationResult (11-15)
ra_aid/tools/__init__.py (5)
  ra_aid/tools/expert.py (1)
    ask_expert (159-323)
  ra_aid/tools/research.py (3)
    existing_project_detected (14-48)
    monorepo_detected (52-89)
    ui_detected (93-126)
  ra_aid/tools/shell.py (1)
    run_shell_command (40-148)
  ra_aid/tools/web_search_jina.py (1)
    web_search_jina (94-187)
  ra_aid/tools/memory.py (7)
    deregister_related_files (609-632)
    emit_key_facts (116-186)
    emit_key_snippet (192-295)
    emit_related_files (408-540)
    emit_research_notes (47-112)
    plan_implementation_completed (365-394)
    task_completed (333-361)
tests/ra_aid/test_env.py (1)
  ra_aid/env.py (3)
    validate_environment (217-283)
    check_env (122-157)
    check_web_research_env (113-119)
🔇 Additional comments (35)
.gitignore (1)
23-25: LGTM!
Well-documented addition to ignore the agent memory bank directory.
ra_aid/prompts/__init__.py (2)
68-73: LGTM!
The multi-agent prompt imports are well-structured and follow the established pattern in this file.
123-126: LGTM!
Properly exposing the new multi-agent prompt constants in the __all__ list.
ra_aid/tool_configs.py (6)
27-27: Import modification looks correct.
The import change from web_search_tavily to web_search_jina aligns with the PR objective to replace Tavily with Jina DeepSearch for web research.
196-196: Properly updated research tools.
Correctly replaced web_search_tavily with web_search_jina in the research tools list.
274-274: Planning tools updated correctly.
Web search tool has been properly updated to use Jina DeepSearch in the planning tools configuration.
312-312: Implementation tools correctly updated.
Added web_search_jina to the implementation tools, maintaining functionality while migrating from Tavily to Jina DeepSearch.
335-335: Web research tools initialization updated correctly.
Changed the initialization of web research tools to use web_search_jina instead of web_search_tavily.
365-365: Chat tools properly updated.
Successfully migrated the chat tools to use Jina DeepSearch instead of Tavily.
tests/ra_aid/test_env.py (6)
7-7: Import updated to include new environment check functions.
Correctly updated the imports to include the new check_env and check_web_research_env functions.
34-34: Environment variable updated correctly.
Changed from checking for TAVILY_API_KEY to JINA_API_KEY in the clean_env fixture.
57-57: Tests updated to check for Jina API key.
All assertions have been properly updated to check for JINA_API_KEY instead of TAVILY_API_KEY.
Also applies to: 75-75, 99-99, 117-117, 128-128, 152-152, 165-165, 184-184, 195-195, 216-216, 239-239
244-260: Added comprehensive test for web research environment check.
This test thoroughly validates the check_web_research_env function, testing both when the API key is missing and when it's present. It also properly cleans up after the test.
262-311: Well-structured test for comprehensive environment checking.
This test effectively validates the check_env function across multiple scenarios:
- When no environment variables are set
- When only required variables are set
- When all variables are set
It also properly saves and restores the original environment state.
314-332: Added test for web research validation with Jina.
This test verifies that the web research functionality correctly validates the presence of the Jina API key. It tests both when the key is missing and when it's present.
ra_aid/prompts/web_research_prompts.py (4)
2-2: Updated documentation to reference Jina DeepSearch.
Correctly updated the module docstring to mention that web research is powered by Jina DeepSearch.
12-12: Updated prompt sections to highlight DeepSearch capabilities.
Each research-related prompt section has been updated to emphasize the specific capabilities of Jina DeepSearch, such as iterative reasoning, information validation, implementation detail verification, and providing up-to-date information.
Also applies to: 22-22, 32-32, 42-42
46-47: Updated assistant description to mention Jina DeepSearch.
The virtual assistant description now correctly references Jina DeepSearch as the tool used for finding, validating, and synthesizing information.
54-105: Added comprehensive web research behavior guide.
This substantial enhancement provides detailed guidance on using Jina DeepSearch's capabilities, structured across six key areas:
- Search Strategy
- Quality Control
- Response Generation
- Domain Expertise
- Research Triggers
- Output Format
This comprehensive guide will significantly improve the assistant's ability to leverage Jina DeepSearch for high-quality web research.
ra_aid/env.py (4)
5-6: Clean import usage.
These imports (Any, List, and Tuple) are all referenced elsewhere in the file, so their inclusion is valid. No concerns to report here.
11-16: Well-structured data class.
Defining WebResearchValidationResult as a dataclass helps improve readability and maintainability for environment validation results.
18-31: Environment validation for JINA_API_KEY looks good.
This function clearly checks for the presence of the JINA_API_KEY and cleanly returns a validation result, complying with the new Jina-based approach.
122-158: Comprehensive environment checks for required and optional variables.
The function effectively centralizes environment checks, including JINA_API_KEY. No issues or performance concerns.
ra_aid/prompts/multi_agent_prompts.py (4)
1-6: Informative module docstring.
The high-level overview is clear and sets a good context for multi-agent communication.
8-59: Robust JSON schema for requests.
The MULTI_AGENT_REQUEST_SCHEMA is detailed, covering all necessary requirements for multi-agent queries, including context, validity checks, and desired outputs.
60-162: Comprehensive prompt documentation.
MULTI_AGENT_QUERY_HANDLER_PROMPT thoroughly explains creation and processing of multi-agent queries, ensuring clarity and guiding consistent usage.
164-203: Well-defined response schema.
MULTI_AGENT_IMPLEMENTATION_SCHEMA neatly defines the structure for multi-agent responses, including error handling and partial completions.
ra_aid/tools/web_search_jina.py (3)
17-24: Environment-handling logic is clear.
The constructor ensures a suitable error is raised when JINA_API_KEY is missing, adhering to best practices.
26-91: Efficient search method for streaming responses.
The search method correctly handles streaming and non-streaming conditions, raising HTTP errors as needed and returning parsed JSON.
93-187: Tool function integrates well with logging and error handling.
Recording trajectory data before and after the search is a solid approach for traceability. Exception handling is consistent with existing patterns.
ra_aid/tools/__init__.py (5)
1-7: Nice module-level docstring and imports.
The docstring cleanly introduces the purpose of the module, and the imports use standard Python libraries (typing, dataclasses, enum, inspect) required for the new functionality. No issues found here.
8-45: Imports of tool functions look consistent.
These imports consolidate the various tool functions into a single module. The commented notes on lines 14, 15, 17, and 44 about function names and file references are helpful for maintainability. Ensure that each function reference is up-to-date if you rename or relocate any tool in the future to avoid broken imports.
46-79: Centralized tool registry for easy discovery and usage.
The AVAILABLE_TOOLS dictionary properly aggregates tool functions by name, increasing discoverability and consistency across the codebase. This approach is good for dynamic invocation and introspection of available tools. However, ensure that the dictionary remains in sync with your __all__ and docstrings, especially if function names are refactored or removed.
81-117: Explicit export list ensures clarity.
Defining __all__ is helpful to control what symbols are publicly exported. This matches your AVAILABLE_TOOLS dictionary fairly closely, which is good. Watch out for duplication in the event of further refactors: if something changes in AVAILABLE_TOOLS, remember to check __all__ as well.
120-158: Effective setup of enumeration and dataclasses.
The ToolType enum provides a clear categorization of tool functionalities, while ToolArgument and ToolMetadata dataclasses succinctly capture tool details. This is a clean design that improves maintainability.
[tool.poetry.dependencies]
python = "^3.9"
requests = "^2.31.0" # For Jina DeepSearch API
🛠️ Refactor suggestion
Review Poetry dependency configuration with existing project setup
The new Poetry dependency section specifies Python "^3.9" while the project's main configuration requires ">=3.10" (line 11). This creates potential confusion about the minimum supported Python version. Additionally, "requests" is already listed as a dependency on line 54 (without version constraints), creating a redundant declaration.
Consider either:
- Removing the Poetry section if you're using Hatch as your build system
- Consolidating the dependencies to ensure version requirements are consistent
-[tool.poetry.dependencies]
-python = "^3.9"
-requests = "^2.31.0" # For Jina DeepSearch API
If you need to specify a version for the requests library, update line 54 instead:
- "requests",
+ "requests>=2.31.0", # For Jina DeepSearch API
Committable suggestion skipped: line range outside the PR's diff.
@@ -0,0 +1 @@
Empty test file needs implementation
This test file is empty but should contain tests for the new Jina DeepSearch functionality. Since this PR is replacing Tavily with Jina, comprehensive test coverage is essential to ensure the new implementation works correctly.
Please implement test cases for the Jina DeepSearch client including:
- Unit tests for successful API calls
- Error handling tests
- Mocked responses to avoid actual API calls during testing
- Verification that search results are properly processed
I can help generate test examples if needed.
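As a starting point for the cases listed above, here is a minimal sketch using unittest.mock. FakeDeepSearchClient is a hypothetical stand-in defined only for this example, with the HTTP call injected as a callable so no network access happens; the real JinaDeepSearchClient in ra_aid/tools/web_search_jina.py may have a different constructor and method signature, and the URL is a placeholder rather than Jina's endpoint.

```python
import unittest
from unittest.mock import Mock


class FakeDeepSearchClient:
    """Hypothetical stand-in for the PR's client; not the real class."""

    def __init__(self, api_key, http_post):
        if not api_key:
            raise ValueError("JINA_API_KEY is not set")
        self.api_key = api_key
        self._post = http_post  # injected so tests never touch the network

    def search(self, query):
        response = self._post(
            "https://example.invalid/search",  # placeholder URL (assumption)
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"query": query},
        )
        response.raise_for_status()
        return response.json()


class DeepSearchClientTests(unittest.TestCase):
    def test_successful_call_returns_parsed_json(self):
        response = Mock()
        response.json.return_value = {"results": ["ok"]}
        post = Mock(return_value=response)
        client = FakeDeepSearchClient("test-key", post)

        self.assertEqual(client.search("python"), {"results": ["ok"]})
        post.assert_called_once()
        sent_headers = post.call_args.kwargs["headers"]
        self.assertEqual(sent_headers["Authorization"], "Bearer test-key")

    def test_missing_key_raises(self):
        with self.assertRaises(ValueError):
            FakeDeepSearchClient("", Mock())

    def test_http_error_propagates(self):
        response = Mock()
        response.raise_for_status.side_effect = RuntimeError("503")
        client = FakeDeepSearchClient("test-key", Mock(return_value=response))
        with self.assertRaises(RuntimeError):
            client.search("python")
```

In the real suite the same pattern would likely patch the HTTP call inside the client module instead of injecting it, which is a choice left to the author.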
EXPECTED_RESEARCH_TOOLS = {
    "web_search_jina",
    "emit_research_notes",
    "task_completed"
}

EXPECTED_PLANNING_TOOLS = {
    "web_search_jina",
    "create_plan",
    "task_completed"
}

EXPECTED_IMPLEMENTATION_TOOLS = {
    "web_search_jina",
    "emit_code_block",
    "task_completed"
}

EXPECTED_CHAT_TOOLS = {
    "web_search_jina",
    "task_completed"
}
🛠️ Refactor suggestion
Added constants for expected tools but they're not used.
These new constants define the expected tools for different categories, including web_search_jina, but they aren't used in the existing tests. The tests still check for web_search_tavily and will fail.
Consider updating the existing tests to use these new constants, or update the tests directly as suggested in the previous comments.
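One way the constants could be put to use is sketched below. The two tool functions are stand-ins defined only for this example, and tool_names is a hypothetical helper; in the real suite the tool list would come from the project's own configuration function (for chat tools, get_chat_tools).

```python
EXPECTED_CHAT_TOOLS = {
    "web_search_jina",
    "task_completed",
}


def tool_names(tools):
    """Hypothetical helper: collect names from tool objects or plain functions."""
    return {getattr(tool, "name", None) or tool.__name__ for tool in tools}


def web_search_jina(query):  # stand-in, not the PR's implementation
    return query


def task_completed(message):  # stand-in, not the PR's implementation
    return message


def test_chat_tools_match_expected():
    # Stand-in list; the real test would build this from the tool config.
    chat_tools = [web_search_jina, task_completed]
    assert tool_names(chat_tools) == EXPECTED_CHAT_TOOLS
```

Comparing sets keeps the assertion independent of tool ordering, which the existing tests handle with sorted().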
…with Jina DeepSearch for web research.
- Create new web_search_jina tool and client.
- Update prompts, tool configurations, environment variables, and tests related to web search.
- Add requests dependency for Jina API.
- Remove old web_search_tavily implementation and references.
ef376e1 to b2d6a12
fixed missing dependencies
Actionable comments posted: 0
🧹 Nitpick comments (4)
ra_aid/tools/__init__.py (2)
3-3: Remove unused Union import.
According to the static analysis hints, typing.Union appears to be imported but not used. Please remove the unused import to keep the code clean and avoid confusion.
-from typing import List, Dict, Any, Optional, Union
+from typing import List, Dict, Any, Optional
14-14: Clarify the in-line comment.
The comment # Assuming this is the intended function name, not read_file_tool is redundant and mirrors the same name. Consider removing or updating it to provide clearer context.
-from .read_file import read_file_tool # Assuming this is the intended function name, not read_file_tool
+from .read_file import read_file_tool
ra_aid/tools/web_search_jina.py (2)
2-2: Remove unused Union import.
The Union type from typing is imported but never used. You can safely remove it to keep the imports concise.
-from typing import Dict, Optional, List, Union
+from typing import Dict, Optional, List
🧰 Tools
🪛 Ruff (0.8.2)
2-2: typing.Union imported but unused
Remove unused import: typing.Union (F401)
79-90: Consider handling partial line chunks in streaming.
While the current approach to parsing streaming lines is workable, any incomplete JSON fragments split across lines would be ignored. If partial lines are common in Jina’s stream, consider buffering them to minimize potential data loss.
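The buffering idea can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: it assumes the stream arrives as raw byte chunks carrying newline-delimited "data: ..." lines with a "[DONE]" sentinel, and both function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_events(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield JSON payloads from 'data: ...' lines, buffering partial lines."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Only newline-terminated lines are parsed; the trailing fragment
        # stays in the buffer until more bytes arrive.
        *lines, buffer = buffer.split(b"\n")
        for raw in lines:
            yield from _parse_line(raw)
    # Flush whatever remains once the stream ends.
    yield from _parse_line(buffer)


def _parse_line(raw: bytes) -> Iterator[dict]:
    line = raw.decode("utf-8").strip()
    if not line.startswith("data:"):
        return
    payload = line[len("data:"):].strip()
    if not payload or payload == "[DONE]":
        return
    try:
        yield json.loads(payload)
    except json.JSONDecodeError:
        # Skip a malformed payload instead of aborting the whole stream.
        return
```

Because the buffer holds bytes and decoding happens per complete line, a JSON object or multi-byte character split across two chunks is reassembled before parsing.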
📜 Review details
🚧 Files skipped from review as they are similar to previous changes (5)
- tests/ra_aid/test_web_search_jina.py
- .gitignore
- ra_aid/prompts/__init__.py
- pyproject.toml
- tests/ra_aid/test_tool_configs.py
🧰 Additional context used
🧬 Code Definitions (3)
tests/ra_aid/test_env.py (1)
  ra_aid/env.py (3)
    validate_environment (217-283)
    check_env (122-157)
    check_web_research_env (113-119)
ra_aid/env.py (1)
  ra_aid/provider_strategy.py (2)
    ProviderFactory (393-419)
    ValidationResult (11-15)
ra_aid/tools/__init__.py (5)
  ra_aid/tools/expert.py (1)
    ask_expert (159-323)
  ra_aid/tools/research.py (4)
    existing_project_detected (14-48)
    monorepo_detected (52-89)
    ui_detected (93-126)
    mark_research_complete_no_implementation_required (130-172)
  ra_aid/tools/shell.py (1)
    run_shell_command (40-148)
  ra_aid/tools/web_search_jina.py (1)
    web_search_jina (94-187)
  ra_aid/tools/memory.py (7)
    deregister_related_files (609-632)
    emit_key_facts (116-186)
    emit_key_snippet (192-295)
    emit_related_files (408-540)
    emit_research_notes (47-112)
    plan_implementation_completed (365-394)
    task_completed (333-361)
🔇 Additional comments (27)
ra_aid/prompts/multi_agent_prompts.py (5)
1-6: Well-documented module purpose
Good job including a clear docstring that explains the purpose of this module. The documentation effectively communicates that this module establishes a protocol for multi-agent communication between RA-Aid and other agent systems.
9-58: Well-structured request schema definition
The JSON schema for multi-agent communication is comprehensive and clearly defines the structure for requests. The schema properly handles various content types and includes necessary validation rules.
A few observations:
- Required fields are properly specified
- The nested structure for questions and context is well-organized
- The enum for output types provides good constraints for response formatting
This schema will help ensure consistency in the multi-agent communication protocol.
61-162: Comprehensive prompt with clear guidelines
The prompt is well-structured and provides detailed instructions for both creating and processing multi-agent requests. It covers all aspects of the communication flow including:
- Request creation with examples
- Processing guidelines
- Response generation based on output types
- Error handling procedures
- Quality checks
This approach ensures agents will have clear guidance on how to interact within this framework.
165-203: Complete implementation schema for responses
The response implementation schema properly defines the structure for agent responses with:
- Clear status indicators (complete, partial, error)
- Required response fields
- Structured error reporting
This schema will help ensure consistency in how agent responses are formatted and validated.
1-203: Excellent multi-agent framework implementation
This new file establishes a robust framework for multi-agent communication that will support the transition from Tavily to Jina DeepSearch. While not directly mentioning the search providers, this structured communication protocol will help manage complex interactions between agents, which is particularly valuable for web research tasks that require iterative reasoning and source validation (key benefits mentioned in the PR objectives).
The schemas and prompts are well-designed and should facilitate the enhanced control and fine-grained parameters that Jina DeepSearch offers compared to Tavily.
ra_aid/tool_configs.py (6)
27-27: Tool import change from tavily to jina looks good.
This change correctly updates the import statement to reflect the new web search provider.
196-196: Consistent replacement of web search tool in RESEARCH_TOOLS.
The reference in the RESEARCH_TOOLS list has been properly updated.
274-274: Updated planning tools to use Jina search.
The get_planning_tools function has been properly updated to use the new search implementation.
312-312: Implementation tools now use Jina for web search.
This change correctly updates the implementation tools to use the new search capability.
335-335: Web research tools updated to use Jina.
The get_web_research_tools function has been correctly modified to use Jina DeepSearch.
365-365: Chat tools now use Jina for search functionality.
The get_chat_tools function has been correctly updated to use the new search implementation.
tests/ra_aid/test_env.py (6)
7-7: Import statement updated to include new environment check functions.
The import has been properly updated to include the new functions required for testing.
34-34: Updated clean_env fixture to use JINA_API_KEY.
The test environment setup has been properly modified to use Jina instead of Tavily.
57-57: All test assertions updated to check for JINA_API_KEY.
All test assertions have been consistently updated to check for the Jina API key instead of Tavily.
Also applies to: 75-75, 99-99, 117-117, 128-128, 152-152, 165-165, 184-184, 195-195, 216-216, 239-239
244-260: Good test coverage for check_web_research_env function.
This new test thoroughly verifies the behavior of the check_web_research_env function with both missing and present API keys.
262-312: Comprehensive test for check_env function.
This test covers all scenarios for the check_env function:
- No environment variables set
- Only required variables set
- All variables set
The test also properly handles setup and cleanup of environment variables.
314-333: Good validation test for Jina web research environment.
This test thoroughly verifies the validation logic for the Jina API key in the web research context.
ra_aid/env.py (5)
5-6: Updated imports for new data class and typing.
The import statements have been properly updated to support the new functionality.
11-15: Well-structured data class for web research validation.
The WebResearchValidationResult data class follows the pattern of existing ValidationResult class and is well-implemented.
18-31: Clean implementation of validate_web_research function.
This function properly validates the Jina API key and returns a structured result. The implementation is consistent with other validation functions in the module.
113-119: Clear implementation of check_web_research_env function.
This utility function correctly checks for the presence of the Jina API key and returns appropriate messages.
122-157: Comprehensive environment variable validation.
The check_env function provides a centralized place for validating all environment variables, with clear separation between required and optional variables.
The function is well-documented and follows good practices by:
- Checking required variables first
- Handling optional variables separately
- Including web research variables
- Providing clear return values
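For orientation, here is a minimal sketch of how the two web research checks discussed for ra_aid/env.py could fit together, with check_web_research_env delegating to validate_web_research as the earlier review suggested. The dataclass fields are assumptions: missing_vars appears in that review's diff, while valid is a hypothetical field added for this sketch.

```python
import os
from dataclasses import dataclass, field
from typing import List


@dataclass
class WebResearchValidationResult:
    # Field names are assumptions; the real dataclass in the PR may differ.
    valid: bool
    missing_vars: List[str] = field(default_factory=list)


def validate_web_research() -> WebResearchValidationResult:
    """Report whether JINA_API_KEY is present in the environment."""
    key = "JINA_API_KEY"
    if os.environ.get(key):
        return WebResearchValidationResult(valid=True)
    return WebResearchValidationResult(
        valid=False,
        missing_vars=[f"{key} environment variable is not set"],
    )


def check_web_research_env() -> List[str]:
    # Delegate so the key name and message live in one place.
    return validate_web_research().missing_vars
```

With this shape, a change to the key name or the message only needs to happen in validate_web_research.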
ra_aid/prompts/web_research_prompts.py (3)
2-2: Updated docstring to mention Jina DeepSearch.
The docstring now correctly indicates that the prompts are powered by Jina DeepSearch.
12-12: Updated prompt sections with DeepSearch capabilities.
All web research prompt sections have been updated to reference specific DeepSearch capabilities:
- Research section highlights iterative reasoning
- Planning section emphasizes information validation
- Implementation section focuses on verification from trusted sources
- Chat section highlights up-to-date, verified information
These updates properly align with the capabilities of the new search provider.
Also applies to: 22-22, 32-32, 42-42
46-106: Comprehensive update to web research prompt.
The web research prompt has been significantly enhanced with detailed sections on Jina DeepSearch capabilities:
- Search Strategy - Highlights iterative reasoning and source validation
- Quality Control - Emphasizes verification from reliable sources
- Response Generation - Focuses on synthesizing information clearly
- Domain Expertise - Prioritizes official documentation and trusted sources
- Research Triggers - Defines scenarios for initiating web research
- Output Format - Specifies structure for research responses
This comprehensive update provides clear guidance for leveraging DeepSearch effectively and aligns with the PR objectives of improving search quality, control, and information validity.
ra_aid/tools/__init__.py (1)
8-112: Looks good overall.
The structure of AVAILABLE_TOOLS, the new enums/classes (ToolType, ToolArgument, ToolMetadata), and the extract_tool_metadata function appear well-implemented and maintainable. The docstring parsing logic is clean and straightforward.
ra_aid/tools/web_search_jina.py (1)
93-187: Implementation is solid.
The web_search_jina tool integrates well with the new Jina DeepSearch client, logs searches and errors appropriately, and neatly returns structured results. No further issues are apparent.
Why remove Tavily completely? Can you refactor to use Jina instead of tavily with a flag like |
Actionable comments posted: 0
🧹 Nitpick comments (6)
ra_aid/tools/web_search_jina.py (6)
2-2: Remove unused import.
You import typing.Union but never use it. Remove this unused import to align with best practices and avoid lint warnings.
Apply this diff:
-from typing import Dict, Optional, List, Union
+from typing import Dict, Optional, List
🧰 Tools
🪛 Ruff (0.8.2)
2-2: typing.Union imported but unused
Remove unused import: typing.Union (F401)
79-91: Centralize API key handling.
You're re-checking os.environ.get("JINA_API_KEY") inside this function, even though the JinaDeepSearchClient constructor does the same. Consider consolidating the API key logic in a single place (e.g., the client) to reduce duplication and potential inconsistencies.
-    api_key = os.environ.get("JINA_API_KEY")
-    if not api_key:
-        ...
-    ...
-    headers = {
-        "Content-Type": "application/json",
-        "Authorization": f"Bearer {api_key}"
-    }
+    client = JinaDeepSearchClient()
+    api_key = client.api_key
+    headers = {
+        "Content-Type": "application/json",
+        "Authorization": f"Bearer {client.api_key}"
+    }
107-109: Comply with Python style guidelines.
Multiple statements on one line can be less readable. Please place the block body on separate lines as recommended by PEP 8.
-if good_domains: data["good_domains"] = good_domains
-if bad_domains: data["bad_domains"] = bad_domains
-if only_domains: data["only_domains"] = only_domains
+if good_domains:
+    data["good_domains"] = good_domains
+if bad_domains:
+    data["bad_domains"] = bad_domains
+if only_domains:
+    data["only_domains"] = only_domains
🧰 Tools
🪛 Ruff (0.8.2)
107-107: Multiple statements on one line (colon) (E701)
108-108: Multiple statements on one line (colon) (E701)
109-109: Multiple statements on one line (colon) (E701)
132-132: Avoid single-line control flow statements.
Same issue: prefer moving continue to a new line for clarity.
-if json_str == "[DONE]": continue
+if json_str == "[DONE]":
+    continue
🧰 Tools
🪛 Ruff (0.8.2)
132-132: Multiple statements on one line (colon) (E701)
121-155: Refine streaming logic for robust error handling.
While the iterative reading and decoding of streamed lines is appropriate, consider adding a recovery or retry mechanism in case the stream is interrupted. This could help preserve partial results or attempt reconnection before finalizing the tool’s output, especially for long-running or high-effort queries.
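One possible shape for such a retry is sketched below. Everything here is hypothetical: collect_with_retry and its parameters are invented for illustration, the built-in ConnectionError stands in for whatever exception the HTTP library raises on a dropped stream, and each attempt restarts the stream from the beginning instead of resuming it.

```python
import time
from typing import Callable, Iterable, List, Tuple


def collect_with_retry(
    open_stream: Callable[[], Iterable[str]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[str], bool]:
    """Return (items, completed), keeping the longest partial result seen."""
    best: List[str] = []
    for attempt in range(1, max_attempts + 1):
        items: List[str] = []
        try:
            for item in open_stream():
                items.append(item)
            return items, True
        except ConnectionError:
            # Preserve the most complete partial output across attempts.
            if len(items) > len(best):
                best = items
            if attempt < max_attempts:
                # Exponential backoff: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** (attempt - 1))
    return best, False
```

The boolean lets the caller mark a result as partial in the tool output instead of silently returning truncated content.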
156-214: Improve logging detail for debugging.
You handle exceptions by printing error messages to stderr and returning partial results. Consider expanding the debug logs to capture more details about the request and partial content for real-time troubleshooting. If security permits, you could log request IDs or short correlation tokens returned by Jina to help trace requests.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
ra_aid/__main__.py (1 hunks)
ra_aid/provider_strategy.py (1 hunks)
ra_aid/tools/web_search_jina.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
ra_aid/tools/web_search_jina.py (1)
ra_aid/provider_strategy.py (1)
create
(400-422)
🪛 Ruff (0.8.2)
ra_aid/tools/web_search_jina.py
2-2: typing.Union
imported but unused
Remove unused import: typing.Union
(F401)
107-107: Multiple statements on one line (colon)
(E701)
108-108: Multiple statements on one line (colon)
(E701)
109-109: Multiple statements on one line (colon)
(E701)
132-132: Multiple statements on one line (colon)
(E701)
🔇 Additional comments (4)
ra_aid/provider_strategy.py (2)
227-240: Improved OpenRouter API key validation with fallback mechanism.
The changes enhance the environment variable validation by properly separating expert and base keys into distinct variables, checking both simultaneously, and implementing a more robust fallback mechanism. The updated error message clearly communicates that both keys must be missing for the error to occur, and the comments provide helpful context about the fallback logic.
241-241: Clear code delineation with helpful comment.
The comment on line 241 clearly separates the expert validation logic from the non-expert validation logic, improving code readability.
ra_aid/__main__.py (2)
301-306: Confirm default model availability.
The default model was changed to "gemini-2.0-flash". Double-check that this model is recognized by the system and well-tested. If the model name is unsupported or experimental, clarify that in the help text.
Do you want me to open an issue to confirm that "gemini-2.0-flash" is valid in all scenarios?
297-297: ✅ Verification successful
Verify provider default.
You updated the default provider to "gemini". Ensure "gemini" is included in VALID_PROVIDERS and that any environment or dependencies needed for this provider are properly configured.
🏁 Script executed:
#!/bin/bash
# Verify that gemini is included in VALID_PROVIDERS
rg "VALID_PROVIDERS" -A 10
Length of output: 6542
Provider default "gemini" verified successfully
- The VALID_PROVIDERS list in ra_aid/config.py includes "gemini".
- All references (e.g., in ra_aid/__main__.py and other relevant modules) correctly use the updated default.
- There are no outstanding dependency or configuration concerns related to this provider.
honestly that's probably right. i actually like Tavily a lot, but i needed a deep research implementation and it was quicker to replace. maybe i'll add tavily back and call it "web-search" or "quick-search"?
acc7ce4
to
9f7024d
Compare
Actionable comments posted: 0
🧹 Nitpick comments (2)
ra_aid/tools/web_search_jina.py (2)
2-2: Remove unused Union import.
It appears Union is never referenced, so it can be removed to keep imports concise.
- from typing import Dict, Optional, List, Union
+ from typing import Dict, Optional, List
🧰 Tools
🪛 Ruff (0.8.2)
2-2: typing.Union imported but unused
Remove unused import: typing.Union
(F401)
107-110: Refactor domain filter checks for clarity.
PEP 8 recommends placing statements on a new line instead of inline with the if. This minor adjustment makes code more readable.
- if good_domains: data["good_domains"] = good_domains
+ if good_domains:
+     data["good_domains"] = good_domains
- if bad_domains: data["bad_domains"] = bad_domains
+ if bad_domains:
+     data["bad_domains"] = bad_domains
- if only_domains: data["only_domains"] = only_domains
+ if only_domains:
+     data["only_domains"] = only_domains
🧰 Tools
🪛 Ruff (0.8.2)
107-107: Multiple statements on one line (colon)
(E701)
108-108: Multiple statements on one line (colon)
(E701)
109-109: Multiple statements on one line (colon)
(E701)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
ra_aid/__main__.py (1 hunks)
ra_aid/provider_strategy.py (1 hunks)
ra_aid/tools/web_search_jina.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- ra_aid/provider_strategy.py
- ra_aid/__main__.py
🧰 Additional context used
🪛 Ruff (0.8.2)
ra_aid/tools/web_search_jina.py
2-2: typing.Union
imported but unused
Remove unused import: typing.Union
(F401)
107-107: Multiple statements on one line (colon)
(E701)
108-108: Multiple statements on one line (colon)
(E701)
109-109: Multiple statements on one line (colon)
(E701)
132-132: Multiple statements on one line (colon)
(E701)
🔇 Additional comments (4)
ra_aid/tools/web_search_jina.py (4)
19-31: LGTM!
The dedicated JinaDeepSearchClient class fosters clearer separation of concerns and ensures robust environment handling.
33-106: Kudos on the streaming approach.
Using live streaming output for the search results improves user experience by providing real-time feedback.
132-132: False-positive from static analysis.
The slicing syntax here does not constitute multiple statements on one line.
🧰 Tools
🪛 Ruff (0.8.2)
132-132: Multiple statements on one line (colon)
(E701)
111-201: Robust exception handling and logging.
Comprehensive error handling updates the trajectory and avoids crashes, ensuring stable system behavior under failure conditions.
🧰 Tools
🪛 Ruff (0.8.2)
132-132: Multiple statements on one line (colon)
(E701)
Yea, that's fine, or "quick-web-search" to be more specific, or "basic-web-search". Also, I think we should not introduce a poetry dependency; we are mainly using uv. Will mark on pyproject.toml.
- Implemented Gemini Pro 1.5 Flash as the default free LLM.
- Added initial documentation for multi-agent JSON communication protocol.
- Re-integrated Tavily for quick web searches.
- Created a select_web_search router tool to choose between Tavily (quick) and Jina (deep) based on prompt analysis.
- Updated environment validation and agent prompts to support the search router.
@ariel-frischer Done!
Actionable comments posted: 0
🧹 Nitpick comments (13)
ra_aid/tools/quick_web_search.py (1)
51-52: Check for missing API key environment variable
The code doesn't verify if the TAVILY_API_KEY environment variable exists before attempting to use it, which could lead to cryptic KeyError exceptions.
 try:
-    client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
+    tavily_api_key = os.environ.get("TAVILY_API_KEY")
+    if not tavily_api_key:
+        raise ValueError("TAVILY_API_KEY environment variable is not set")
+    client = TavilyClient(api_key=tavily_api_key)
ra_aid/tool_configs.py (1)
27-29: Unused direct imports
The imports for web_search_jina and quick_web_search are not directly used in this file as they're accessed through select_web_search. Consider removing them for cleaner code.
 from ra_aid.tools import (
     # ... other imports
-    web_search_jina,
-    quick_web_search,
     select_web_search,
 )
🧰 Tools
🪛 Ruff (0.8.2)
27-27: ra_aid.tools.web_search_jina imported but unused
Remove unused import
(F401)
28-28: ra_aid.tools.quick_web_search imported but unused
Remove unused import
(F401)
ra_aid/tools/select_web_search.py (1)
34-43: Consider standardizing error return formats
Currently, the quick search path returns a JSON-stringified error object, while the deep search path returns a plain dictionary with an error key. This inconsistency could make error handling more difficult for consuming code.
 except Exception as e:
     error_msg = f"Error calling quick_web_search internally: {e}"
     console.print(Panel(error_msg, title="Router Error", border_style="red"))
-    # Format error as a JSON string to somewhat match expected string return type
-    return json.dumps({"error": error_msg})
+    # Return error as a string to match expected return type
+    return f"Error performing quick web search: {error_msg}"
ra_aid/tools/web_search_jina.py (5)
2-2: Remove unused import Union.
The import typing.Union is never referenced in this file. Removing it will align with linting guidelines and keep the import list tidy.
-from typing import Dict, Optional, List, Union
+from typing import Dict, Optional, List
🧰 Tools
🪛 Ruff (0.8.2)
2-2: typing.Union imported but unused
Remove unused import: typing.Union
(F401)
111-113: Separate conditional statements onto new lines.
Multiple statements on a single line do not comply with PEP8 and can reduce readability. Consider placing the assignment statement on its own line:
-if good_domains: data["good_domains"] = good_domains
-if bad_domains: data["bad_domains"] = bad_domains
-if only_domains: data["only_domains"] = only_domains
+if good_domains:
+    data["good_domains"] = good_domains
+if bad_domains:
+    data["bad_domains"] = bad_domains
+if only_domains:
+    data["only_domains"] = only_domains
🧰 Tools
🪛 Ruff (0.8.2)
111-111: Multiple statements on one line (colon)
(E701)
112-112: Multiple statements on one line (colon)
(E701)
113-113: Multiple statements on one line (colon)
(E701)
136-136: Separate the inline conditional on line 136 for clarity.
Keeping the continue on a new line ensures consistency with PEP8 and increases code clarity:
-if json_str == "[DONE]": continue
+if json_str == "[DONE]":
+    continue
🧰 Tools
🪛 Ruff (0.8.2)
136-136: Multiple statements on one line (colon)
(E701)
82-82: Consolidate or remove the duplicate Console() instantiation.
You redefine console = Console() at lines 17 and 82. Consider reusing the same console object or removing the extra instantiation to avoid confusion.
Also applies to: 17-17
33-227: Function length and complexity.
web_search_jina is a large function handling multiple responsibilities: environment validation, HTTP streaming, display logic, and trajectory logging. Consider splitting these into smaller helper functions or methods in a class. This can improve readability, reusability, and maintainability.
🧰 Tools
🪛 Ruff (0.8.2)
111-111: Multiple statements on one line (colon)
(E701)
112-112: Multiple statements on one line (colon)
(E701)
113-113: Multiple statements on one line (colon)
(E701)
136-136: Multiple statements on one line (colon)
(E701)
ra_aid/env.py (2)
181-181: Remove unused variable at_least_one_web_key.
The local variable at_least_one_web_key is never used. Consider removing the assignment to keep the code clean and to avoid confusion.
- elif var == "JINA_API_KEY" or var == "TAVILY_API_KEY":
-     at_least_one_web_key = True
🧰 Tools
🪛 Ruff (0.8.2)
181-181: Local variable at_least_one_web_key is assigned to but never used
Remove assignment to unused variable at_least_one_web_key
(F841)
126-140: Improve clarity of partially set keys messages.
Currently, check_web_research_env appends messages like “deep search unavailable” or “quick search unavailable” if only one key is missing. This is helpful information, but consider clarifying the message to convey exactly which functionalities are impacted. This can help direct users to fix missing keys as needed.
ra_aid/tools/__init__.py (3)
46-79: Consider grouping web search tools into a dedicated section.
In AVAILABLE_TOOLS, consider grouping web_search_jina, quick_web_search, and select_web_search under a “Web Search” comment to help future contributors quickly locate and modify them.
121-139: Validate the coverage of ToolType categories.
The ToolType enum seems comprehensive. Ensure new or specialized tool functions, like those for handling environment validation or custom provider logic, are categorized consistently, or else consider a fallback type (e.g., OTHER or UTILITY).
161-273: Check docstring parsing logic for potential corner cases.
The extract_tool_metadata function is fairly robust, but consider validating more edge cases (e.g., missing colon in argument lines, multiline argument descriptions). These edge cases might cause partial or incorrect metadata extraction.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
ra_aid/env.py (4 hunks)
ra_aid/prompts/chat_prompts.py (1 hunks)
ra_aid/tool_configs.py (6 hunks)
ra_aid/tools/__init__.py (2 hunks)
ra_aid/tools/quick_web_search.py (1 hunks)
ra_aid/tools/select_web_search.py (1 hunks)
ra_aid/tools/web_search_jina.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (5)
ra_aid/tools/select_web_search.py (2)
- ra_aid/tools/quick_web_search.py (1): quick_web_search (18-90)
- ra_aid/tools/web_search_jina.py (1): web_search_jina (34-227)
ra_aid/tools/__init__.py (6)
- ra_aid/tools/expert.py (1): ask_expert (159-323)
- ra_aid/tools/research.py (3): existing_project_detected (14-48), monorepo_detected (52-89), ui_detected (93-126)
- ra_aid/tools/web_search_jina.py (1): web_search_jina (34-227)
- ra_aid/tools/quick_web_search.py (1): quick_web_search (18-90)
- ra_aid/tools/select_web_search.py (1): select_web_search (15-57)
- ra_aid/tools/memory.py (6): emit_key_facts (116-186), emit_key_snippet (192-295), emit_related_files (408-540), emit_research_notes (47-112), plan_implementation_completed (365-394), task_completed (333-361)
ra_aid/env.py (1)
- ra_aid/provider_strategy.py (2): ProviderFactory (396-422), ValidationResult (11-15)
ra_aid/tool_configs.py (3)
- ra_aid/tools/web_search_jina.py (1): web_search_jina (34-227)
- ra_aid/tools/quick_web_search.py (1): quick_web_search (18-90)
- ra_aid/tools/select_web_search.py (1): select_web_search (15-57)
ra_aid/tools/web_search_jina.py (1)
- ra_aid/provider_strategy.py (1): create (400-422)
🪛 Ruff (0.8.2)
ra_aid/env.py
181-181: Local variable at_least_one_web_key
is assigned to but never used
Remove assignment to unused variable at_least_one_web_key
(F841)
ra_aid/tool_configs.py
27-27: ra_aid.tools.web_search_jina
imported but unused
Remove unused import
(F401)
28-28: ra_aid.tools.quick_web_search
imported but unused
Remove unused import
(F401)
ra_aid/tools/web_search_jina.py
2-2: typing.Union
imported but unused
Remove unused import: typing.Union
(F401)
111-111: Multiple statements on one line (colon)
(E701)
112-112: Multiple statements on one line (colon)
(E701)
113-113: Multiple statements on one line (colon)
(E701)
136-136: Multiple statements on one line (colon)
(E701)
🔇 Additional comments (5)
ra_aid/prompts/chat_prompts.py (1)
50-58: Clear and effective web search handling guidelines added
The newly added web search handling instructions provide well-structured guidelines that help the agent make appropriate decisions about search types and result handling. The immediate stop condition is particularly valuable for ensuring users receive answers without unnecessary delays.
ra_aid/tools/quick_web_search.py (1)
1-90: Implementation maintains Tavily for quick searches while utilizing Jina for deep searches
This new file implements a quick search tool using Tavily, which aligns with the PR comments where maintaining Tavily as an option was discussed. The implementation is solid, with appropriate error handling and console feedback.
Note that the PR description mentioned replacing Tavily with Jina, but this hybrid approach (using Tavily for quick searches, Jina for deep searches) actually provides better flexibility as suggested in the PR comments.
ra_aid/tool_configs.py (1)
198-199: Consistent integration of select_web_search across all tool collections
The select_web_search tool has been properly integrated into all relevant tool collections, ensuring consistent availability across different modes of operation. This approach maintains backward compatibility while providing the new functionality.
Also applies to: 276-277, 314-315, 326-333, 355-356
ra_aid/tools/select_web_search.py (1)
1-57: Well-designed router for web search functionality
The implementation effectively routes search requests to the appropriate tool based on search type, with good error handling and informative console output. The default to 'deep' search aligns with providing comprehensive results when specificity is not indicated.
ra_aid/env.py (1)
20-44: Validate approach of reporting missing variables only if both keys are invalid.
In validate_web_research, you only populate missing_vars if no valid key is present (any_valid=False). This might hide partial misconfigurations when only one key is missing. If your intent is to require only one functional key, this is correct. Otherwise, you may want to always list any missing keys.
Would you like to adjust the logic to always include missing keys (even if the other is valid) or keep it as-is?
Agreed we should keep Tavily in there (especially since current users are depending on it) and make Jina an additional search backend.
@ai-christianson i did it in a8c21c2
if (os.getenv("OPENAI_API_KEY") and not os.getenv("ANTHROPIC_API_KEY"))
else "anthropic"
),
default="gemini",
Why are we changing this? Seems out of scope of the PR.
true, some things i did to make it work with my config were added to the pr when they shouldn't have. will split the prs and remove unnecessary changes.
parser.add_argument(
"--model",
type=str,
default="gemini-2.0-flash",
Same. Seems out of scope of PR.
will be removed in the next pr split
return web_research_missing


def check_env() -> Tuple[bool, List[str], bool, List[str]]:
Is there a reason we're changing code related to vars other than just the jina/tavily env vars?
this was part of the debug needed to make deepsearch work
from ra_aid.prompts.multi_agent_prompts import (
MULTI_AGENT_REQUEST_SCHEMA,
MULTI_AGENT_QUERY_HANDLER_PROMPT,
MULTI_AGENT_IMPLEMENTATION_SCHEMA,
Was this intended to be included in this PR?
no, will be split into a different pr
@@ -47,6 +47,15 @@
- After receiving the user's initial input, use the given tools to fulfill their request.
- If you are uncertain about the user's requirements, run ask_human to clarify.
- If any logic or debugging checks are needed, consult the expert (if available) to get deeper analysis.
- **Web Search Handling:** When a web search is needed:
The web search prompts should be in web_research_prompts.py and only added conditionally (if web research is enabled); otherwise we're hurting agent performance by making the prompt longer and more complex than needed when those tools are not available.
got it, will fix
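A rough sketch of the conditional assembly being asked for, assuming the prompt is built from string sections. The constants and `build_chat_prompt` are hypothetical placeholders for illustration; the real prompt text and the way RA-Aid renders prompts may differ.

```python
# Placeholder sections; the real text would live in the prompt modules.
BASE_PROMPT = "Use the given tools to fulfill the user's request."
WEB_SEARCH_SECTION = "Web Search Handling: pick quick or deep search as appropriate."
JINA_SECTION = "Deep searches are served by Jina DeepSearch."


def build_chat_prompt(web_research_enabled: bool, jina_enabled: bool) -> str:
    """Assemble the prompt, adding search guidance only when it applies."""
    sections = [BASE_PROMPT]
    if web_research_enabled:
        sections.append(WEB_SEARCH_SECTION)
        if jina_enabled:
            # Backend-specific guidance is included only for that backend.
            sections.append(JINA_SECTION)
    return "\n\n".join(sections)
```

When web research is disabled the prompt stays at its base length, which is the point of the request above.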
@@ -0,0 +1,203 @@
"""
Multi-Agent Communication Protocol Prompts
What is this multi agent system?
it exports a json file in a specific schema to easily collaborate between different agent systems working on the same codebase, will be split into a different pr
* Organize information logically with appropriate headings, paragraphs, and formatting
* Provide comprehensive answers without unnecessary commentary about your capabilities
* Balance depth with brevity—be thorough but efficient
You leverage Jina DeepSearch's advanced capabilities:
These jina-specific prompts should only be used if we're actually using the jina search backend, so I would extract those to their own variables then conditionally include them when agent prompts are constructed/rendered.
got it, will fix
if expert_enabled:
    tools.append(emit_expert_context)
Why are we changing these lines?
the deepsearch research output can be very long, so we needed to stream it as it comes in, this is specified in: https://jina.ai/deepsearch -
Streaming
Delivers events as they occur through server-sent events, including reasoning steps and final answers. We strongly recommend keeping this option enabled since DeepSearch requests can take significant time to complete. Disabling streaming may result in '524 timeout' errors.
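For reference, a minimal sketch of consuming such a server-sent-event stream. It assumes each event arrives as a `data: <json>` line, that the stream ends with a `[DONE]` sentinel (as the `json_str == "[DONE]"` check in the reviewed diff suggests), and that payloads follow an OpenAI-style `choices[0].delta.content` shape. The payload shape and the helper name `iter_sse_content` are assumptions for illustration, not taken from the PR.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from an iterable of raw SSE lines."""
    for raw in lines:
        if not raw:
            continue  # blank line separating events
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # ignore SSE comments and other fields
        json_str = line[len("data:"):].strip()
        if json_str == "[DONE]":
            continue  # end-of-stream sentinel carries no content
        try:
            event = json.loads(json_str)
        except json.JSONDecodeError:
            continue  # skip a malformed chunk rather than abort the stream
        if not isinstance(event, dict):
            continue
        # Assumed payload shape: {"choices": [{"delta": {"content": "..."}}]}
        choices = event.get("choices") or [{}]
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            yield content
```

With the `requests` library, `response.iter_lines()` on a request made with `stream=True` yields byte lines that could be passed to this helper, so partial output can be shown as it arrives.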
    # Format error as a JSON string to somewhat match expected string return type
    return json.dumps({"error": error_msg})
else: # Default to 'deep'
    console.print(Panel("Routing to: web_search_jina (Jina) based on search_type='deep'", style="bright_blue"))
This conditional routing based on search backend only makes sense if both search backends are enabled/available. How do we handle the case where only tavily or only jina is enabled?
there's fallback logic in place that uses what it can find
Wouldn't this fail if only tavily is enabled and the agent specifies anything other than quick?
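One way the single-backend case could be handled is to pick the backend from the keys that are actually set before dispatching. The sketch below is an assumption about how such a fallback might look, not the router in this PR; `choose_backend` and its return values are hypothetical.

```python
import os
from typing import Mapping, Optional


def choose_backend(search_type: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return "tavily", "jina", or None when no search backend is configured."""
    env = os.environ if env is None else env
    available = {
        "tavily": bool(env.get("TAVILY_API_KEY")),
        "jina": bool(env.get("JINA_API_KEY")),
    }
    # "quick" prefers Tavily; anything else defaults to deep search via Jina.
    preferred = "tavily" if search_type == "quick" else "jina"
    if available[preferred]:
        return preferred
    # Fall back to the other backend if only that one has a key.
    fallback = "jina" if preferred == "tavily" else "tavily"
    return fallback if available[fallback] else None
```

Under this scheme a Tavily-only setup asked for a deep search would degrade to a quick search instead of failing, and the caller can report clearly when `None` comes back.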
Hey @Onnson was wondering if you needed me to look at anything on this one. It is a nice feature that would be good to get merged in.
@ai-christianson i want to dive into it with my full attention. last week i just stumbled on this super cool repo and started messing around with it, and was able to wire up a good implementation of this pretty quickly, so i wanted to contribute. hopefully by next week i'll be done with a different project i'm working on and can put my full attention into this. there are quite a few more of these nice features i want to try and add here, specifically planning in a sequential/concurrent framework to allow it to spawn multiple agents for tasks that can be completed concurrently. this would probably require some individual git worktree logic for every spawned agent so they don't run over each other's edits. gonna be pretty cool, it's worth the wait
Hey it would definitely be cool to have the option of a jina search backend as well as tavily -- please come back to this if you've got the bandwidth.
Added Jina DeepSearch for Deep Web Research
Summary
This PR adds a search router to choose between the Tavily-based web search implementation and Jina DeepSearch to improve the quality, reliability, and control of web research performed by RA-Aid.
Motivation & Benefits
Summary by CodeRabbit
New Features
Chores
.gitignore
to exclude specific directories from version control.