SWE-smith & multimodal base support by klieret · Pull Request #1092 · SWE-agent/SWE-agent · GitHub

SWE-smith & multimodal base support #1092


Merged
merged 3 commits into from
May 1, 2025

Conversation

klieret
@klieret klieret commented May 1, 2025
  • Init SWE-smith changes
  • Remove -x from git clean in reset
  • Add multimodal support

@klieret klieret changed the title from "swe rebased" to "SWE-smith & multimodal base support" May 1, 2025
@klieret klieret requested a review from Copilot May 1, 2025 08:11
@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces multimodal support and initial SWE-smith changes while refining repository reset procedures. Key changes include:

  • Updating tests to validate the new “multimodal” subset for bench instances.
  • Adding the XMLFunctionCallingParser and expanding instance source support with SWE-smith.
  • Adjusting Docker image naming conventions and simplifying reset commands in the environment.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/test_batch_instance.py | Adds “multimodal” subset case in bench instance tests. |
| sweagent/tools/parsing.py | Introduces XMLFunctionCallingParser for XML-based tool responses. |
| sweagent/run/hooks/swe_bench_evaluate.py | Extends subset mapping to include “multimodal”. |
| sweagent/run/batch_instances.py | Updates image name formatting and adds SWE-smith instance support. |
| sweagent/environment/swe_env.py | Revises reset repository commands by removing the -x flag. |

Comments suppressed due to low confidence (3)

tests/test_batch_instance.py:35

  • [nitpick] The test now iterates over the multimodal subset; please ensure that separate test cases or assertions are in place to validate multimodal-specific behavior.
for subset in ["lite", "verified", "full", "multimodal"]:
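
As a rough illustration of what a multimodal-specific check could look like, the sketch below maps each subset name from the loop above to a dataset identifier and verifies that every subset resolves. The mapping name `SUBSET_TO_DATASET`, the helper `resolve_dataset`, and the dataset identifiers are assumptions for illustration and are not taken from this PR.

```python
# Hypothetical mapping from subset name to a dataset identifier.
# The identifiers are assumptions for illustration, not copied from the PR.
SUBSET_TO_DATASET = {
    "lite": "princeton-nlp/SWE-bench_Lite",
    "verified": "princeton-nlp/SWE-bench_Verified",
    "full": "princeton-nlp/SWE-bench",
    "multimodal": "princeton-nlp/SWE-bench_Multimodal",
}


def resolve_dataset(subset: str) -> str:
    """Return the dataset identifier for a subset, failing loudly on unknown names."""
    try:
        return SUBSET_TO_DATASET[subset]
    except KeyError:
        known = ", ".join(sorted(SUBSET_TO_DATASET))
        raise ValueError(f"Unknown subset {subset!r}; expected one of: {known}") from None


# Mirror the loop from the test: every subset must resolve to a distinct dataset.
resolved = [resolve_dataset(s) for s in ["lite", "verified", "full", "multimodal"]]
assert len(set(resolved)) == len(resolved)
```

Raising on an unknown subset, rather than falling back to a default, makes a typo in a subset name surface immediately instead of silently loading the wrong data.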

sweagent/run/batch_instances.py:162

  • [nitpick] Verify that the lowercasing of the Docker image name is intentional and does not conflict with case-sensitive tag requirements.
image_name = f"swebench/sweb.eval.x86_64.{id_docker_compatible}:latest".lower()
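
On the lowercasing question: Docker repository names must be lowercase, while tags may contain uppercase characters, so calling `.lower()` on the whole reference is only harmless when the tag is already lowercase, as `latest` is. The sketch below wraps the line above in a hypothetical helper; the `"__"` to `"_1776_"` substitution and the example instance id are assumptions for illustration.

```python
def swebench_image_name(instance_id: str) -> str:
    """Build a lowercase image reference for a SWE-bench style instance id.

    The "__" -> "_1776_" substitution is an assumption made for illustration;
    the PR excerpt only shows the final f-string and the .lower() call.
    """
    id_docker_compatible = instance_id.replace("__", "_1776_")
    # Lowercasing the full string affects both the repository name and the tag.
    return f"swebench/sweb.eval.x86_64.{id_docker_compatible}:latest".lower()


# Hypothetical instance id containing uppercase characters.
name = swebench_image_name("PyCQA__flake8-2000")
print(name)
```

If a mixed-case tag were ever needed, lowercasing only the part before the colon would be the safer variant.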

sweagent/environment/swe_env.py:160

  • [nitpick] Ensure that the revised git reset commands (including the removal of the '-x' flag in 'git clean') reliably achieve a fully clean state and do not leave behind untracked changes.
git checkout {self.repo.base_commit}
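
To make the effect of dropping `-x` concrete: `git clean -fd` removes untracked files and directories but keeps anything matched by `.gitignore`, whereas `git clean -fdx` removes ignored files as well. The sketch below builds a reset sequence with and without that flag. The helper `build_reset_commands` and the exact command sequence are hypothetical and may differ from what `sweagent/environment/swe_env.py` actually runs.

```python
def build_reset_commands(base_commit: str, remove_ignored: bool = False) -> list[str]:
    """Return shell commands that reset a checkout to base_commit.

    Hypothetical helper: the command sequence is an illustration, not the
    exact list used in sweagent/environment/swe_env.py.
    """
    # -f: force, -d: also remove untracked directories, -q: quiet.
    # -x additionally deletes files matched by .gitignore (build caches, venvs).
    clean_flags = "-fdxq" if remove_ignored else "-fdq"
    return [
        "git restore .",
        "git reset --hard",
        f"git checkout {base_commit}",
        f"git clean {clean_flags}",
    ]


for command in build_reset_commands("abc123"):
    print(command)
```

Without `-x`, ignored artifacts such as installed build outputs survive a reset. That is usually the point of removing the flag, but it also means the working tree is not guaranteed to be byte-identical to a fresh checkout.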

Comment on lines +213 to +214
FN_REGEX_PATTERN = r"<function=([^>]+)>\n(.*?)</function>"
FN_PARAM_REGEX_PATTERN = r"<parameter=([^>]+)>(.*?)</parameter>"
Copilot AI May 1, 2025


[nitpick] Consider pre-compiling the regex patterns FN_REGEX_PATTERN and FN_PARAM_REGEX_PATTERN to improve performance if this parser is invoked frequently.

Suggested change
- FN_REGEX_PATTERN = r"<function=([^>]+)>\n(.*?)</function>"
- FN_PARAM_REGEX_PATTERN = r"<parameter=([^>]+)>(.*?)</parameter>"
+ FN_REGEX_PATTERN = re.compile(r"<function=([^>]+)>\n(.*?)</function>")
+ FN_PARAM_REGEX_PATTERN = re.compile(r"<parameter=([^>]+)>(.*?)</parameter>")
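
A minimal sketch of how the two patterns could be pre-compiled and applied to an XML-style tool call. The function `parse_function_call`, the sample input, and the use of `re.DOTALL` are assumptions for illustration; the excerpt does not show how `XMLFunctionCallingParser` applies the patterns.

```python
import re

# re.DOTALL is an assumption here: without it, "." does not match newlines
# and multi-line function bodies or parameter values would not be captured.
FN_REGEX = re.compile(r"<function=([^>]+)>\n(.*?)</function>", re.DOTALL)
FN_PARAM_REGEX = re.compile(r"<parameter=([^>]+)>(.*?)</parameter>", re.DOTALL)


def parse_function_call(text: str) -> tuple[str, dict[str, str]]:
    """Extract the function name and its parameters from the first call in text."""
    match = FN_REGEX.search(text)
    if match is None:
        raise ValueError("No <function=...> block found")
    name, body = match.group(1), match.group(2)
    # Each parameter is a (key, value) pair; surrounding whitespace is trimmed.
    params = {key: value.strip() for key, value in FN_PARAM_REGEX.findall(body)}
    return name, params


# Hypothetical model output with a multi-line parameter value.
sample = (
    "<function=bash>\n"
    "<parameter=command>ls -la\npwd</parameter>\n"
    "</function>"
)
name, params = parse_function_call(sample)
print(name, params)
```

Python's `re` module already caches recently used patterns, so the speedup from pre-compiling is likely small. The main benefit is that flags such as `re.DOTALL` are fixed in one place.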


@klieret klieret merged commit 76c65b5 into main May 1, 2025
6 checks passed
@klieret klieret deleted the swe-rebased branch May 1, 2025 08:21