Support for Sonnet 3.7 Reasoning for ANTHROPIC_JSON by A-F-V · Pull Request #1361 · 567-labs/instructor · GitHub

Support for Sonnet 3.7 Reasoning for ANTHROPIC_JSON #1361


Merged: 4 commits into 567-labs:main on Mar 3, 2025
Conversation

A-F-V
Contributor
@A-F-V A-F-V commented Feb 27, 2025

Simple patch to support Sonnet 3.7 with Thinking:
#1357

Only works for ANTHROPIC_JSON mode


Important

Adds support for Sonnet 3.7 reasoning in ANTHROPIC_JSON mode and updates anthropic package to 0.47.2.

  • Behavior:
    • Supports Sonnet 3.7 reasoning in parse_anthropic_json() in function_calls.py by selecting the first text block from completion.content.
    • Only works in ANTHROPIC_JSON mode.
  • Dependencies:
    • Updates anthropic package version to 0.47.2 in pyproject.toml and uv.lock.
  • Testing:
    • Adds test_reasoning() in test_reasoning.py to validate reasoning with Sonnet 3.7 using ANTHROPIC_JSON mode.
  • Misc:
    • Minor formatting changes in pyproject.toml.

This description was created by Ellipsis for e5d7467. It will automatically update as commits are pushed.
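The selection behavior described above can be sketched with stand-in types. This is a hypothetical simplification: the real change operates on the Anthropic SDK's content block objects (e.g. `TextBlock`, `ThinkingBlock`), whereas `Block` here is invented for illustration:

```python
from dataclasses import dataclass

# Hypothetical stand-in for Anthropic SDK content blocks (the real SDK
# uses TextBlock / ThinkingBlock classes carrying a `type` attribute).
@dataclass
class Block:
    type: str
    text: str = ""

def first_text_block(content: list[Block]) -> str:
    """Pick the first `text` block, skipping any `thinking` blocks."""
    text_blocks = [b for b in content if b.type == "text"]
    return text_blocks[0].text

# With reasoning enabled, the completion interleaves thinking and text:
content = [
    Block("thinking", "step-by-step reasoning..."),
    Block("text", '{"answer": 9.8}'),
]
assert first_text_block(content) == '{"answer": 9.8}'
```

With thinking enabled, the first block in `completion.content` is a thinking block rather than text, which is why the patch filters by block type instead of taking `content[0]` directly.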

cohere = [
"cohere<6.0.0,>=5.1.8",
]
anthropic = ["anthropic==0.47.2", "xmltodict<0.15,>=0.13"]
Contributor Author


Updated to 0.47.2

Collaborator


why so many diffs? mind cleaning up this PR

Contributor
@ellipsis-dev ellipsis-dev bot left a comment


👍 Looks good to me! Reviewed everything up to e5d7467 in 2 minutes and 20 seconds

More details
  • Looked at 263 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 7 drafted comments based on config settings.
1. instructor/function_calls.py:225
  • Draft comment:
    Check for empty text_blocks list to avoid IndexError.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 85%
    This is a valid concern - if there are no text blocks in the content, accessing index 0 would raise an IndexError. The code assumes there will always be at least one text block. Looking at the Anthropic API docs or other files might help understand if this is a valid assumption, but following the rules, I should make decisions based only on this file.
    I could be wrong about whether Anthropic messages always contain at least one text block. Maybe there's a guarantee in the API that I'm not aware of.
    Even if there is such a guarantee, defensive programming against IndexError is a good practice, and the suggested fix is simple and clear.
    Keep the comment as it suggests a valid improvement to handle a potential edge case that could cause crashes.
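A minimal sketch of the guard the draft comment suggests, assuming content blocks are represented as plain dicts with a `type` key (a simplification of the SDK objects; `first_text_or_raise` is a hypothetical name):

```python
def first_text_or_raise(blocks: list[dict]) -> str:
    # Use next() with a sentinel instead of indexing, so an empty
    # result is handled explicitly rather than raising IndexError
    first = next((b["text"] for b in blocks if b["type"] == "text"), None)
    if first is None:
        raise ValueError("completion content contained no text blocks")
    return first

assert first_text_or_raise([{"type": "text", "text": "hello"}]) == "hello"
```

Raising a descriptive exception here is preferable to returning an empty default, since an all-thinking response almost certainly indicates an upstream problem worth surfacing.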
2. pyproject.toml:2
  • Draft comment:
    Ensure dependency version bumps are reflected in docs/test updates.
  • Reason this comment was not posted:
    Confidence changes required: 70% <= threshold 85%
    None
3. tests/llm/test_anthropic/test_reasoning.py:20
  • Draft comment:
    Include edge-case tests for missing or unexpected response shapes.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 85%
    None
4. instructor/function_calls.py:226
  • Draft comment:
    Risk: If no text block exists, text_blocks[0] will raise an IndexError. Consider checking that text_blocks is non-empty.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 85%
    This is a change to handle Anthropic API responses. Looking at the Anthropic API docs and context, every Message should contain at least one text block. However, the code is defensive with type checks elsewhere, so being defensive here would be consistent. The empty string default seems questionable though - an empty response likely indicates an API error that should be handled explicitly.
    I may be wrong about Anthropic API guarantees - perhaps there are valid cases where a Message contains no text blocks. The code assumes text blocks exist but doesn't validate this assumption.
    Even if I'm uncertain about API guarantees, defensive programming would suggest validating our assumptions, especially when dealing with external APIs. An IndexError here would be confusing to debug.
    The comment identifies a real potential error case in changed code that should be handled. However, returning empty string may hide errors - we should raise an explicit exception instead.
5. tests/llm/test_anthropic/test_reasoning.py:31
  • Draft comment:
    For float comparisons, consider using a tolerance (e.g., pytest.approx) rather than strict equality.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 85%
    This is a test of the model's numerical reasoning: the question asks which of 9.8 and 9.11 is larger, and the test asserts the answer is 9.8, which is numerically correct (9.8 > 9.11). The expected value is one of two given answer choices, not a computed float, so exact equality is the right comparison; approx would loosen it without benefit.
    The comment about float-comparison best practices is generally good advice, but it does not apply when asserting against a verbatim answer choice.
    Using approx here would make the test less strict without making it more correct.
    The comment should be deleted because it suggests relaxing a comparison that is intentionally exact.
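For reference, when a test does compare computed floats, a tolerance-based check (`pytest.approx`, or `math.isclose` in the standard library) is the usual tool; a minimal illustration:

```python
import math

# Exact equality fails: 0.1 and 0.2 are not exactly representable in binary
assert 0.1 + 0.2 != 0.3

# A tolerance-based comparison succeeds
assert math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9)
```

Here, though, the expected value 9.8 is one of two given answer choices rather than the result of arithmetic, so exact comparison is appropriate.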
6. tests/llm/test_anthropic/test_reasoning.py:5
  • Draft comment:
    Remove unused imports ('models', 'modes') to keep the code clean.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 85%
    The comment is technically correct - those imports are unused. However, this is something that would be caught by standard linters and CI tools. Most IDEs would also highlight these as unused imports. According to the rules, we shouldn't comment on things that would be obviously caught by the build.
    The comment is factually accurate and promotes code cleanliness. Maybe keeping it could help catch the issue earlier in the review process?
    While accurate, this is exactly the kind of mechanical issue that should be caught by automated tools rather than manual review comments. It adds noise to the review without providing meaningful value.
    Delete this comment as it points out an issue that should be caught by standard linting tools rather than manual review.
7. uv.lock:499
  • Draft comment:
    Updating platform markers to use 'sys_platform' improves compatibility. Verify these changes are consistent across all dependency definitions.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_laGcOG5IIxsgNq9O


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@jxnl jxnl merged commit d38c8d0 into 567-labs:main Mar 3, 2025
2 participants