-
-
Notifications
You must be signed in to change notification settings - Fork 779
Support for Sonnet 3.7 Reasoning for ANTRHOPIC_JSON #1361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Alrea 8000 dy on GitHub? Sign in to your account
Conversation
Sonnet 3.7 fix
cohere = [ | ||
"cohere<6.0.0,>=5.1.8", | ||
] | ||
anthropic = ["anthropic==0.47.2", "xmltodict<0.15,>=0.13"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to 0.47.2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why so many diffs? mind cleaning up this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to e5d7467 in 2 minutes and 20 seconds
More details
- Looked at
263
lines of code in4
files - Skipped
0
files when reviewing. - Skipped posting
7
drafted comments based on config settings.
1. instructor/function_calls.py:225
- Draft comment:
Check for empty text_blocks list to avoid IndexError. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 85%
This is a valid concern - if there are no text blocks in the content, accessing index 0 would raise an IndexError. The code assumes there will always be at least one text block. Looking at the Anthropic API docs or other files might help understand if this is a valid assumption, but following the rules, I should make decisions based only on this file.
I could be wrong about whether Anthropic messages always contain at least one text block. Maybe there's a guarantee in the API that I'm not aware of.
Even if there is such a guarantee, defensive programming against IndexError is a good practice, and the suggested fix is simple and clear.
Keep the comment as it suggests a valid improvement to handle a potential edge case that could cause crashes.
2. pyproject.toml:2
- Draft comment:
Ensure dependency version bumps are reflected in docs/test updates. - Reason this comment was not posted:
Confidence changes required:70%
<= threshold85%
None
3. tests/llm/test_anthropic/test_reasoning.py:20
- Draft comment:
Include edge-case tests for missing or unexpected response shapes. - Reason this comment was not posted:
Confidence changes required:50%
<= threshold85%
None
4. instructor/function_calls.py:226
- Draft comment:
Risk: If no text block exists, text_blocks[0] will raise an IndexError. Consider checking that text_blocks is non-empty. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 85%
This is a change to handle Anthropic API responses. Looking at the Anthropic API docs and context, every Message should contain at least one text block. However, the code is defensive with type checks elsewhere, so being defensive here would be consistent. The empty string default seems questionable though - an empty response likely indicates an API error that should be handled explicitly.
I may be wrong about Anthropic API guarantees - perhaps there are valid cases where a Message contains no text blocks. The code assumes text blocks exist but doesn't validate this assumption.
Even if I'm uncertain about API guarantees, defensive programming would suggest validating our assumptions, especially when dealing with external APIs. An IndexError here would be confusing to debug.
The comment identifies a real potential error case in changed code that should be handled. However, returning empty string may hide errors - we should raise an explicit exception instead.
5. tests/llm/test_anthropic/test_reasoning.py:31
- Draft comment:
For float comparisons, consider using a tolerance (e.g., pytest.approx) rather than strict equality. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 85%
This is a test of logical reasoning where we're checking if the LLM can correctly identify that 9.8 is NOT larger than 9.11. The test appears to be wrong - it asserts the answer is 9.8 when 9.11 is actually larger. Using approx here would make the test less precise when we actually want exact precision since we're testing logical reasoning.
The comment about float comparison best practices is generally good advice, but am I missing something about the actual test logic being incorrect?
The test itself appears to have a logic error - the assertion expects the wrong answer. Using approx would mask this issue rather than fix it.
The comment should be deleted because it suggests making the test less precise when precision is actually needed, and more importantly, the test itself appears to have a logical error that should be fixed instead.
6. tests/llm/test_anthropic/test_reasoning.py:5
- Draft comment:
Remove unused imports ('models', 'modes') to keep the code clean. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 85%
The comment is technically correct - those imports are unused. However, this is something that would be caught by standard linters and CI tools. Most IDEs would also highlight these as unused imports. According to the rules, we shouldn't comment on things that would be obviously caught by the build.
The comment is factually accurate and promotes code cleanliness. Maybe keeping it could help catch the issue earlier in the review process?
While accurate, this is exactly the kind of mechanical issue that should be caught by automated tools rather than manual review comments. It adds noise to the review without providing meaningful value.
Delete this comment as it points out an issue that should be caught by standard linting tools rather than manual review.
7. uv.lock:499
- Draft comment:
Platform markers updated to use 'sys_platform' improve compatibility. Verify these changes are consistent across all dependency definitions. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_laGcOG5IIxsgNq9O
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Simple patch to support Sonnet 3.7 with Thinking:
#1357
Only works for ANTHROPIC_JSON mode
Important
Adds support for Sonnet 3.7 reasoning in ANTHROPIC_JSON mode and updates
anthropic
package to 0.47.2.parse_anthropic_json()
infunction_calls.py
by selecting the first text block fromcompletion.content
.ANTHROPIC_JSON
mode.anthropic
package version to0.47.2
inpyproject.toml
anduv.lock
.test_reasoning()
intest_reasoning.py
to validate reasoning with Sonnet 3.7 usingANTHROPIC_JSON
mode.pyproject.toml
.This description was created by
for e5d7467. It will automatically update as commits are pushed.