Fix: Handle criteria returned as JSON string in JudgeAgent #196
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #161
This PR addresses an intermittent bug where
JudgeAgentfails withAttributeError: 'str' object has no attribute 'values'when the LLM returns thecriteriafield as a JSON string instead of a dictionary object.Problem
The error occurs at lines 439 and 444 in
judge_agent.pywhen the code callscriteria.values()without verifying thatcriteriais actually a dict:Root Cause
When the LLM is uncertain about the schema format (particularly with complex dynamic schemas using sanitized criterion text as property names), it sometimes serializes the nested
criteriaobject as a JSON string rather than a proper dict.Example of problematic LLM response:
{ "verdict": "success", "reasoning": "...", "criteria": "{\"criterion_1\": \"true\", \"criterion_2\": \"false\"}" // ❌ String instead of object }Expected format:
{ "verdict": "success", "reasoning": "...", "criteria": {"criterion_1": "true", "criterion_2": "false"} // ✅ Object }Solution
This PR adds defensive parsing after extracting
criteriafrom tool call arguments:Check if
criteriais a stringjson.loads()Verify
criteriais a dict before calling.values()Benefits
✅ Graceful handling: Both dict and JSON string formats are now supported
✅ Detailed logging: Debug and warning messages help diagnose issues
✅ Safe fallback: Empty dict prevents test failures
✅ Low risk: Only adds defensive parsing, no changes to normal execution path
✅ Fixes intermittent failures: Addresses the root cause reported in #161
Testing
python -m py_compile{"criterion_1": "true", "criterion_2": "false"}"{\"criterion_1\": \"true\", \"criterion_2\": \"false\"}"Changes
File modified:
python/scenario/judge_agent.pyLines added: 24 lines of defensive parsing (after line 435)
Impact: Low risk, backward compatible
Related Issues
Closes #161
Ready for review! This fix prevents the intermittent
AttributeErrorwhile maintaining backward compatibility with existing tests.