@rajdeepmahal24

Summary

Fixes #161

This PR addresses an intermittent bug where JudgeAgent fails with AttributeError: 'str' object has no attribute 'values' when the LLM returns the criteria field as a JSON string instead of a dictionary object.

Problem

The error occurs at lines 439 and 444 in judge_agent.py when the code calls criteria.values() without verifying that criteria is actually a dict:

passed_criteria = [
    self.criteria[idx]
    for idx, criterion in enumerate(criteria.values())  # ❌ Fails if criteria is a string
    if criterion == "true"
]
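The failure can be reproduced in isolation. The sketch below assumes the tool-call arguments arrive as a raw JSON string that is decoded once with `json.loads`. The payload is illustrative, not captured from a real run, and the comprehension only mimics the one above.

```python
import json

# Illustrative raw tool-call arguments in which the LLM double-encoded `criteria`:
# the value is a JSON *string* that contains an object, not an object itself.
raw_arguments = (
    '{"verdict": "success", "reasoning": "...", '
    '"criteria": "{\\"criterion_1\\": \\"true\\", \\"criterion_2\\": \\"false\\"}"}'
)

arguments = json.loads(raw_arguments)
criteria = arguments["criteria"]
print(type(criteria).__name__)  # str

# The unguarded code path then calls .values() on a str and raises.
try:
    passed = [idx for idx, criterion in enumerate(criteria.values()) if criterion == "true"]
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # 'str' object has no attribute 'values'
```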

Root Cause

When the LLM is uncertain about the schema format (particularly with complex dynamic schemas that use sanitized criterion text as property names), it sometimes serializes the nested criteria object as a JSON string rather than as a nested JSON object.

Example of a problematic LLM response, where criteria is a JSON string instead of an object:

{
  "verdict": "success",
  "reasoning": "...",
  "criteria": "{\"criterion_1\": \"true\", \"criterion_2\": \"false\"}"
}

Expected format, where criteria is an object:

{
  "verdict": "success",
  "reasoning": "...",
  "criteria": {"criterion_1": "true", "criterion_2": "false"}
}
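The two payloads differ only in that criteria has been serialized one extra time, so a second `json.loads` on that field recovers the expected object. A minimal sketch using the illustrative payload from the examples above:

```python
import json

expected = {
    "verdict": "success",
    "reasoning": "...",
    "criteria": {"criterion_1": "true", "criterion_2": "false"},
}

# The problematic shape: the same payload, but `criteria` passed through json.dumps once more.
problematic = dict(expected, criteria=json.dumps(expected["criteria"]))
print(json.dumps(problematic))

# One extra json.loads on that field restores the expected object.
recovered = json.loads(problematic["criteria"])
print(recovered == expected["criteria"])  # True
```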

Solution

This PR adds defensive parsing after extracting criteria from tool call arguments:

  1. Check if criteria is a string

    • If yes, attempt to parse it with json.loads()
    • If parsing fails, log a warning and use an empty dict as a fallback
  2. Verify criteria is a dict before calling .values()

    • Additional safety check to prevent the AttributeError; this also covers a string that is valid JSON but not an object (e.g. a list)
    • If it is not a dict, log a warning and use an empty dict as a fallback

# Handle case where LLM returns criteria as a JSON string instead of dict
if isinstance(criteria, str):
    try:
        criteria = json.loads(criteria)
        logger.debug("JudgeAgent: Parsed criteria from JSON string to dict")
    except json.JSONDecodeError:
        logger.warning(
            f"JudgeAgent: Failed to parse criteria string as JSON: {criteria}. "
            "Using empty dict as fallback."
        )
        criteria = {}

# Ensure criteria is a dict before calling .values()
if not isinstance(criteria, dict):
    logger.warning(
        f"JudgeAgent: criteria is {type(criteria).__name__}, expected dict. "
        "Using empty dict as fallback."
    )
    criteria = {}
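A possible refactor, not part of this PR: the two checks can be pulled into a standalone function so they are easy to unit-test. `normalize_criteria` is a hypothetical name, and this sketch passes logging arguments lazily (`%r`, `%s`) instead of using f-strings, so each message is only formatted when its log level is enabled.

```python
import json
import logging
from typing import Any

logger = logging.getLogger(__name__)


def normalize_criteria(criteria: Any) -> dict:
    """Return `criteria` as a dict, decoding a JSON string if necessary.

    Hypothetical helper with the same logic as the inline checks in this PR.
    """
    # Case 1: the LLM serialized the nested object as a JSON string.
    if isinstance(criteria, str):
        try:
            criteria = json.loads(criteria)
            logger.debug("JudgeAgent: Parsed criteria from JSON string to dict")
        except json.JSONDecodeError:
            logger.warning(
                "JudgeAgent: Failed to parse criteria string as JSON: %r. "
                "Using empty dict as fallback.",
                criteria,
            )
            return {}

    # Case 2: anything that is still not a dict (None, a list, a number, ...).
    if not isinstance(criteria, dict):
        logger.warning(
            "JudgeAgent: criteria is %s, expected dict. Using empty dict as fallback.",
            type(criteria).__name__,
        )
        return {}

    return criteria


print(normalize_criteria({"criterion_1": "true"}))    # {'criterion_1': 'true'}
print(normalize_criteria('{"criterion_1": "true"}'))  # {'criterion_1': 'true'}
print(normalize_criteria("not json"))                 # {}  (logs a warning)
print(normalize_criteria('["true"]'))                 # {}  (valid JSON, but a list)
```

At the call site this would become `criteria = normalize_criteria(criteria)`, placed before the `.values()` calls at lines 439 and 444.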

Benefits

Graceful handling: Both dict and JSON string formats are now supported
Detailed logging: Debug and warning messages help diagnose issues
Safe fallback: An empty dict avoids the crash; the .values() calls at lines 439 and 444 then iterate over nothing instead of raising
Low risk: Only adds defensive parsing, no changes to normal execution path
Fixes intermittent failures: Resolves the AttributeError reported in #161

Testing

  • ✅ Verified Python syntax with python -m py_compile
  • ✅ Code handles both formats correctly:
    • Direct dict: {"criterion_1": "true", "criterion_2": "false"}
    • JSON string: "{\"criterion_1\": \"true\", \"criterion_2\": \"false\"}"
  • ✅ Fallback behavior tested for malformed JSON
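The checks above were done manually; a unit test would pin the behavior down. A minimal `unittest` sketch, assuming a hypothetical `normalize_criteria` helper with the same logic as the PR's inline checks (logging omitted) and a hypothetical `passed_criteria` function that mirrors the comprehension quoted in the Problem section, with `self.criteria` passed in as a plain list:

```python
import json
import unittest


def normalize_criteria(criteria):
    """Hypothetical helper: same logic as the PR's inline checks, without logging."""
    if isinstance(criteria, str):
        try:
            criteria = json.loads(criteria)
        except json.JSONDecodeError:
            return {}
    return criteria if isinstance(criteria, dict) else {}


def passed_criteria(criteria_texts, criteria):
    """Hypothetical mirror of the passed-criteria comprehension in judge_agent.py."""
    return [
        criteria_texts[idx]
        for idx, criterion in enumerate(normalize_criteria(criteria).values())
        if criterion == "true"
    ]


class NormalizeCriteriaTest(unittest.TestCase):
    TEXTS = ["criterion 1", "criterion 2"]

    def test_dict_input(self):
        criteria = {"criterion_1": "true", "criterion_2": "false"}
        self.assertEqual(passed_criteria(self.TEXTS, criteria), ["criterion 1"])

    def test_json_string_input(self):
        raw = '{"criterion_1": "true", "criterion_2": "false"}'
        self.assertEqual(passed_criteria(self.TEXTS, raw), ["criterion 1"])

    def test_malformed_json_falls_back_to_empty(self):
        self.assertEqual(passed_criteria(self.TEXTS, "{not json"), [])

    def test_non_object_json_falls_back_to_empty(self):
        self.assertEqual(passed_criteria(self.TEXTS, '["true", "false"]'), [])

    def test_none_falls_back_to_empty(self):
        self.assertEqual(passed_criteria(self.TEXTS, None), [])
```

Saved as a `test_*.py` file, it can be run with `python -m unittest` or collected by pytest.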

Changes

File modified: python/scenario/judge_agent.py
Lines added: 24 lines of defensive parsing (after line 435)
Impact: Low risk, backward compatible

Related Issues

Closes #161


Ready for review! This fix prevents the intermittent AttributeError while maintaining backward compatibility with existing tests.

Fixes langwatch#161

## Problem
JudgeAgent intermittently fails with `AttributeError: 'str' object has no
attribute 'values'` when the LLM returns the `criteria` field as a JSON
string instead of a dictionary object.

This occurs at lines 439 and 444 when the code calls `criteria.values()`
without verifying that `criteria` is actually a dict.

## Root Cause
When the LLM is uncertain about the schema format (particularly with
complex dynamic schemas using sanitized criterion text as property names),
it sometimes serializes the nested `criteria` object as a JSON string
rather than a proper dict.

## Solution
Add defensive parsing after extracting criteria from tool call arguments:

1. Check if `criteria` is a string
2. If yes, attempt to parse it with `json.loads()`
3. If parsing fails, log a warning and use an empty dict as a fallback
4. Additionally verify `criteria` is a dict before calling `.values()`

This ensures the code gracefully handles both formats:
- Direct dict: `{"criterion_1": "true", "criterion_2": "false"}`
- JSON string: `"{\"criterion_1\": \"true\", \"criterion_2\": \"false\"}"`

## Testing
- Verified Python syntax with `python -m py_compile`
- Fix includes detailed logging for debugging
- Graceful fallback avoids the `AttributeError` when `criteria` cannot be parsed into a dict

## Impact
- Low risk: Only adds defensive parsing with fallback
- Fixes intermittent failures reported in issue langwatch#161
- No changes to normal execution path when criteria is already a dict

@rogeriochaves force-pushed the main branch 2 times, most recently from 77a92af to 9fdb87c on December 16, 2025 15:54

Development

Successfully merging this pull request may close these issues.

JudgeAgent AttributeError when LLM returns criteria as string instead of dict
