Conversation

Copilot AI commented Sep 1, 2025

The AgentOps instrumentation was crashing agents whenever an LLM server returned an HTTP 503 (Service Unavailable) error. As a result, agents returned answer: None instead of computing actual results.

Problem

When using local LLM servers (like vLLM) that occasionally return 503 errors due to overload or temporary unavailability, the AgentOps instrumentation would crash while attempting to parse the HTTP response:

🖇 AgentOps: [OPENAI WRAPPER] Error in async_chat_completion_stream_wrapper: Error code: 503
Failure: Error code: 503
answer: None ground_truth: 8 reward: 0.0

The root cause was in agentlightning/instrumentation/agentops.py, where both the _patch_new_agentops() and _patch_old_agentops() functions called http_response.json() without error handling.
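To illustrate the failure mode: a 503 response body is typically plain text or HTML rather than JSON, so calling .json() on it raises a parse error that propagates up and kills the agent. The snippet below is a hypothetical stand-in (FakeHttpResponse is illustrative, not the actual AgentOps object), mirroring how httpx.Response.json() behaves on a non-JSON body:

```python
import json

class FakeHttpResponse:
    """Illustrative stand-in for the http_response object the wrapper inspects."""

    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text

    def json(self):
        # Mirrors httpx.Response.json(): raises if the body is not valid JSON.
        return json.loads(self.text)

# A 503 from an overloaded server usually carries a non-JSON body.
resp_503 = FakeHttpResponse(503, "Service Unavailable")
try:
    resp_503.json()
    crashed = False
except Exception:
    # This is the unhandled exception that previously took the agent down.
    crashed = True
```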

Solution

Added try/except blocks around the JSON parsing so that HTTP error responses are handled gracefully:

# Before (would crash on 503 errors):
json_data = return_value.http_response.json()
if isinstance(json_data, dict):
    # ... process token data ...

# After (handles errors gracefully):
try:
    json_data = return_value.http_response.json()
    if isinstance(json_data, dict):
        # ... process token data ...
except Exception as e:
    logger.debug(f"Failed to parse HTTP response JSON for token extraction: {e}")
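Pulled out into a standalone function, the guarded extraction step might look like the sketch below. This is an assumed shape for illustration only (the real logic lives inline in agentlightning/instrumentation/agentops.py, and extract_token_usage and FakeHttpResponse are hypothetical names): parse failures are logged at debug level and yield None, while well-formed responses are processed as before.

```python
import json
import logging

logger = logging.getLogger("agentops_patch_sketch")

class FakeHttpResponse:
    """Illustrative stand-in for the wrapped HTTP response object."""

    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text

    def json(self):
        return json.loads(self.text)

def extract_token_usage(http_response):
    """Return the token-usage dict from a response, or None on any parse failure."""
    try:
        json_data = http_response.json()
    except Exception as e:
        # A 503 body is usually not JSON; log at debug level and carry on.
        logger.debug(f"Failed to parse HTTP response JSON for token extraction: {e}")
        return None
    if isinstance(json_data, dict):
        return json_data.get("usage")
    return None
```

A 200 response with a usage field still yields its token counts, while a 503 with a plain-text body now falls through to None instead of raising.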

Impact

  • Agents continue running normally even when LLM servers return 503 errors
  • Token extraction failures are logged as debug messages without interrupting execution
  • Successful responses continue to work exactly as before
  • Minimal change: 18 lines modified in total (6 added, 12 changed)

This fix specifically resolves the issue in the examples/calc_x directory where agents were failing to compute mathematical answers due to instrumentation crashes.

Fixes #56.



Copilot AI changed the title [WIP] AgentOps: [OPENAI WRAPPER] Error in async_chat_completion_stream_wrapper: Error code: 503 Fix AgentOps HTTP response error handling for 503 Service Unavailable errors Sep 1, 2025
Copilot AI requested a review from ultmaster September 1, 2025 09:12
Copilot finished work on behalf of ultmaster September 1, 2025 09:12