
Tool Failure Crashes Entire ADK Multi-Agent Workflow #3355

@hangfei


Discussed in #795

Originally posted by nikolaidk May 15, 2025
Maintainer's comment: we'd like to seek opinions on this topic from the community.

Check out #795 (comment) for the poll and cast your vote.


Original content

MCP Tool Failure Crashes Entire ADK Multi-Agent Workflow

When an MCP tool fails during execution (not connection), it propagates as an unhandled exception that crashes the entire ADK agent workflow, stopping all subsequent agents in a SequentialAgent pipeline.

Environment
ADK Version: Latest (using google.adk.agents, google.adk.tools.mcp_tool)
Python Version: 3.12
MCP Library Version: Latest compatible with current ADK implementation
Operating System: Linux 5.15
Problem Description
While ADK provides good error handling for MCP server connection failures, runtime MCP tool failures (like "Resource not found") propagate as unhandled McpError exceptions that crash the entire multi-agent workflow.

Expected Behavior
Individual MCP tool failures should not crash the entire agent workflow
Agents should be able to handle tool failures gracefully and continue execution
Sequential agents should continue to subsequent agents even if one tool fails
The framework should provide built-in resilience mechanisms for MCP tool failures
Actual Behavior
Single MCP tool failure crashes the entire SequentialAgent workflow
No opportunity for graceful degradation or alternative approaches
Complete loss of partial results from successful agents
Workflow stops executing without running subsequent agents
Steps to Reproduce
1. Create a multi-agent workflow using SequentialAgent
2. Include an MCP tool that may fail (e.g., GitHub file access with invalid path)
3. Configure the agent to use the MCP tool
4. Run the workflow with inputs that will cause the MCP tool to fail
Minimal Reproducible Example
python
import asyncio
from google.adk.agents import SequentialAgent, LlmAgent
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
from google.adk.tools.mcp_tool.mcp_toolset import StdioServerParameters

async def create_failing_workflow():
    # Set up GitHub MCP tools
    git_tools, git_exit_stack = await MCPToolset.from_server(
        connection_params=StdioServerParameters(
            command='npx',
            args=["-y", "@modelcontextprotocol/server-github"],
            env={"GITHUB_PERSONAL_ACCESS_TOKEN": "your_token"}
        )
    )

    # Create agent that will use failing MCP tool
    failing_agent = LlmAgent(
        name="FailingAgent",
        model="gemini-2.5-pro-preview-05-06",
        instruction="Try to access a non-existent file from the repository",
        tools=git_tools
    )

    # Create workflow with subsequent agents
    # (other_agent1 and other_agent2 are assumed to be defined elsewhere)
    workflow = SequentialAgent(
        name="TestWorkflow",
        sub_agents=[failing_agent, other_agent1, other_agent2]
    )

    return workflow, git_exit_stack

# Run with input that causes MCP tool to fail
# Result: Entire workflow crashes, other_agent1 and other_agent2 never execute

Error Log
mcp.shared.exceptions.McpError: Not Found: Resource not found: Not Found
  File "/home/nmr/.venv/lib/python3.12/site-packages/google/adk/tools/mcp_tool/mcp_tool.py", line 126, in run_async
    raise e
  File "/home/nmr/.venv/lib/python3.12/site-packages/google/adk/tools/mcp_tool/mcp_tool.py", line 122, in run_async
    response = await self.mcp_session.call_tool(self.name, arguments=args)
  File "/home/nmr/.venv/lib/python3.12/site-packages/mcp/client/session.py", line 265, in call_tool
    return await self.send_request(
  File "/home/nmr/.venv/lib/python3.12/site-packages/mcp/shared/session.py", line 273, in send_request
    raise McpError(response_or_error.error)
Current Workarounds
Agent Instruction Level: Explicitly instruct agents to handle tool failures
Wrapper Functions: Create wrapper tools with try/except logic
Alternative Agent Patterns: Use custom agents instead of SequentialAgent
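
The wrapper-function workaround can be sketched roughly as follows. This is a minimal illustration that does not use ADK itself: `tolerate_failures` and `get_file_contents` are hypothetical names, and the stand-in tool raises `LookupError` where a real MCP tool would raise `McpError`. How such a wrapped function is registered as an ADK tool is not shown here.

```python
import asyncio
import functools


def tolerate_failures(tool_fn):
    """Wrap an async tool function so exceptions become an error payload."""

    @functools.wraps(tool_fn)
    async def wrapper(*args, **kwargs):
        try:
            return await tool_fn(*args, **kwargs)
        except Exception as exc:  # a real wrapper might catch only McpError
            # Return the failure as data so the calling agent can react to it.
            return {
                "error": True,
                "error_type": type(exc).__name__,
                "error_message": str(exc),
                "tool_name": tool_fn.__name__,
            }

    return wrapper


@tolerate_failures
async def get_file_contents(path: str) -> dict:
    # Hypothetical stand-in for an MCP tool call that fails on a bad path.
    if path != "README.md":
        raise LookupError(f"Resource not found: {path}")
    return {"error": False, "content": "# demo"}


result = asyncio.run(get_file_contents("missing.txt"))
print(result)
```

Because the failure comes back as an ordinary return value, the model sees it as a tool result and can choose another approach instead of the exception unwinding the whole workflow.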
Suggested Solutions

1. Framework-Level Error Handling
Add built-in error handling in MCPTool.run_async():

python
from mcp.shared.exceptions import McpError

async def run_async(self, args, tool_context):
    try:
        response = await self.mcp_session.call_tool(self.name, arguments=args)
        return response
    except McpError as e:
        # Convert to tool result with error information
        return {
            "error": True,
            "error_type": "mcp_tool_failure",
            "error_message": str(e),
            "tool_name": self.name,
            "suggestions": ["Try alternative tools", "Check connectivity"]
        }
2. SequentialAgent Resilience
Modify SequentialAgent to continue execution despite sub-agent failures:

python

# Add option for fault-tolerant execution
workflow = SequentialAgent(
    name="FaultTolerantWorkflow",
    sub_agents=[agent1, agent2, agent3],
    continue_on_failure=True,  # New parameter
    collect_partial_results=True  # New parameter
)
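
To make the intended semantics concrete, here is a framework-independent sketch of what continuing on failure and collecting partial results could mean. It uses plain coroutines rather than ADK agents; `run_sequential`, `PipelineResult`, and the step functions are hypothetical names for illustration, not existing ADK APIs.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class PipelineResult:
    outputs: dict = field(default_factory=dict)   # step name -> result
    failures: dict = field(default_factory=dict)  # step name -> error message


async def run_sequential(steps, continue_on_failure=True):
    """Run (name, coroutine function) pairs in order, recording failures."""
    result = PipelineResult()
    for name, step in steps:
        try:
            result.outputs[name] = await step()
        except Exception as exc:
            result.failures[name] = str(exc)
            if not continue_on_failure:
                break  # stop at the first failure but keep earlier outputs
    return result


async def failing_step():
    raise RuntimeError("Resource not found")


async def ok_step():
    return "done"


pipeline = [
    ("failing_agent", failing_step),
    ("other_agent1", ok_step),
    ("other_agent2", ok_step),
]
outcome = asyncio.run(run_sequential(pipeline))
print(outcome)
```

In this sketch the failure of the first step is recorded and the remaining two steps still run, which is the behavior the proposed `continue_on_failure` and `collect_partial_results` parameters are asking for.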
3. Circuit Breaker Pattern
Implement circuit breaker functionality for MCP tools to prevent cascading failures.
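
One possible shape for such a circuit breaker is sketched below. It is a generic illustration, not part of ADK or the MCP library: `CircuitBreaker`, `CircuitOpenError`, the thresholds, and `flaky_tool` are all assumptions made for the example.

```python
import asyncio
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    async def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = await fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result


async def flaky_tool():
    # Hypothetical stand-in for an MCP tool whose server is down.
    raise ConnectionError("MCP server unavailable")


async def demo():
    breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
    seen = []
    for _ in range(3):
        try:
            await breaker.call(flaky_tool)
        except Exception as exc:
            seen.append(type(exc).__name__)
    return seen


seen = asyncio.run(demo())
print(seen)
```

After two consecutive failures the third call is rejected immediately without contacting the tool, so a broken external server is not hit repeatedly by every agent in the pipeline.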

Impact
Severity: High - Crashes entire workflows
Frequency: Common when using external MCP servers
Workaround Complexity: Medium - Requires manual error handling
Additional Context
This issue significantly impacts the reliability of production ADK systems using MCP tools. The current behavior makes it difficult to build robust multi-agent systems that can gracefully handle partial failures.

Feature Request
Consider adding:

Built-in error handling options for MCP tools
Fault-tolerant execution modes for multi-agent workflows
Circuit breaker patterns for external tool integrations
Better error propagation and handling documentation
Labels: bug, enhancement, mcp-tools, multi-agent, error-handling

Labels: core [Component] (This issue is related to the core interface and implementation)
