Description
- I have looked for existing issues (including closed) about this
Feature Request
Add a ToolResultReviewer capability that allows critiquing tool execution results in the agentic loop, enabling smaller specialized models to guide the main LLM toward better tool usage.
Motivation
Some tools are open-ended: there are multiple ways to achieve the same goal, and some approaches are better than others. Terminal, RAG, and web search tools are typical examples.
Example: Terminal tool for viewing directory structure
- Approach A: Use ls to list subdirectories, then execute ls on each subdirectory one by one
- Approach B: Use tree to get the complete directory structure in one call
From a human perspective, tree is clearly better - it's more efficient and reduces the number of tool calls.
The main LLM can tell when a tool call succeeds or fails, but it cannot evaluate whether a successful approach was optimal. A reviewer powered by a smaller, specialized model with up-to-date domain knowledge can:
- Guide better tool usage: Suggest more efficient approaches (e.g., "use tree instead of multiple ls calls")
- Save resources: Reduce unnecessary tool calls and token usage
- Provide domain expertise: Smaller models can be fine-tuned or prompted with specific business knowledge
Proposal
Add a ToolResultReviewer trait to rig-core:
pub trait ToolResultReviewer: Clone + WasmCompatSend + WasmCompatSync {
    fn critique(
        &self,
        tool_name: &str,
        tool_call_id: Option<String>,
        args: &str,
        result: &str,
        cancel_sig: CancelSignal,
    ) -> impl Future<Output = String> + WasmCompatSend {
        let result = result.to_string();
        async { result }
    }
}

Key design decisions:
- critique() returns String directly - the implementer controls the final output format (important for JSON tool results)
- Only called on successful tool executions - failures should be passed directly to the LLM
- Attached via with_reviewer() builder method on PromptRequest
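To make the first two decisions concrete, here is a minimal sketch rather than the proposed implementation. It assumes a simplified, synchronous stand-in for the trait (no `tool_call_id`, no `CancelSignal`, no Wasm bounds, no rig-core dependency). `PassThrough`, `TreeHintReviewer`, and `review_outcome` are hypothetical names invented for illustration, and treating `args` as a raw command line is also an assumption.

```rust
/// Simplified, synchronous stand-in for the proposed `ToolResultReviewer`.
/// Assumption: the real trait is async and also takes `tool_call_id` and a
/// `CancelSignal`; both are omitted to keep this sketch dependency-free.
pub trait ToolResultReviewer {
    /// Returns the text handed to the main LLM. Default: pass-through.
    fn critique(&self, _tool_name: &str, _args: &str, result: &str) -> String {
        result.to_string()
    }
}

/// Reviewer that leaves every result untouched (uses the default method).
pub struct PassThrough;
impl ToolResultReviewer for PassThrough {}

/// Hypothetical rule-based reviewer for a tool named "terminal".
pub struct TreeHintReviewer;

impl ToolResultReviewer for TreeHintReviewer {
    fn critique(&self, tool_name: &str, args: &str, result: &str) -> String {
        // Assumption: `args` is a raw command line; flag a plain `ls` call.
        let is_ls = args.split_whitespace().next() == Some("ls");
        if tool_name == "terminal" && is_ls {
            // The implementer controls the final output format.
            format!("{result}\n[reviewer] Hint: `tree` lists nested directories in one call.")
        } else {
            result.to_string()
        }
    }
}

/// Hypothetical glue for the agent loop: only successful results are
/// reviewed; failures go to the LLM unchanged.
pub fn review_outcome<R: ToolResultReviewer>(
    reviewer: &R,
    tool_name: &str,
    args: &str,
    outcome: Result<String, String>,
) -> String {
    match outcome {
        Ok(result) => reviewer.critique(tool_name, args, &result),
        Err(error) => error,
    }
}
```

In this sketch, `review_outcome(&TreeHintReviewer, "terminal", "ls src", Ok(output))` returns the original output followed by the hint, while an `Err` outcome or a call to any other tool comes back unchanged. A model-backed reviewer would replace the string check with a call to a smaller LLM.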
Alternatives
Instead of a separate reviewer trait, each tool could implement its own critique logic internally.
Drawbacks of that alternative:
- Code duplication: Open-ended tools like Terminal, RAG, and Web Search all need similar critique functionality
- Tight coupling: Mixes critique concerns with tool execution logic
- Less flexible: Cannot easily swap or configure different critique strategies
The ToolResultReviewer trait provides a clean separation of concerns and allows reuse across multiple tools.