
feat: Add ToolResultReviewer trait for critiquing open-ended tool results #1191

@Phoenix500526

  • I have looked for existing issues (including closed) about this

Feature Request

Add a ToolResultReviewer capability that allows critiquing tool execution results in the agentic loop, enabling smaller specialized models to guide the main LLM toward better tool usage.

Motivation

Some tools are open-ended: there are multiple ways to achieve the same goal, but some approaches are better than others. Terminal, RAG, and web search tools are typical examples.

Example: Terminal tool for viewing directory structure

  • Approach A: Use ls to list subdirectories, then execute ls on each subdirectory one by one
  • Approach B: Use tree to get the complete directory structure in one call

From a human perspective, tree is clearly better - it's more efficient and reduces the number of tool calls.

The main LLM can tell when a tool call succeeds or fails, but it cannot evaluate whether a successful approach was optimal. A reviewer powered by a smaller, specialized model with up-to-date domain knowledge can:

  1. Guide better tool usage: Suggest more efficient approaches (e.g., "use tree instead of multiple ls calls")
  2. Save resources: Reduce unnecessary tool calls and token usage
  3. Provide domain expertise: Smaller models can be fine-tuned or prompted with specific business knowledge

Proposal

Add ToolResultReviewer trait to rig-core:

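 /// Reviews the result of a successful tool call in the agentic loop.
 pub trait ToolResultReviewer: Clone + WasmCompatSend + WasmCompatSync {
     /// Returns the final result text; the implementer controls its format.
     /// The default implementation passes `result` through unchanged.
     fn critique(
         &self,
         tool_name: &str,
         tool_call_id: Option<String>,
         args: &str,
         result: &str,
         cancel_sig: CancelSignal,
     ) -> impl Future<Output = String> + WasmCompatSend {
         // Copy the borrowed result so the returned future owns its data.
         let result = result.to_string();
         async move { result }
     }
 }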
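
To make the trait more concrete, here is a minimal sketch of what an implementer might look like. It is not rig-core code: the trait is copied locally with plain `Send`/`Sync` in place of the Wasm-compat bounds, `CancelSignal` is an empty stand-in, `LsHintReviewer` is a hypothetical rule-based reviewer (a real one would probably call a smaller model), and `args` is assumed to be the raw shell command rather than JSON. The tiny `block_on` helper exists only so the sketch runs without an async runtime.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Empty stand-in for the real cancel signal type (assumption, illustration only).
#[derive(Clone, Default)]
pub struct CancelSignal;

// Local, simplified copy of the proposed trait (Wasm-compat bounds replaced).
pub trait ToolResultReviewer: Clone + Send + Sync {
    fn critique(
        &self,
        tool_name: &str,
        tool_call_id: Option<String>,
        args: &str,
        result: &str,
        cancel_sig: CancelSignal,
    ) -> impl Future<Output = String> + Send {
        // Default behaviour: ignore the context and pass the result through.
        let _ = (tool_name, tool_call_id, args, cancel_sig);
        let result = result.to_string();
        async move { result }
    }
}

// Hypothetical reviewer: appends a hint when the terminal tool ran `ls`.
#[derive(Clone)]
pub struct LsHintReviewer;

impl ToolResultReviewer for LsHintReviewer {
    fn critique(
        &self,
        tool_name: &str,
        _tool_call_id: Option<String>,
        args: &str,
        result: &str,
        _cancel_sig: CancelSignal,
    ) -> impl Future<Output = String> + Send {
        // Assumes `args` holds the raw command line, e.g. "ls src".
        let ran_ls = tool_name == "terminal" && args.split_whitespace().next() == Some("ls");
        let reviewed = if ran_ls {
            format!("{result}\n[reviewer] Hint: `tree` shows the whole directory structure in one call.")
        } else {
            result.to_string()
        };
        async move { reviewed }
    }
}

// Waker that does nothing; enough for futures that are ready immediately.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Minimal executor for the sketch: polls the future until it completes.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

With this sketch, `block_on(LsHintReviewer.critique("terminal", None, "ls src", "main.rs", CancelSignal))` yields the original output followed by the hint line, while a `tree` call comes back unchanged.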

Key design decisions:

  • critique() returns String directly - the implementer controls the final output format (important for JSON tool results)
  • Only called on successful tool executions - failures should be passed directly to the LLM
  • Attached via with_reviewer() builder method on PromptRequest
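
The design decisions above could be sketched roughly as follows. This is a synchronous simplification and not the actual `PromptRequest` code: `PromptRequestSketch`, `Reviewer`, `TagReviewer`, and `tool_output_for_llm` are hypothetical names, and using `()` as the "no reviewer attached" default is only one possible way to wire up the builder.

```rust
// Synchronous stand-in for the proposed trait (hypothetical, for illustration).
pub trait Reviewer: Clone {
    fn critique(&self, tool_name: &str, args: &str, result: &str) -> String {
        // Default: pass the result through unchanged.
        let _ = (tool_name, args);
        result.to_string()
    }
}

// The unit type plays the role of "no reviewer attached".
impl Reviewer for () {}

// Hypothetical stand-in for the request type that owns the agentic loop.
pub struct PromptRequestSketch<R: Reviewer = ()> {
    reviewer: R,
}

impl PromptRequestSketch<()> {
    pub fn new() -> Self {
        Self { reviewer: () }
    }
}

impl<R: Reviewer> PromptRequestSketch<R> {
    // Builder method: swaps in a reviewer and changes the type parameter.
    pub fn with_reviewer<R2: Reviewer>(self, reviewer: R2) -> PromptRequestSketch<R2> {
        PromptRequestSketch { reviewer }
    }

    // Text that would be handed back to the LLM for one tool call.
    pub fn tool_output_for_llm(
        &self,
        tool_name: &str,
        args: &str,
        outcome: Result<String, String>,
    ) -> String {
        match outcome {
            // Successful executions go through the reviewer.
            Ok(result) => self.reviewer.critique(tool_name, args, &result),
            // Failures are passed directly to the LLM, untouched.
            Err(error) => error,
        }
    }
}

// Hypothetical reviewer that just tags the output it has seen.
#[derive(Clone)]
pub struct TagReviewer;

impl Reviewer for TagReviewer {
    fn critique(&self, _tool_name: &str, _args: &str, result: &str) -> String {
        format!("{result}\n[reviewed]")
    }
}
```

Because `critique()` returns the final `String`, a reviewer for a JSON-producing tool could, for instance, merge its feedback into the JSON object instead of appending text; the loop itself never needs to know the format.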

Alternatives

Instead of a separate reviewer trait, each tool could implement its own critique logic internally.

Drawbacks:

  • Code duplication: Open-ended tools like Terminal, RAG, and Web Search all need similar critique functionality
  • Tight coupling: Mixes critique concerns with tool execution logic
  • Less flexible: Cannot easily swap or configure different critique strategies

The ToolResultReviewer trait provides a clean separation of concerns and allows reuse across multiple tools.
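
As a rough illustration of the reuse and swap-ability argument, a single reviewer could route each tool to its own critique strategy. Everything here is hypothetical (`RoutingReviewer`, the `Strategy` signature, the hint texts, the tool names), and the sketch is synchronous and independent of rig-core.

```rust
use std::collections::HashMap;

// Hypothetical strategy signature: (args, result) -> reviewed result.
type Strategy = fn(&str, &str) -> String;

// One reviewer shared by all tools; strategies are registered per tool name.
#[derive(Clone, Default)]
pub struct RoutingReviewer {
    strategies: HashMap<String, Strategy>,
}

impl RoutingReviewer {
    // Registers (or replaces) the strategy for one tool.
    pub fn with_strategy(mut self, tool_name: &str, strategy: Strategy) -> Self {
        self.strategies.insert(tool_name.to_string(), strategy);
        self
    }

    // Applies the tool's strategy, or passes the result through if none is set.
    pub fn critique(&self, tool_name: &str, args: &str, result: &str) -> String {
        match self.strategies.get(tool_name) {
            Some(strategy) => strategy(args, result),
            None => result.to_string(),
        }
    }
}

// Example strategy for a terminal tool (assumes `args` is the raw command).
pub fn terminal_hint(args: &str, result: &str) -> String {
    if args.starts_with("ls") {
        format!("{result}\n[hint] consider `tree`")
    } else {
        result.to_string()
    }
}

// Example strategy for a web search tool.
pub fn search_hint(_args: &str, result: &str) -> String {
    format!("{result}\n[hint] prefer narrower queries")
}
```

Swapping a strategy is then a one-line change at construction time, and the tools themselves stay free of critique logic.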
