diff --git a/docs/agents.md b/docs/agents.md index d401f53da..a2e0db1bd 100644 --- a/docs/agents.md +++ b/docs/agents.md @@ -1,285 +1,308 @@ -# Agents - -Agents are the core building block in your apps. An agent is a large language model (LLM), configured with instructions and tools. - -## Basic configuration - -The most common properties of an agent you'll configure are: - -- `name`: A required string that identifies your agent. -- `instructions`: also known as a developer message or system prompt. -- `model`: which LLM to use, and optional `model_settings` to configure model tuning parameters like temperature, top_p, etc. -- `tools`: Tools that the agent can use to achieve its tasks. - -```python -from agents import Agent, ModelSettings, function_tool - -@function_tool -def get_weather(city: str) -> str: - """returns weather info for the specified city.""" - return f"The weather in {city} is sunny" - -agent = Agent( - name="Haiku agent", - instructions="Always respond in haiku form", - model="gpt-5-nano", - tools=[get_weather], -) -``` - -## Context - -Agents are generic on their `context` type. Context is a dependency-injection tool: it's an object you create and pass to `Runner.run()`, that is passed to every agent, tool, handoff etc, and it serves as a grab bag of dependencies and state for the agent run. You can provide any Python object as the context. - -```python -@dataclass -class UserContext: - name: str - uid: str - is_pro_user: bool - - async def fetch_purchases() -> list[Purchase]: - return ... - -agent = Agent[UserContext]( - ..., -) -``` - -## Output types - -By default, agents produce plain text (i.e. `str`) outputs. If you want the agent to produce a particular type of output, you can use the `output_type` parameter. A common choice is to use [Pydantic](https://docs.pydantic.dev/) objects, but we support any type that can be wrapped in a Pydantic [TypeAdapter](https://docs.pydantic.dev/latest/api/type_adapter/) - dataclasses, lists, TypedDict, etc. - -```python -from pydantic import BaseModel -from agents import Agent - - -class CalendarEvent(BaseModel): - name: str - date: str - participants: list[str] - -agent = Agent( - name="Calendar extractor", - instructions="Extract calendar events from text", - output_type=CalendarEvent, -) -``` - -!!! note - - When you pass an `output_type`, that tells the model to use [structured outputs](https://platform.openai.com/docs/guides/structured-outputs) instead of regular plain text responses. - -## Multi-agent system design patterns - -There are many ways to design multi‑agent systems, but we commonly see two broadly applicable patterns: - -1. Manager (agents as tools): A central manager/orchestrator invokes specialized sub‑agents as tools and retains control of the conversation. -2. Handoffs: Peer agents hand off control to a specialized agent that takes over the conversation. This is decentralized. - -See [our practical guide to building agents](https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf) for more details. - -### Manager (agents as tools) - -The `customer_facing_agent` handles all user interaction and invokes specialized sub‑agents exposed as tools. Read more in the [tools](tools.md#agents-as-tools) documentation. - -```python -from agents import Agent - -booking_agent = Agent(...) -refund_agent = Agent(...) - -customer_facing_agent = Agent( - name="Customer-facing agent", - instructions=( - "Handle all direct user communication. 
" - "Call the relevant tools when specialized expertise is needed." - ), - tools=[ - booking_agent.as_tool( - tool_name="booking_expert", - tool_description="Handles booking questions and requests.", - ), - refund_agent.as_tool( - tool_name="refund_expert", - tool_description="Handles refund questions and requests.", - ) - ], -) -``` - -### Handoffs - -Handoffs are sub‑agents the agent can delegate to. When a handoff occurs, the delegated agent receives the conversation history and takes over the conversation. This pattern enables modular, specialized agents that excel at a single task. Read more in the [handoffs](handoffs.md) documentation. - -```python -from agents import Agent - -booking_agent = Agent(...) -refund_agent = Agent(...) - -triage_agent = Agent( - name="Triage agent", - instructions=( - "Help the user with their questions. " - "If they ask about booking, hand off to the booking agent. " - "If they ask about refunds, hand off to the refund agent." - ), - handoffs=[booking_agent, refund_agent], -) -``` - -## Dynamic instructions - -In most cases, you can provide instructions when you create the agent. However, you can also provide dynamic instructions via a function. The function will receive the agent and context, and must return the prompt. Both regular and `async` functions are accepted. - -```python -def dynamic_instructions( - context: RunContextWrapper[UserContext], agent: Agent[UserContext] -) -> str: - return f"The user's name is {context.context.name}. Help them with their questions." - - -agent = Agent[UserContext]( - name="Triage agent", - instructions=dynamic_instructions, -) -``` - -## Lifecycle events (hooks) - -Sometimes, you want to observe the lifecycle of an agent. For example, you may want to log events, or pre-fetch data when certain events occur. You can hook into the agent lifecycle with the `hooks` property. Subclass the [`AgentHooks`][agents.lifecycle.AgentHooks] class, and override the methods you're interested in. - -## Guardrails - -Guardrails allow you to run checks/validations on user input in parallel to the agent running, and on the agent's output once it is produced. For example, you could screen the user's input and agent's output for relevance. Read more in the [guardrails](guardrails.md) documentation. - -## Cloning/copying agents - -By using the `clone()` method on an agent, you can duplicate an Agent, and optionally change any properties you like. - -```python -pirate_agent = Agent( - name="Pirate", - instructions="Write like a pirate", - model="gpt-4.1", -) - -robot_agent = pirate_agent.clone( - name="Robot", - instructions="Write like a robot", -) -``` - -## Forcing tool use - -Supplying a list of tools doesn't always mean the LLM will use a tool. You can force tool use by setting [`ModelSettings.tool_choice`][agents.model_settings.ModelSettings.tool_choice]. Valid values are: - -1. `auto`, which allows the LLM to decide whether or not to use a tool. -2. `required`, which requires the LLM to use a tool (but it can intelligently decide which tool). -3. `none`, which requires the LLM to _not_ use a tool. -4. Setting a specific string e.g. `my_tool`, which requires the LLM to use that specific tool. 
- -```python -from agents import Agent, Runner, function_tool, ModelSettings - -@function_tool -def get_weather(city: str) -> str: - """Returns weather info for the specified city.""" - return f"The weather in {city} is sunny" - -agent = Agent( - name="Weather Agent", - instructions="Retrieve weather details.", - tools=[get_weather], - model_settings=ModelSettings(tool_choice="get_weather") -) -``` - -## Tool Use Behavior - -The `tool_use_behavior` parameter in the `Agent` configuration controls how tool outputs are handled: - -- `"run_llm_again"`: The default. Tools are run, and the LLM processes the results to produce a final response. -- `"stop_on_first_tool"`: The output of the first tool call is used as the final response, without further LLM processing. - -```python -from agents import Agent, Runner, function_tool, ModelSettings - -@function_tool -def get_weather(city: str) -> str: - """Returns weather info for the specified city.""" - return f"The weather in {city} is sunny" - -agent = Agent( - name="Weather Agent", - instructions="Retrieve weather details.", - tools=[get_weather], - tool_use_behavior="stop_on_first_tool" -) -``` - -- `StopAtTools(stop_at_tool_names=[...])`: Stops if any specified tool is called, using its output as the final response. - -```python -from agents import Agent, Runner, function_tool -from agents.agent import StopAtTools - -@function_tool -def get_weather(city: str) -> str: - """Returns weather info for the specified city.""" - return f"The weather in {city} is sunny" - -@function_tool -def sum_numbers(a: int, b: int) -> int: - """Adds two numbers.""" - return a + b - -agent = Agent( - name="Stop At Stock Agent", - instructions="Get weather or sum numbers.", - tools=[get_weather, sum_numbers], - tool_use_behavior=StopAtTools(stop_at_tool_names=["get_weather"]) -) -``` - -- `ToolsToFinalOutputFunction`: A custom function that processes tool results and decides whether to stop or continue with the LLM. - -```python -from agents import Agent, Runner, function_tool, FunctionToolResult, RunContextWrapper -from agents.agent import ToolsToFinalOutputResult -from typing import List, Any - -@function_tool -def get_weather(city: str) -> str: - """Returns weather info for the specified city.""" - return f"The weather in {city} is sunny" - -def custom_tool_handler( - context: RunContextWrapper[Any], - tool_results: List[FunctionToolResult] -) -> ToolsToFinalOutputResult: - """Processes tool results to decide final output.""" - for result in tool_results: - if result.output and "sunny" in result.output: - return ToolsToFinalOutputResult( - is_final_output=True, - final_output=f"Final weather: {result.output}" - ) - return ToolsToFinalOutputResult( - is_final_output=False, - final_output=None - ) - -agent = Agent( - name="Weather Agent", - instructions="Retrieve weather details.", - tools=[get_weather], - tool_use_behavior=custom_tool_handler -) -``` - -!!! note - - To prevent infinite loops, the framework automatically resets `tool_choice` to "auto" after a tool call. This behavior is configurable via [`agent.reset_tool_choice`][agents.agent.Agent.reset_tool_choice]. The infinite loop is because tool results are sent to the LLM, which then generates another tool call because of `tool_choice`, ad infinitum. +# Agents + +Agents are the core building block in your apps. An agent is a large language model (LLM), configured with instructions and tools. 
+ +## Basic configuration + +The most common properties of an agent you'll configure are: + +- `name`: A required string that identifies your agent. +- `instructions`: also known as a developer message or system prompt. +- `model`: which LLM to use, and optional `model_settings` to configure model tuning parameters like temperature, top_p, etc. +- `tools`: Tools that the agent can use to achieve its tasks. + +```python +from agents import Agent, ModelSettings, function_tool + +@function_tool +def get_weather(city: str) -> str: + """returns weather info for the specified city.""" + return f"The weather in {city} is sunny" + +agent = Agent( + name="Haiku agent", + instructions="Always respond in haiku form", + model="gpt-5-nano", + tools=[get_weather], +) +``` + +## Context + +Agents are generic on their `context` type. Context is a dependency-injection tool: it's an object you create and pass to `Runner.run()`, that is passed to every agent, tool, handoff etc, and it serves as a grab bag of dependencies and state for the agent run. You can provide any Python object as the context. + +```python +@dataclass +class UserContext: + name: str + uid: str + is_pro_user: bool + + async def fetch_purchases() -> list[Purchase]: + return ... + +agent = Agent[UserContext]( + ..., +) +``` + +## Output types + +By default, agents produce plain text (i.e. `str`) outputs. If you want the agent to produce a particular type of output, you can use the `output_type` parameter. A common choice is to use [Pydantic](https://docs.pydantic.dev/) objects, but we support any type that can be wrapped in a Pydantic [TypeAdapter](https://docs.pydantic.dev/latest/api/type_adapter/) - dataclasses, lists, TypedDict, etc. + +```python +from pydantic import BaseModel +from agents import Agent + + +class CalendarEvent(BaseModel): + name: str + date: str + participants: list[str] + +agent = Agent( + name="Calendar extractor", + instructions="Extract calendar events from text", + output_type=CalendarEvent, +) +``` + +!!! note + + When you pass an `output_type`, that tells the model to use [structured outputs](https://platform.openai.com/docs/guides/structured-outputs) instead of regular plain text responses. + +### Using structured outputs with tools + +Some models (like Google Gemini) don't natively support using tools and structured outputs together. For these cases, you can enable prompt injection: + +```python +from agents import Agent +from agents.extensions.models.litellm_model import LitellmModel + +agent = Agent( + name="Weather assistant", + model=LitellmModel( + "gemini/gemini-2.5-flash", + enable_structured_output_with_tools=True, # Required for Gemini + ), + tools=[get_weather], + output_type=WeatherReport, +) +``` + +The `enable_structured_output_with_tools` parameter on [`LitellmModel`][agents.extensions.models.litellm_model.LitellmModel] injects JSON formatting instructions into the system prompt as a workaround. This is only needed for models that lack native support for using tools and structured outputs simultaneously (like Gemini). + +See the [prompt injection documentation](models/structured_output_with_tools.md) for more details. + +## Multi-agent system design patterns + +There are many ways to design multi‑agent systems, but we commonly see two broadly applicable patterns: + +1. Manager (agents as tools): A central manager/orchestrator invokes specialized sub‑agents as tools and retains control of the conversation. +2. 
Handoffs: Peer agents hand off control to a specialized agent that takes over the conversation. This is decentralized. + +See [our practical guide to building agents](https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf) for more details. + +### Manager (agents as tools) + +The `customer_facing_agent` handles all user interaction and invokes specialized sub‑agents exposed as tools. Read more in the [tools](tools.md#agents-as-tools) documentation. + +```python +from agents import Agent + +booking_agent = Agent(...) +refund_agent = Agent(...) + +customer_facing_agent = Agent( + name="Customer-facing agent", + instructions=( + "Handle all direct user communication. " + "Call the relevant tools when specialized expertise is needed." + ), + tools=[ + booking_agent.as_tool( + tool_name="booking_expert", + tool_description="Handles booking questions and requests.", + ), + refund_agent.as_tool( + tool_name="refund_expert", + tool_description="Handles refund questions and requests.", + ) + ], +) +``` + +### Handoffs + +Handoffs are sub‑agents the agent can delegate to. When a handoff occurs, the delegated agent receives the conversation history and takes over the conversation. This pattern enables modular, specialized agents that excel at a single task. Read more in the [handoffs](handoffs.md) documentation. + +```python +from agents import Agent + +booking_agent = Agent(...) +refund_agent = Agent(...) + +triage_agent = Agent( + name="Triage agent", + instructions=( + "Help the user with their questions. " + "If they ask about booking, hand off to the booking agent. " + "If they ask about refunds, hand off to the refund agent." + ), + handoffs=[booking_agent, refund_agent], +) +``` + +## Dynamic instructions + +In most cases, you can provide instructions when you create the agent. However, you can also provide dynamic instructions via a function. The function will receive the agent and context, and must return the prompt. Both regular and `async` functions are accepted. + +```python +def dynamic_instructions( + context: RunContextWrapper[UserContext], agent: Agent[UserContext] +) -> str: + return f"The user's name is {context.context.name}. Help them with their questions." + + +agent = Agent[UserContext]( + name="Triage agent", + instructions=dynamic_instructions, +) +``` + +## Lifecycle events (hooks) + +Sometimes, you want to observe the lifecycle of an agent. For example, you may want to log events, or pre-fetch data when certain events occur. You can hook into the agent lifecycle with the `hooks` property. Subclass the [`AgentHooks`][agents.lifecycle.AgentHooks] class, and override the methods you're interested in. + +## Guardrails + +Guardrails allow you to run checks/validations on user input in parallel to the agent running, and on the agent's output once it is produced. For example, you could screen the user's input and agent's output for relevance. Read more in the [guardrails](guardrails.md) documentation. + +## Cloning/copying agents + +By using the `clone()` method on an agent, you can duplicate an Agent, and optionally change any properties you like. + +```python +pirate_agent = Agent( + name="Pirate", + instructions="Write like a pirate", + model="gpt-4.1", +) + +robot_agent = pirate_agent.clone( + name="Robot", + instructions="Write like a robot", +) +``` + +## Forcing tool use + +Supplying a list of tools doesn't always mean the LLM will use a tool. 
You can force tool use by setting [`ModelSettings.tool_choice`][agents.model_settings.ModelSettings.tool_choice]. Valid values are: + +1. `auto`, which allows the LLM to decide whether or not to use a tool. +2. `required`, which requires the LLM to use a tool (but it can intelligently decide which tool). +3. `none`, which requires the LLM to _not_ use a tool. +4. Setting a specific string e.g. `my_tool`, which requires the LLM to use that specific tool. + +```python +from agents import Agent, Runner, function_tool, ModelSettings + +@function_tool +def get_weather(city: str) -> str: + """Returns weather info for the specified city.""" + return f"The weather in {city} is sunny" + +agent = Agent( + name="Weather Agent", + instructions="Retrieve weather details.", + tools=[get_weather], + model_settings=ModelSettings(tool_choice="get_weather") +) +``` + +## Tool Use Behavior + +The `tool_use_behavior` parameter in the `Agent` configuration controls how tool outputs are handled: + +- `"run_llm_again"`: The default. Tools are run, and the LLM processes the results to produce a final response. +- `"stop_on_first_tool"`: The output of the first tool call is used as the final response, without further LLM processing. + +```python +from agents import Agent, Runner, function_tool, ModelSettings + +@function_tool +def get_weather(city: str) -> str: + """Returns weather info for the specified city.""" + return f"The weather in {city} is sunny" + +agent = Agent( + name="Weather Agent", + instructions="Retrieve weather details.", + tools=[get_weather], + tool_use_behavior="stop_on_first_tool" +) +``` + +- `StopAtTools(stop_at_tool_names=[...])`: Stops if any specified tool is called, using its output as the final response. + +```python +from agents import Agent, Runner, function_tool +from agents.agent import StopAtTools + +@function_tool +def get_weather(city: str) -> str: + """Returns weather info for the specified city.""" + return f"The weather in {city} is sunny" + +@function_tool +def sum_numbers(a: int, b: int) -> int: + """Adds two numbers.""" + return a + b + +agent = Agent( + name="Stop At Stock Agent", + instructions="Get weather or sum numbers.", + tools=[get_weather, sum_numbers], + tool_use_behavior=StopAtTools(stop_at_tool_names=["get_weather"]) +) +``` + +- `ToolsToFinalOutputFunction`: A custom function that processes tool results and decides whether to stop or continue with the LLM. + +```python +from agents import Agent, Runner, function_tool, FunctionToolResult, RunContextWrapper +from agents.agent import ToolsToFinalOutputResult +from typing import List, Any + +@function_tool +def get_weather(city: str) -> str: + """Returns weather info for the specified city.""" + return f"The weather in {city} is sunny" + +def custom_tool_handler( + context: RunContextWrapper[Any], + tool_results: List[FunctionToolResult] +) -> ToolsToFinalOutputResult: + """Processes tool results to decide final output.""" + for result in tool_results: + if result.output and "sunny" in result.output: + return ToolsToFinalOutputResult( + is_final_output=True, + final_output=f"Final weather: {result.output}" + ) + return ToolsToFinalOutputResult( + is_final_output=False, + final_output=None + ) + +agent = Agent( + name="Weather Agent", + instructions="Retrieve weather details.", + tools=[get_weather], + tool_use_behavior=custom_tool_handler +) +``` + +!!! note + + To prevent infinite loops, the framework automatically resets `tool_choice` to "auto" after a tool call. 
This behavior is configurable via [`agent.reset_tool_choice`][agents.agent.Agent.reset_tool_choice]. The infinite loop is because tool results are sent to the LLM, which then generates another tool call because of `tool_choice`, ad infinitum. diff --git a/docs/models/litellm.md b/docs/models/litellm.md index 08263feef..5c8aecfb4 100644 --- a/docs/models/litellm.md +++ b/docs/models/litellm.md @@ -1,90 +1,125 @@ -# Using any model via LiteLLM - -!!! note - - The LiteLLM integration is in beta. You may run into issues with some model providers, especially smaller ones. Please report any issues via [Github issues](https://github.com/openai/openai-agents-python/issues) and we'll fix quickly. - -[LiteLLM](https://docs.litellm.ai/docs/) is a library that allows you to use 100+ models via a single interface. We've added a LiteLLM integration to allow you to use any AI model in the Agents SDK. - -## Setup - -You'll need to ensure `litellm` is available. You can do this by installing the optional `litellm` dependency group: - -```bash -pip install "openai-agents[litellm]" -``` - -Once done, you can use [`LitellmModel`][agents.extensions.models.litellm_model.LitellmModel] in any agent. - -## Example - -This is a fully working example. When you run it, you'll be prompted for a model name and API key. For example, you could enter: - -- `openai/gpt-4.1` for the model, and your OpenAI API key -- `anthropic/claude-3-5-sonnet-20240620` for the model, and your Anthropic API key -- etc - -For a full list of models supported in LiteLLM, see the [litellm providers docs](https://docs.litellm.ai/docs/providers). - -```python -from __future__ import annotations - -import asyncio - -from agents import Agent, Runner, function_tool, set_tracing_disabled -from agents.extensions.models.litellm_model import LitellmModel - -@function_tool -def get_weather(city: str): - print(f"[debug] getting weather for {city}") - return f"The weather in {city} is sunny." - - -async def main(model: str, api_key: str): - agent = Agent( - name="Assistant", - instructions="You only respond in haikus.", - model=LitellmModel(model=model, api_key=api_key), - tools=[get_weather], - ) - - result = await Runner.run(agent, "What's the weather in Tokyo?") - print(result.final_output) - - -if __name__ == "__main__": - # First try to get model/api key from args - import argparse - - parser = argparse.ArgumentParser() - parser.add_argument("--model", type=str, required=False) - parser.add_argument("--api-key", type=str, required=False) - args = parser.parse_args() - - model = args.model - if not model: - model = input("Enter a model name for Litellm: ") - - api_key = args.api_key - if not api_key: - api_key = input("Enter an API key for Litellm: ") - - asyncio.run(main(model, api_key)) -``` - -## Tracking usage data - -If you want LiteLLM responses to populate the Agents SDK usage metrics, pass `ModelSettings(include_usage=True)` when creating your agent. - -```python -from agents import Agent, ModelSettings -from agents.extensions.models.litellm_model import LitellmModel - -agent = Agent( - name="Assistant", - model=LitellmModel(model="your/model", api_key="..."), - model_settings=ModelSettings(include_usage=True), -) -``` - -With `include_usage=True`, LiteLLM requests report token and request counts through `result.context_wrapper.usage` just like the built-in OpenAI models. +# Using any model via LiteLLM + +!!! note + + The LiteLLM integration is in beta. You may run into issues with some model providers, especially smaller ones. 
Please report any issues via [Github issues](https://github.com/openai/openai-agents-python/issues) and we'll fix quickly. + +[LiteLLM](https://docs.litellm.ai/docs/) is a library that allows you to use 100+ models via a single interface. We've added a LiteLLM integration to allow you to use any AI model in the Agents SDK. + +## Setup + +You'll need to ensure `litellm` is available. You can do this by installing the optional `litellm` dependency group: + +```bash +pip install "openai-agents[litellm]" +``` + +Once done, you can use [`LitellmModel`][agents.extensions.models.litellm_model.LitellmModel] in any agent. + +## Example + +This is a fully working example. When you run it, you'll be prompted for a model name and API key. For example, you could enter: + +- `openai/gpt-4.1` for the model, and your OpenAI API key +- `anthropic/claude-3-5-sonnet-20240620` for the model, and your Anthropic API key +- etc + +For a full list of models supported in LiteLLM, see the [litellm providers docs](https://docs.litellm.ai/docs/providers). + +```python +from __future__ import annotations + +import asyncio + +from agents import Agent, Runner, function_tool, set_tracing_disabled +from agents.extensions.models.litellm_model import LitellmModel + +@function_tool +def get_weather(city: str): + print(f"[debug] getting weather for {city}") + return f"The weather in {city} is sunny." + + +async def main(model: str, api_key: str): + agent = Agent( + name="Assistant", + instructions="You only respond in haikus.", + model=LitellmModel(model=model, api_key=api_key), + tools=[get_weather], + ) + + result = await Runner.run(agent, "What's the weather in Tokyo?") + print(result.final_output) + + +if __name__ == "__main__": + # First try to get model/api key from args + import argparse + + parser = argparse.ArgumentParser() + parser.add_argument("--model", type=str, required=False) + parser.add_argument("--api-key", type=str, required=False) + args = parser.parse_args() + + model = args.model + if not model: + model = input("Enter a model name for Litellm: ") + + api_key = args.api_key + if not api_key: + api_key = input("Enter an API key for Litellm: ") + + asyncio.run(main(model, api_key)) +``` + +## Tracking usage data + +If you want LiteLLM responses to populate the Agents SDK usage metrics, pass `ModelSettings(include_usage=True)` when creating your agent. + +```python +from agents import Agent, ModelSettings +from agents.extensions.models.litellm_model import LitellmModel + +agent = Agent( + name="Assistant", + model=LitellmModel(model="your/model", api_key="..."), + model_settings=ModelSettings(include_usage=True), +) +``` + +With `include_usage=True`, LiteLLM requests report token and request counts through `result.context_wrapper.usage` just like the built-in OpenAI models. + +## Using tools with structured outputs + +Some models accessed via LiteLLM (particularly Google Gemini) don't natively support using tools and structured outputs simultaneously. 
For these models, enable prompt injection: + +```python +from pydantic import BaseModel +from agents import Agent, function_tool +from agents.extensions.models.litellm_model import LitellmModel + + +class Report(BaseModel): + summary: str + confidence: float + + +@function_tool +def analyze_data(query: str) -> dict: + return {"result": f"Analysis of {query}"} + + +agent = Agent( + name="Analyst", + model=LitellmModel( + "gemini/gemini-2.5-flash", + enable_structured_output_with_tools=True, # Required for Gemini + ), + tools=[analyze_data], + output_type=Report, +) +``` + +The `enable_structured_output_with_tools` parameter on `LitellmModel` enables a workaround that injects JSON formatting instructions into the system prompt instead of using the native API. This allows models like Gemini to return structured outputs even when using tools. + +See the [prompt injection documentation](structured_output_with_tools.md) for complete details. diff --git a/docs/models/structured_output_with_tools.md b/docs/models/structured_output_with_tools.md new file mode 100644 index 000000000..899a086b3 --- /dev/null +++ b/docs/models/structured_output_with_tools.md @@ -0,0 +1,241 @@ +# Prompt Injection for Structured Outputs + +Some LLM providers don't natively support using tools and structured outputs simultaneously. The Agents SDK includes an opt-in prompt injection feature to work around this limitation. + +!!! note + + This feature is specifically designed for models accessed via [`LitellmModel`][agents.extensions.models.litellm_model.LitellmModel], particularly **Google Gemini**. OpenAI models have native support and don't need this workaround. + +## The Problem + +Models like Google Gemini don't support using `tools` and `response_schema` (structured output) in the same API call. When you try: + +```python +from agents import Agent, function_tool +from agents.extensions.models.litellm_model import LitellmModel +from pydantic import BaseModel + +class WeatherReport(BaseModel): + city: str + temperature: float + +@function_tool +def get_weather(city: str) -> dict: + return {"city": city, "temperature": 22.5} + +# This causes an error with Gemini +agent = Agent( + model=LitellmModel("gemini/gemini-2.5-flash"), + tools=[get_weather], + output_type=WeatherReport, # Error: can't use both! +) +``` + +You'll get an error like: + +``` +GeminiException BadRequestError - Function calling with a response mime type +'application/json' is unsupported +``` + +## The Solution + +Enable prompt injection by setting `enable_structured_output_with_tools=True` on the `LitellmModel`: + +```python +agent = Agent( + model=LitellmModel( + "gemini/gemini-2.5-flash", + enable_structured_output_with_tools=True, # ← Enables the workaround + ), + tools=[get_weather], + output_type=WeatherReport, +) +``` + +When enabled, the SDK: + +1. Generates JSON formatting instructions from your Pydantic model. +2. Injects these instructions into the system prompt. +3. Disables the native `response_format` parameter to avoid API errors. +4. Parses the model's JSON response into your Pydantic model. 
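+
+The sketch below is only a rough illustration of steps 1 and 4, not the SDK's internal code: it shows the general idea of turning a Pydantic schema into prompt instructions and then parsing the model's JSON reply back into the output type (the small `WeatherReport` model is redefined here so the snippet runs on its own).
+
+```python
+import json
+
+from pydantic import BaseModel
+
+
+class WeatherReport(BaseModel):
+    city: str
+    temperature: float
+
+
+# Step 1, roughly: derive field-level instructions from the schema.
+schema = WeatherReport.model_json_schema()
+instructions = (
+    "Provide your output as a JSON object containing the following fields: "
+    + ", ".join(schema["properties"].keys())
+)
+
+# Step 4, roughly: parse the model's JSON text back into the Pydantic model.
+raw_reply = '{"city": "Tokyo", "temperature": 22.5}'  # hypothetical model reply
+report = WeatherReport.model_validate(json.loads(raw_reply))
+print(report)
+```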
+ +## Complete Example + +```python +from __future__ import annotations + +import asyncio +from pydantic import BaseModel, Field + +from agents import Agent, Runner, function_tool +from agents.extensions.models.litellm_model import LitellmModel + + +class WeatherReport(BaseModel): + city: str = Field(description="The city name") + temperature: float = Field(description="Temperature in Celsius") + conditions: str = Field(description="Weather conditions") + + +@function_tool +def get_weather(city: str) -> dict: + """Get current weather for a city.""" + return { + "city": city, + "temperature": 22.5, + "conditions": "sunny", + } + + +async def main(): + agent = Agent( + name="WeatherBot", + instructions="Use the get_weather tool, then provide a structured report.", + model=LitellmModel( + "gemini/gemini-2.5-flash", + enable_structured_output_with_tools=True, # Required for Gemini + ), + tools=[get_weather], + output_type=WeatherReport, + ) + + result = await Runner.run(agent, "What's the weather in Tokyo?") + + # Result is properly typed as WeatherReport + report: WeatherReport = result.final_output + print(f"City: {report.city}") + print(f"Temperature: {report.temperature}") + print(f"Conditions: {report.conditions}") + + +if __name__ == "__main__": + asyncio.run(main()) +``` + +## When to Use + +| Model Provider | Access Via | Need `enable_structured_output_with_tools`? | +|----------------|-----------|------------------------------| +| Google Gemini | [`LitellmModel("gemini/...")`][agents.extensions.models.litellm_model.LitellmModel] | **Yes** - No native support | +| OpenAI | `"gpt-4o"` (default) | **No** - Has native support | +| Anthropic Claude | [`LitellmModel("claude-...")`][agents.extensions.models.litellm_model.LitellmModel] | **No** - Has native support | +| Other LiteLLM models | [`LitellmModel`][agents.extensions.models.litellm_model.LitellmModel] | **Try without first** | + +!!! tip + + If you're using [`LitellmModel`][agents.extensions.models.litellm_model.LitellmModel] and getting errors when combining tools with structured outputs, set `enable_structured_output_with_tools=True`. + +## How It Works + +### Without Prompt Injection (Default) + +The SDK uses the model's native structured output API: + +```python +# API request +{ + "tools": [...], + "response_format": {"type": "json_schema", ...} +} +``` + +This works for OpenAI and Anthropic models but fails for Gemini. + +### With Prompt Injection + +The SDK modifies the request: + +```python +# API request +{ + "system_instruction": "......", + "tools": [...], + "response_format": None # Disabled to avoid errors +} +``` + +The injected instructions tell the model: + +- Which JSON fields to output. +- The type and description of each field. +- How to format the response (valid JSON only). 
+ +### Example Injected Instructions + +For the `WeatherReport` model above, the SDK injects: + +``` +Provide your output as a JSON object containing the following fields: + +["city", "temperature", "conditions"] + + +Here are the properties for each field: + +{ + "city": { + "description": "The city name", + "type": "string" + }, + "temperature": { + "description": "Temperature in Celsius", + "type": "number" + }, + "conditions": { + "description": "Weather conditions", + "type": "string" + } +} + + +IMPORTANT: +- Start your response with `{` and end it with `}` +- Your output will be parsed with json.loads() +- Make sure it only contains valid JSON +- Do NOT include markdown code blocks or any other formatting +``` + +## Debugging + +Enable debug logging to see when prompt injection is active: + +```python +import logging +logging.basicConfig(level=logging.DEBUG) +``` + +Look for: + +``` +DEBUG: Injected JSON output prompt for structured output with tools +``` + +## Best Practices + +1. **Use Pydantic Field descriptions**: The SDK uses these to generate better instructions. + + ```python + class Report(BaseModel): + # Good - includes description + score: float = Field(description="Confidence score from 0 to 1") + + # Less helpful - no description + count: int + ``` + +2. **Test without prompt injection first**: Only enable it if you get errors. + +3. **Use with LiteLLM models only**: OpenAI models ignore this parameter. + +## Limitations + +- The model must be able to follow JSON formatting instructions reliably. +- Parsing errors can occur if the model doesn't output valid JSON. +- This is a workaround, not a replacement for native API support. + +## Related Documentation + +- [Agents](../agents.md) - General agent configuration. +- [LiteLLM models](litellm.md) - Using any model via LiteLLM. +- [Tools](../tools.md) - Defining and using tools. 
diff --git a/mkdocs.yml b/mkdocs.yml index a1ed06d31..3e76fd318 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -76,6 +76,7 @@ plugins: - Models: - models/index.md - models/litellm.md + - models/structured_output_with_tools.md - config.md - visualization.md - release.md diff --git a/src/agents/extensions/models/litellm_model.py b/src/agents/extensions/models/litellm_model.py index 6389b38b2..f3d13aa24 100644 --- a/src/agents/extensions/models/litellm_model.py +++ b/src/agents/extensions/models/litellm_model.py @@ -51,6 +51,7 @@ from ...tracing.spans import Span from ...usage import Usage from ...util._json import _to_dump_compatible +from ...util._prompts import get_json_output_prompt, should_inject_json_prompt class InternalChatCompletionMessage(ChatCompletionMessage): @@ -73,10 +74,12 @@ def __init__( model: str, base_url: str | None = None, api_key: str | None = None, + enable_structured_output_with_tools: bool = False, ): self.model = model self.base_url = base_url self.api_key = api_key + self.enable_structured_output_with_tools = enable_structured_output_with_tools async def get_response( self, @@ -88,7 +91,7 @@ async def get_response( handoffs: list[Handoff], tracing: ModelTracing, previous_response_id: str | None = None, # unused - conversation_id: str | None = None, # unused + conversation_id: str | None = None, prompt: Any | None = None, ) -> ModelResponse: with generation_span( @@ -192,7 +195,7 @@ async def stream_response( handoffs: list[Handoff], tracing: ModelTracing, previous_response_id: str | None = None, # unused - conversation_id: str | None = None, # unused + conversation_id: str | None = None, prompt: Any | None = None, ) -> AsyncIterator[TResponseStreamEvent]: with generation_span( @@ -287,6 +290,23 @@ async def _fetch_response( if "anthropic" in self.model.lower() or "claude" in self.model.lower(): converted_messages = self._fix_tool_message_ordering(converted_messages) + # Check if we need to inject JSON output prompt for models that don't support + # tools + structured output simultaneously (like Gemini) + # Note: handoffs are converted to function tools, so we need to include them in the check + tools_and_handoffs = list(tools) if tools else [] + if handoffs: + tools_and_handoffs.extend(handoffs) + inject_json_prompt = should_inject_json_prompt( + output_schema, tools_and_handoffs, self.enable_structured_output_with_tools + ) + if inject_json_prompt and output_schema: + json_prompt = get_json_output_prompt(output_schema) + if system_instructions: + system_instructions = f"{system_instructions}\n\n{json_prompt}" + else: + system_instructions = json_prompt + logger.debug("Injected JSON output prompt for structured output with tools") + if system_instructions: converted_messages.insert( 0, @@ -308,7 +328,12 @@ async def _fetch_response( else None ) tool_choice = Converter.convert_tool_choice(model_settings.tool_choice) - response_format = Converter.convert_response_format(output_schema) + # Don't use response_format if we injected JSON prompt (avoids API errors) + response_format = ( + Converter.convert_response_format(None) + if inject_json_prompt + else Converter.convert_response_format(output_schema) + ) converted_tools = [Converter.tool_to_openai(tool) for tool in tools] if tools else [] diff --git a/src/agents/util/_prompts.py b/src/agents/util/_prompts.py new file mode 100644 index 000000000..df20f4696 --- /dev/null +++ b/src/agents/util/_prompts.py @@ -0,0 +1,117 @@ +"""Utility functions for generating prompts for structured outputs.""" + +import json +import 
logging +from typing import Any + +from ..agent_output import AgentOutputSchemaBase + +logger = logging.getLogger(__name__) + + +def get_json_output_prompt(output_schema: AgentOutputSchemaBase) -> str: + if output_schema.is_plain_text(): + return "" + + json_output_prompt = "\n\nProvide your output as a JSON object containing the following fields:" + + try: + json_schema = output_schema.json_schema() + + # Extract field names and properties + response_model_properties = {} + json_schema_properties = json_schema.get("properties", {}) + + for field_name, field_properties in json_schema_properties.items(): + formatted_field_properties = { + prop_name: prop_value + for prop_name, prop_value in field_properties.items() + if prop_name != "title" + } + + # Handle enum references + if "allOf" in formatted_field_properties: + ref = formatted_field_properties["allOf"][0].get("$ref", "") + if ref.startswith("#/$defs/"): + enum_name = ref.split("/")[-1] + formatted_field_properties["enum_type"] = enum_name + + response_model_properties[field_name] = formatted_field_properties + + # Handle definitions (nested objects, enums, etc.) + json_schema_defs = json_schema.get("$defs") + if json_schema_defs is not None: + response_model_properties["$defs"] = {} + for def_name, def_properties in json_schema_defs.items(): + if "enum" in def_properties: + # Enum definition + response_model_properties["$defs"][def_name] = { + "type": "string", + "enum": def_properties["enum"], + "description": def_properties.get("description", ""), + } + else: + # Regular object definition + def_fields = def_properties.get("properties") + formatted_def_properties = {} + if def_fields is not None: + for field_name, field_properties in def_fields.items(): + formatted_field_properties = { + prop_name: prop_value + for prop_name, prop_value in field_properties.items() + if prop_name != "title" + } + formatted_def_properties[field_name] = formatted_field_properties + if len(formatted_def_properties) > 0: + response_model_properties["$defs"][def_name] = formatted_def_properties + + if len(response_model_properties) > 0: + # List field names + field_names = [key for key in response_model_properties.keys() if key != "$defs"] + json_output_prompt += "\n" + json_output_prompt += f"\n{json.dumps(field_names)}" + json_output_prompt += "\n" + + # Provide detailed properties + json_output_prompt += "\n\nHere are the properties for each field:" + json_output_prompt += "\n" + json_output_prompt += f"\n{json.dumps(response_model_properties, indent=2)}" + json_output_prompt += "\n" + + except (AttributeError, KeyError, TypeError, ValueError) as e: + # Fallback to simple instruction if schema generation fails + logger.warning( + f"Failed to generate detailed JSON schema for prompt injection: {e}. 
" + f"Using simple fallback for output type: {output_schema.name()}" + ) + json_output_prompt += f"\nOutput type: {output_schema.name()}" + except Exception as e: + # Catch any other unexpected errors but log them as errors + logger.error( + f"Unexpected error generating JSON prompt for {output_schema.name()}: {e}", + exc_info=True, + ) + json_output_prompt += f"\nOutput type: {output_schema.name()}" + + json_output_prompt += "\n\nIMPORTANT:" + json_output_prompt += "\n- Start your response with `{` and end it with `}`" + json_output_prompt += "\n- Your output will be parsed with json.loads()" + json_output_prompt += "\n- Make sure it only contains valid JSON" + json_output_prompt += "\n- Do NOT include markdown code blocks or any other formatting" + + return json_output_prompt + + +def should_inject_json_prompt( + output_schema: AgentOutputSchemaBase | None, + tools: list[Any], + enable_structured_output_with_tools: bool = False, +) -> bool: + if output_schema is None or output_schema.is_plain_text(): + return False + + # Only inject if explicitly requested by user AND both tools and output_schema are present + if enable_structured_output_with_tools and tools and len(tools) > 0: + return True + + return False diff --git a/tests/test_gemini_local.py b/tests/test_gemini_local.py new file mode 100644 index 000000000..6d1d2b309 --- /dev/null +++ b/tests/test_gemini_local.py @@ -0,0 +1,176 @@ +""" +Test script for Gemini with prompt injection feature. +Run this locally to test the implementation with your own API key. + +Usage: +1. Set your API key: export GOOGLE_API_KEY=your_key_here +2. Run: python test_gemini_local.py +""" + +import asyncio +import logging +import os +from typing import Any + +from pydantic import BaseModel + +from agents import Agent, function_tool +from agents.extensions.models.litellm_model import LitellmModel + +# Enable logging to see the final system prompt sent to Gemini +logging.basicConfig(level=logging.INFO, format="%(message)s") + + +# Define your output schema +class WeatherReport(BaseModel): + """Weather report structure.""" + + city: str + temperature: float + conditions: str + humidity: int + + +# Define a simple tool +@function_tool +def get_weather(city: str) -> dict[str, Any]: + """Get the current weather for a city.""" + # Mock weather data + weather_data = { + "Tokyo": {"temperature": 22.5, "conditions": "sunny", "humidity": 65}, + "London": {"temperature": 15.0, "conditions": "rainy", "humidity": 80}, + "New York": {"temperature": 18.0, "conditions": "cloudy", "humidity": 70}, + } + + data = weather_data.get(city, {"temperature": 20.0, "conditions": "unknown", "humidity": 60}) + data["city"] = city + return data + + +async def main(): + """Main test function.""" + + # Check for API key + if not os.getenv("GOOGLE_API_KEY"): + print("ERROR: GOOGLE_API_KEY environment variable not set!") + print("\nTo set it:") + print(" Windows PowerShell: $env:GOOGLE_API_KEY='your_key_here'") + print(" Windows CMD: set GOOGLE_API_KEY=your_key_here") + print(" Linux/Mac: export GOOGLE_API_KEY=your_key_here") + return + + print("=" * 80) + print("Testing Gemini with Prompt Injection Feature") + print("=" * 80) + print("\n🔍 The final system prompt sent to Gemini will be shown below") + print("=" * 80) + + # Create agent with prompt injection enabled on the model + agent = Agent( + name="weather_assistant", + instructions=( + "You are a helpful weather assistant. Use the get_weather tool to " + "fetch weather information, then provide a structured report." 
+ ), + model=LitellmModel( + "gemini/gemini-2.5-flash", + enable_structured_output_with_tools=True, # CRITICAL: Enable for Gemini! + ), + tools=[get_weather], + output_type=WeatherReport, + ) + + print("\nAgent Configuration:") + print(" Model: gemini/gemini-2.5-flash") + print(f" Tools: {[tool.name for tool in agent.tools]}") + print(" Output Type: WeatherReport") + # Type check: ensure agent.model is LitellmModel + if isinstance(agent.model, LitellmModel): + print( + f" enable_structured_output_with_tools: " + f"{agent.model.enable_structured_output_with_tools}" + ) + + print(f"\n{'=' * 80}") + print("Running agent with input: 'What's the weather in Tokyo?'") + print(f"{'=' * 80}\n") + + print("📤 Sending request to Gemini...") + print("⏳ Waiting for response...\n") + + try: + from agents import Runner + + result = await Runner.run( + starting_agent=agent, + input="What's the weather in Tokyo?", + ) + + print("\n✅ Agent execution completed!") + + print(f"\n{'=' * 80}") + print("🎉 SUCCESS! Response Received") + print(f"{'=' * 80}") + + print("\n📊 Result Analysis:") + print(f"{'=' * 80}") + print(f"Output Type: {type(result.final_output).__name__}") + print(f"Output Value: {result.final_output}") + print(f"{'=' * 80}") + + if isinstance(result.final_output, WeatherReport): + print("\n✅ STRUCTURED OUTPUT PARSING: SUCCESS!") + print(f"{'=' * 80}") + print("\n📋 Weather Report (Parsed from JSON):") + print(f"{'=' * 80}") + print(f" 🌍 City: {result.final_output.city}") + print(f" 🌡️ Temperature: {result.final_output.temperature}°C") + print(f" ☁️ Conditions: {result.final_output.conditions}") + print(f" 💧 Humidity: {result.final_output.humidity}%") + print(f"{'=' * 80}") + else: + print( + f"\n⚠️ WARNING: Output type is {type(result.final_output)}, expected WeatherReport" + ) + + print("\n📈 Token Usage:") + print(f"{'=' * 80}") + print(f" 📥 Input tokens: {result.context_wrapper.usage.input_tokens}") + print(f" 📤 Output tokens: {result.context_wrapper.usage.output_tokens}") + print(f" 📊 Total tokens: {result.context_wrapper.usage.total_tokens}") + print(f"{'=' * 80}") + + print("\n💡 What Happened:") + print(" 1. ✅ Prompt injection added JSON schema to system prompt") + print(" 2. ✅ Gemini called get_weather tool") + print(" 3. ✅ Gemini returned structured JSON matching WeatherReport schema") + print(" 4. ✅ SDK parsed JSON into WeatherReport Pydantic model") + print("\n🎯 Feature is working correctly!") + + except Exception as e: + print(f"\n{'=' * 80}") + print("❌ ERROR!") + print(f"{'=' * 80}") + print(f"\n💥 Error: {e}") + print("\n🔧 Troubleshooting Steps:") + print(f"{'=' * 80}") + print(" 1. ✓ Check your API key is valid") + print(" 2. ✓ Ensure litellm is installed: pip install 'openai-agents[litellm]'") + print(" 3. ✓ Check internet connection") + print(" 4. 
✓ Check DEBUG logs above for prompt details") + print(f"{'=' * 80}") + + import traceback + + print("\n📋 Full traceback:") + print(f"{'=' * 80}") + traceback.print_exc() + print(f"{'=' * 80}") + + +if __name__ == "__main__": + print("\n" + "=" * 80) + print("Gemini + Prompt Injection Test") + print("=" * 80 + "\n") + + asyncio.run(main()) diff --git a/tests/utils/test_prompts.py b/tests/utils/test_prompts.py new file mode 100644 index 000000000..cc277bcfc --- /dev/null +++ b/tests/utils/test_prompts.py @@ -0,0 +1,118 @@ +from pydantic import BaseModel + +from agents.agent_output import AgentOutputSchema +from agents.util._prompts import get_json_output_prompt, should_inject_json_prompt + + +class SimpleModel(BaseModel): + name: str + age: int + + +class NestedModel(BaseModel): + user: SimpleModel + active: bool + + +def test_get_json_output_prompt_returns_empty_for_plain_text(): + schema = AgentOutputSchema(str) + result = get_json_output_prompt(schema) + assert result == "" + + +def test_get_json_output_prompt_with_simple_schema(): + schema = AgentOutputSchema(SimpleModel) + result = get_json_output_prompt(schema) + assert "name" in result + assert "age" in result + assert "JSON" in result + + +def test_get_json_output_prompt_with_nested_schema(): + schema = AgentOutputSchema(NestedModel) + result = get_json_output_prompt(schema) + assert "user" in result + assert "active" in result + assert "JSON" in result + + +def test_get_json_output_prompt_handles_schema_error(): + schema = AgentOutputSchema(SimpleModel) + result = get_json_output_prompt(schema) + assert isinstance(result, str) + assert len(result) > 0 + + +def test_should_inject_json_prompt_default_false(): + schema = AgentOutputSchema(SimpleModel) + tools = [{"type": "function", "name": "test"}] + result = should_inject_json_prompt(schema, tools) + assert result is False + + +def test_should_inject_json_prompt_explicit_opt_in(): + schema = AgentOutputSchema(SimpleModel) + tools = [{"type": "function", "name": "test"}] + result = should_inject_json_prompt(schema, tools, enable_structured_output_with_tools=True) + assert result is True + + +def test_should_inject_json_prompt_no_schema(): + result = should_inject_json_prompt( + None, [{"type": "function"}], enable_structured_output_with_tools=True + ) + assert result is False + + +def test_should_inject_json_prompt_plain_text_schema(): + schema = AgentOutputSchema(str) + tools = [{"type": "function"}] + result = should_inject_json_prompt(schema, tools, enable_structured_output_with_tools=True) + assert result is False + + +def test_should_inject_json_prompt_no_tools(): + schema = AgentOutputSchema(SimpleModel) + result = should_inject_json_prompt(schema, [], enable_structured_output_with_tools=True) + assert result is False + + +def test_should_inject_json_prompt_empty_tools(): + schema = AgentOutputSchema(SimpleModel) + result = should_inject_json_prompt(schema, [], enable_structured_output_with_tools=True) + assert result is False + + +def test_should_inject_json_prompt_all_conditions_met(): + schema = AgentOutputSchema(SimpleModel) + tools = [{"type": "function", "name": "test"}] + result = should_inject_json_prompt(schema, tools, enable_structured_output_with_tools=True) + assert result is True + + +def test_should_inject_json_prompt_without_opt_in(): + schema = AgentOutputSchema(SimpleModel) + tools = [{"type": "function", "name": "test"}] + result = should_inject_json_prompt(schema, tools, enable_structured_output_with_tools=False) + assert result is False + + +def 
test_should_inject_json_prompt_multiple_tools():
+    schema = AgentOutputSchema(SimpleModel)
+    tools = [
+        {"type": "function", "name": "test1"},
+        {"type": "function", "name": "test2"},
+    ]
+    result = should_inject_json_prompt(schema, tools, enable_structured_output_with_tools=True)
+    assert result is True
+
+
+def test_should_inject_json_prompt_with_handoffs_as_tools():
+    """Test that handoffs (passed as tools) trigger injection when enabled."""
+    schema = AgentOutputSchema(SimpleModel)
+    # Simulate handoffs being passed in the tools list
+    handoffs_as_tools = [{"type": "function", "name": "handoff_to_agent"}]
+    result = should_inject_json_prompt(
+        schema, handoffs_as_tools, enable_structured_output_with_tools=True
+    )
+    assert result is True
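+
+
+# A possible extra check, suggested as an illustrative addition rather than part of the
+# original test plan: the generated prompt should contain the formatting rules that
+# get_json_output_prompt appends for non-plain-text schemas.
+def test_get_json_output_prompt_includes_formatting_rules():
+    schema = AgentOutputSchema(SimpleModel)
+    result = get_json_output_prompt(schema)
+    assert "IMPORTANT" in result
+    assert "json.loads()" in result
+    assert "valid JSON" in result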