# Understanding AI Agents through the Thought-Action-Observation Cycle

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-check-3.jpg" alt="Unit 1 planning"/>

In the previous sections, we learned:

- **How tools are made available to the agent in the system prompt**.
- **How AI agents are systems that can 'reason', plan, and interact with their environment**.

In this section, **we’ll explore the complete AI Agent Workflow**, a cycle we defined as Thought-Action-Observation.
And then, we’ll dive deeper into each of these steps.


## The Core Components

Agents work in a continuous cycle of: **thinking (Thought) → acting (Action) → observing (Observation)**.

Let’s break down these steps together:

1. **Thought:** The LLM part of the Agent decides what the next step should be.
2. **Action:** The agent takes an action by calling the tools with the associated arguments.
3. **Observation:** The model reflects on the response from the tool.

## The Thought-Action-Observation Cycle

The three components work together in a continuous loop. To use an analogy from programming, the agent uses a **while loop**: the loop continues until the objective of the agent has been fulfilled.
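
The while-loop analogy can be made concrete with a short sketch. Everything here is a hypothetical stand-in, not the API of any particular framework: `call_llm` represents a call to the model, `tools` is a registry mapping tool names to Python functions, and the reply format is illustrative.

```python
# Minimal sketch of an agent's while loop (hypothetical helpers, not a real
# framework API). Each iteration is one Thought → Action → Observation cycle.
def run_agent(call_llm, tools, user_query, max_steps=5):
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_steps):                 # bounded "while" loop
        reply = call_llm(messages)             # Thought (+ proposed Action)
        if reply.get("final_answer"):
            return reply["final_answer"]       # objective fulfilled: exit loop
        tool = tools[reply["action"]]          # Action: look up and call a tool
        observation = tool(**reply["action_input"])
        messages.append({"role": "tool", "content": str(observation)})  # Observation
    return "Stopped: step limit reached."
```

The `max_steps` bound is a common safety measure so a confused agent cannot loop forever.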

Visually, it looks like this:

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/AgentCycle.gif" alt="Think, Act, Observe cycle"/>

In many Agent frameworks, **the rules and guidelines are embedded directly into the system prompt**, ensuring that every cycle adheres to a defined logic.

In a simplified version, our system prompt may look like this:

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/system_prompt_cycle.png" alt="Simplified system prompt describing the cycle"/>

We see here that in the System Message we defined:

- The *Agent's behavior*.
- The *Tools our Agent has access to*, as we described in the previous section.
- The *Thought-Action-Observation Cycle*, which we bake into the LLM instructions.

Let’s take a small example to understand the process before going deeper into each step.

## Alfred, the Weather Agent

We created Alfred, the Weather Agent.

A user asks Alfred: “What’s the current weather in New York?”

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent.jpg" alt="Alfred Agent"/>

Alfred’s job is to answer this query using a weather API tool.
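
Before walking through the cycle, it helps to picture what such a tool might look like. This `get_weather` function is a hypothetical stand-in that returns canned data; a real tool would call an actual weather service.

```python
# Hypothetical weather tool. A real implementation would query a weather
# API; here we return canned data so the example is self-contained.
def get_weather(location: str) -> str:
    """Get the current weather in the given location."""
    fake_data = {"New York": "partly cloudy, 15°C, 60% humidity"}
    report = fake_data.get(location, "no data available")
    return f"Current weather in {location}: {report}"
```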

Here’s how the cycle unfolds:

### Thought

**Internal Reasoning:**

Upon receiving the query, Alfred’s internal dialogue might be:

*"The user needs current weather information for New York. I have access to a tool that fetches weather data. First, I need to call the weather API to get up-to-date details."*

This step shows the agent breaking the problem into steps: first, gathering the necessary data.

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-1.jpg" alt="Alfred Agent"/>

### Action

**Tool Usage:**

Based on its reasoning and the fact that Alfred knows about a `get_weather` tool, Alfred prepares a JSON-formatted command that calls the weather API tool. For example, its first action could be:

Thought: I need to check the current weather for New York.

```json
{
  "action": "get_weather",
  "action_input": {
    "location": "New York"
  }
}
```

Here, the action clearly specifies which tool to call (`get_weather`) and what parameter to pass (`"location": "New York"`).
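
On the framework side, executing this action typically means parsing the JSON block and dispatching to the matching tool. A minimal sketch, assuming a hypothetical `tools` dictionary that maps tool names to Python callables:

```python
import json

def dispatch_action(raw_action: str, tools: dict):
    """Parse the model's JSON action block and call the named tool."""
    call = json.loads(raw_action)        # {"action": ..., "action_input": {...}}
    tool = tools[call["action"]]         # look up the tool by name
    return tool(**call["action_input"])  # unpack the arguments as keyword args
```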

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-2.jpg" alt="Alfred Agent"/>

### Observation

**Feedback from the Environment:**

After the tool call, Alfred receives an observation. This might be the raw weather data from the API, such as:

*"Current weather in New York: partly cloudy, 15°C, 60% humidity."*

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-3.jpg" alt="Alfred Agent"/>

This observation is then added to the prompt as additional context. It functions as real-world feedback, confirming whether the action succeeded and providing the needed details.
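
With chat-style model APIs, “adding the observation to the prompt” usually just means appending one more message to the conversation before the next model call. A sketch (the `"tool"` role name and message schema are illustrative assumptions; real APIs differ):

```python
# The conversation so far, as a list of messages. Role names and contents
# are illustrative, not a specific provider's schema.
messages = [
    {"role": "system", "content": "You are Alfred, a weather agent."},
    {"role": "user", "content": "What's the current weather in New York?"},
    {"role": "assistant",
     "content": '{"action": "get_weather", "action_input": {"location": "New York"}}'},
]

# The tool result becomes one more message, so the model sees it as
# context on its next call.
observation = "Current weather in New York: partly cloudy, 15°C, 60% humidity."
messages.append({"role": "tool", "content": observation})
```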


### Updated thought

**Reflecting:**

With the observation in hand, Alfred updates its internal reasoning:

*"Now that I have the weather data for New York, I can compile an answer for the user."*

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-4.jpg" alt="Alfred Agent"/>


### Final Action

Alfred then generates a final response formatted as we told it to:

Thought: I have the weather data now. The current weather in New York is partly cloudy with a temperature of 15°C and 60% humidity.

Final answer: The current weather in New York is partly cloudy with a temperature of 15°C and 60% humidity.

This final action sends the answer back to the user, closing the loop.
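
How does the framework know the loop is over? One common approach is to scan the model's reply for a final-answer marker and stop cycling when it appears. A sketch, assuming the `Final answer:` convention used above (the exact marker text is a convention, not a standard):

```python
def extract_final_answer(reply: str):
    """Return the text after a 'Final answer:' marker, or None to keep cycling."""
    for line in reply.splitlines():
        if line.strip().lower().startswith("final answer"):
            return line.split(":", 1)[1].strip()
    return None  # no final answer yet: the loop runs another cycle
```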


<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-5.jpg" alt="Alfred Agent"/>


What we see in this example:

- **Agents iterate through a loop until the objective is fulfilled:**

**Alfred’s process is cyclical**. It starts with a thought, then acts by calling a tool, and finally observes the outcome. If the observation had indicated an error or incomplete data, Alfred could have re-entered the cycle to correct its approach.

- **Tool Integration:**

The ability to call a tool (like a weather API) enables Alfred to go **beyond static knowledge and retrieve real-time data**, an essential aspect of many AI Agents.

- **Dynamic Adaptation:**

Each cycle allows the agent to incorporate fresh information (observations) into its reasoning (thought), ensuring that the final answer is well-informed and accurate.

This example showcases the core concept behind the *ReAct cycle* (a concept we're going to develop in the next section): **the interplay of Thought, Action, and Observation empowers AI agents to solve complex tasks iteratively**.

By understanding and applying these principles, you can design agents that not only reason about their tasks but also **effectively utilize external tools to complete them**, all while continuously refining their output based on environmental feedback.

---

Let’s now dive deeper into Thought, Action, and Observation as the individual steps of the process.