Conversation

@Naman2701B (Collaborator)

Implemented the execution logic: a server's arguments stay obfuscated until that specific server actually needs the real values in order to run.
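A minimal sketch of the idea, assuming hypothetical helpers obfuscate_args and resolve_args (not names from this PR): values are swapped for opaque placeholders and only resolved at the moment the target server needs them.

```python
# Hypothetical sketch of the obfuscation idea; helper names are illustrative, not from this PR.
from typing import Any, Dict, Tuple


def obfuscate_args(args: Dict[str, Any]) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Replace every argument value with an opaque placeholder and keep the real values aside."""
    placeholders = {key: f"<ARG_{i}>" for i, key in enumerate(args)}
    secrets = {placeholders[key]: value for key, value in args.items()}
    return placeholders, secrets


def resolve_args(placeholders: Dict[str, str], secrets: Dict[str, Any]) -> Dict[str, Any]:
    """Substitute the real values back in only when the target server actually needs them."""
    return {key: secrets[token] for key, token in placeholders.items()}


masked, vault = obfuscate_args({"recipient": "alice@example.com", "amount": 40})
print(masked)                       # the execution LLM only ever sees the placeholders
print(resolve_args(masked, vault))  # real values restored just before the tool call
```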

@JustinCappos JustinCappos (Owner) left a comment

I'm still pretty confused by the code comments and overall design in places. Can you try to clarify things more?

Comment on lines 28 to 30
"""
Normalize SubPrompt into a real dict.
"""
Owner:

I don't know what the purpose of this is. Can you explain what you're doing and why here?

Comment on lines 56 to 60
"""
Extracting arguments from the prompt for both cases:
1. Getting both key-value pairs for the system as a whole
2. Getting only the key for the parameter to be obfuscated
"""
Owner:

I'm also confused here. What is the purpose of this function?

"""
Function to make calls to specific tools specified by the Planning LLM
Args:
LLMStepResponse: Processed prompts to breakdown specific tasks so that no other MCP knows about each other
Owner:

I'm confused what this means. Is this one execution LLM or?

Comment on lines 2 to 5
This file handles the breaking down of a task into single or
multiple MCP tools based on suggested tools in the sub prompt
given by the LLM.
"""
Owner:

Isn't this a single execution LLM? Why would this file do the breakdown?

Collaborator Author:

For example, if the LLM's response for a subprompt suggests two tools, this file handles both of them, because we process the subprompt as a whole.
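A rough, self-contained sketch of that flow; SubPrompt's fields and execute_tool are illustrative stand-ins, not the PR's real code. Two suggested tools for one subprompt produce two tool executions.

```python
# Illustrative only: SubPrompt and execute_tool are stand-ins for the PR's real code.
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SubPrompt:
    content: str
    suggested_tools: List[str] = field(default_factory=list)


def execute_tool(tool: str, content: str) -> Dict[str, Any]:
    """Stand-in for the real tool invocation."""
    return {"tool": tool, "input": content, "status": "ok"}


def run_subprompt(sub: SubPrompt) -> List[Dict[str, Any]]:
    # The subprompt is handled as a whole: every suggested tool is executed in turn.
    return [execute_tool(tool, sub.content) for tool in sub.suggested_tools]


print(run_subprompt(SubPrompt("summarize the report", ["reader.read", "summarizer.summarize"])))
```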

@dataclass
class LLMStepResponse:
    """
    This contains the list of ToolCall, as multiple calls might be needed to make a specific sub prompt run.
Owner:

What is a ToolCall?

Collaborator Author:

ToolCall is referenced above in the same file, so I did not elaborate on it again here.
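For readers of the thread, a hypothetical sketch of what a ToolCall-style dataclass might look like; the field names here are assumptions, and the real definition lives in the PR's file.

```python
# Hypothetical illustration of a ToolCall-style dataclass; the real fields live in the PR's file.
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ToolCall:
    server: str           # which MCP server to contact
    tool: str             # which tool on that server to invoke
    args: Dict[str, Any]  # arguments, possibly still obfuscated at this stage
```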

def __init__(self) -> None:
    pass

async def run_step(self, step: Any) -> LLMStepResponse:
Owner:

What is a "step"?

@Naman2701B Naman2701B (Collaborator Author) left a comment

  • Updated the execution LLM: added a Gemini provider similar to the one used by the Planning LLM.
  • Updated the Planning LLM prompts to not include tools, as including them makes us lose the essence of the Execution LLM.
  • Added validation schemas for both the Planning LLM and tool execution.

@Naman2701B Naman2701B changed the title Feat: Execution Implementation WIP: Feat: Execution Implementation Oct 13, 2025
@Naman2701B Naman2701B changed the title WIP: Feat: Execution Implementation Feat: Execution Implementation Nov 3, 2025
"""
This contains only the server and tool required which would be returned,
args would not be part of this as the execution LLM is assumed to not be
trusted
@annmalavet annmalavet (Collaborator) Nov 3, 2025:

Can you explain this some more? My understanding of the MCP API is that the MCP responds to a tools/list request and provides all the tools

The MCP server must list all tools for capabilities

https://modelcontextprotocol.io/specification/2025-06-18/server/tools

Is the tool list coming from using the MCP API to request the tools from the MCP Server?

Collaborator Author:

So this is part of the ExecutionLLM: the tools have already been decided by the Planning LLM, and this file only handles executing the specific tools the Planning LLM suggested. Hence, the MCP tool call API is not relevant for this section; those tool call APIs are handled in the planning section of ShardGuard.

Comment on lines +64 to +70
if (len(suggested_tools)!=0):
    for tool in suggested_tools:
        if tool in tools:
            return True
        else:
            return False
return True
Owner:

What are you trying to do here? I'm almost certain this isn't what you mean to check. There is no need for that for loop because you exit after looking at suggested_tools[0]. But I think you want to make sure each tool in suggested_tools is in the tools list.

Collaborator Author:

Oh yes, sorry. I hadn't encountered any scenario with multiple tool calls, so I did not realize the check was faulty; it has been taken care of in the upcoming commit.
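For reference, a minimal sketch of the corrected check, assuming the same suggested_tools and tools lists as in the snippet above: every suggested tool must be present in the available tools.

```python
# Minimal sketch of the corrected membership check.
from typing import List


def all_suggested_tools_available(suggested_tools: List[str], tools: List[str]) -> bool:
    # An empty suggestion list is trivially valid; otherwise every suggested tool must be known.
    return all(tool in tools for tool in suggested_tools)
```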

# Instantiating a new ExecutionLLM for each task so that none of them share each other's context
exec_llm = make_execution_llm(provider, detected_model, api_key=api_key)
executor = StepExecutor(exec_llm)
task = self._to_dict(task)
Owner:

what is this doing? Why?

@Naman2701B Naman2701B (Collaborator Author) Nov 10, 2025:

As shown in our block diagram, each subtask gets its own execution LLM at every step; this code instantiates a new ExecutionLLM each time a subtask is about to be executed.
The last line, _to_dict, normalizes the data structure so the rest of the system can read it.
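A self-contained sketch of that isolation pattern; make_execution_llm and StepExecutor below are trivial stubs, not the PR's real classes, and are only meant to show that each subtask gets a fresh LLM instance.

```python
# Illustrative only: make_execution_llm and StepExecutor are trivial stubs, not the PR's classes.
import asyncio
from typing import Any, Dict, List


def make_execution_llm(provider: str, model: str, api_key: str) -> Dict[str, str]:
    return {"provider": provider, "model": model, "api_key": api_key}


class StepExecutor:
    def __init__(self, llm: Dict[str, str]) -> None:
        self.llm = llm  # this executor only ever sees its own LLM instance

    async def run_step(self, task: Dict[str, Any]) -> Dict[str, Any]:
        return {"task": task, "handled_by": id(self.llm)}


async def run_all(subtasks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    results = []
    for task in subtasks:
        # A fresh ExecutionLLM per subtask, so no subtask shares context with another.
        exec_llm = make_execution_llm("gemini", "gemini-1.5-flash", api_key="dummy")
        results.append(await StepExecutor(exec_llm).run_step(task))
    return results


print(asyncio.run(run_all([{"id": 1}, {"id": 2}])))
```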

@Naman2701B (Collaborator Author)

UPDATED THE PR. This PR now contains only:

  1. The execution implementation.
  2. Changes to the coordination service so that it calls the execution system.

PS: I had to force push to revert changes so that I could go back to a previous commit containing just the implementation section of the code.

3. **Decompose** the redacted prompt into clear, numbered subtasks.
4. **Consider available MCP tools** when breaking down tasks - if a task can be accomplished using
an available tool, mention the relevant tool in the subtask description.
an available tool, mention the relevant tool in the **suggested_tools** section of the subprompt as provided in the output schema.
Collaborator:

This is a fine way to get structured outputs, but we could also use model-side constrained decoding to get even better results. Many APIs support this (ollama and openai for sure). Example: https://openai.com/index/introducing-structured-outputs-in-the-api/

Collaborator Author:

In my view, this approach is better: here we stay in control and can use our own schemas to validate the outputs, without depending on the model to guarantee that its output is correct and will not hinder further processing of the subprompts.
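A minimal sketch of the kind of schema-side validation being described, using the jsonschema library; the schema itself is illustrative, not the PR's actual schema.

```python
# Illustrative schema validation of an LLM response; the schema is an example, not the PR's.
from jsonschema import ValidationError, validate

SUBPROMPT_SCHEMA = {
    "type": "object",
    "properties": {
        "content": {"type": "string"},
        "suggested_tools": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["content", "suggested_tools"],
}

llm_output = {"content": "Send the email", "suggested_tools": ["email.send"]}

try:
    validate(instance=llm_output, schema=SUBPROMPT_SCHEMA)
except ValidationError as err:
    print(f"Rejecting malformed plan: {err.message}")
```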

def _to_dict(self, obj: Any) -> Dict[str, Any]:
    """
    Normalize SubPrompt into a real dict.
    When the prompt goes to the LLM, it returns a Pydantic model,
Collaborator:

What is a Pydantic model? Presumably this is a python-specific thing, because I've worked with all the major llm APIs and never heard of this.

Collaborator Author:

Yes, it is Python-specific. You can think of it as something similar to a struct in C: Pydantic internally manages the types of the parameters included in the specific model. You can look at this file, which defines these models.
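For anyone unfamiliar with it, a minimal Pydantic example (the fields are illustrative, not ShardGuard's actual model): typed fields are validated on construction and can be dumped back to a plain dict.

```python
# Minimal Pydantic illustration; the fields are examples, not ShardGuard's actual model.
from typing import List

from pydantic import BaseModel


class SubPrompt(BaseModel):
    content: str
    suggested_tools: List[str] = []


sp = SubPrompt(content="Send the report", suggested_tools=["email.send"])
print(sp.model_dump())  # plain dict in Pydantic v2; use sp.dict() on v1
```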

Comment on lines 88 to 90
if(self.retryCount<=5):
    self.retryCount+=1
    logger.warning(f"Retrying Planning LLM due to invalid tool suggestion!")
Collaborator:

Is the error handling because of things like network errors, or something more fundamental? Also, can you pull out the 5 here into a constant MAX_RETRIES or something similar?

Collaborator Author:

I will make it a global constant, yes.
The reason I have this max-retries limit:

  1. If the Planning LLM does not perform well and ends up hallucinating a tool, we do not want it stuck in an indefinite loop of retries.
     For now I set it to 5 specifically because the free tier of Gemini has a rate limiter and I was not able to get the model to perform while testing.
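A small, self-contained sketch of the suggested change; the wrapper class is illustrative, and only the MAX_RETRIES constant and the bounded retry are the point.

```python
# Self-contained sketch of the suggested change; the wrapper class is illustrative.
import logging

logger = logging.getLogger(__name__)

MAX_RETRIES = 5  # bounded so a hallucinating Planning LLM cannot retry forever


class PlanningRetry:
    def __init__(self) -> None:
        self.retry_count = 0

    def should_retry(self) -> bool:
        if self.retry_count <= MAX_RETRIES:
            self.retry_count += 1
            logger.warning("Retrying Planning LLM due to invalid tool suggestion!")
            return True
        return False
```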

# Validating the result from the tool call with the expected schema
_validate_output(result, output_schema, where="Tool Call")

logger.warning(f"{call.server}: {call.tool} was called with the parameters: {per_tool_args}")
Collaborator:

Should be a logger.info or logger.debug, not a warning, as nothing is wrong about making a tool call.

Collaborator Author:

I explicitly kept it as logger.warning because, for some reason, when I keep it as logger.info I do not get any output, and then there is no way for a user to tell whether the call actually worked.
In my view, we should keep this while we are using stub MCP servers and tools. Once we are working with actual real-world servers, we can switch it to logger.info or remove it altogether, since at that point the effects will show up in the real application.
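If the missing output is just the default log level, a standard-library configuration along these lines would make logger.info visible; this is an assumption about the cause, and the logger name below is illustrative.

```python
import logging

# The root logger defaults to WARNING, so INFO records are dropped unless the level is lowered.
logging.basicConfig(level=logging.INFO)

logger = logging.getLogger("shardguard.execution")  # illustrative logger name
logger.info("tool call succeeded")                  # now visible on the console
```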

Comment on lines +48 to +51
RULES (hard constraints):
- Use **ONLY tools** listed in "suggested_tools". Zero exceptions.
- **Do NOT** invent tools, servers, steps, or intermediate IDs.
- All outputs must be pure JSON. No prose. No code fences.
Collaborator:

Do we also have a "hard" constraint on tools that the execution llm can use? For example, if it violated this rule, what would happen?

Collaborator Author:

We do not want the execution LLM to hallucinate tools under any circumstances, hence the hard check.
Additionally, let's say it still hallucinates and produces a server-tool pair that our coordination service is not aware of; the call will then fail with an error, because that tool does not exist in our MCP context.
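A sketch of that guard as concrete code (the registry contents and the error type are assumptions): an unknown server-tool pair is rejected before any call is attempted.

```python
# Illustrative guard: reject server-tool pairs the coordination service does not know about.
KNOWN_TOOLS = {
    "email-server": {"send_email", "list_inbox"},
    "calendar-server": {"create_event"},
}


def ensure_known_tool(server: str, tool: str) -> None:
    if tool not in KNOWN_TOOLS.get(server, set()):
        raise ValueError(f"Unknown tool '{tool}' on server '{server}'; refusing hallucinated call")


ensure_known_tool("email-server", "send_email")    # passes silently
# ensure_known_tool("email-server", "wire_money")  # would raise ValueError
```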

@Naman2701B Naman2701B requested a review from enjhnsn2 December 10, 2025 01:33