Conversation

@Naman2701B (Collaborator)

Implemented the execution logic: a server's arguments stay obfuscated until that specific server actually needs the real values in order to run.
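A minimal sketch of the idea, assuming hypothetical helpers obfuscate_args and resolve_args (not names from this PR): values are swapped for opaque placeholders and only resolved at the moment the target server needs them.

```python
# Hypothetical sketch of the obfuscation idea; helper names are illustrative, not from this PR.
from typing import Any, Dict, Tuple


def obfuscate_args(args: Dict[str, Any]) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Replace every argument value with an opaque placeholder and keep the real values aside."""
    placeholders = {key: f"<ARG_{i}>" for i, key in enumerate(args)}
    secrets = {placeholders[key]: value for key, value in args.items()}
    return placeholders, secrets


def resolve_args(placeholders: Dict[str, str], secrets: Dict[str, Any]) -> Dict[str, Any]:
    """Substitute the real values back in only when the target server actually needs them."""
    return {key: secrets[token] for key, token in placeholders.items()}


masked, vault = obfuscate_args({"recipient": "alice@example.com", "amount": 40})
print(masked)                       # the execution LLM only ever sees the placeholders
print(resolve_args(masked, vault))  # real values restored just before the tool call
```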

@JustinCappos JustinCappos (Owner) left a comment

I'm still pretty confused by the code comments and overall design in places. Can you try to clarify things more?

Comment on lines 28 to 30
"""
Normalize SubPrompt into a real dict.
"""
Owner:

I don't know what the purpose of this is. Can you explain what you're doing and why here?

Comment on lines 56 to 60
"""
Extracting arguments from the prompt for both cases:
1. Getting both key-value pairs for the system as a whole
2. Getting only the key for the parameter to be obfuscated
"""
Owner:

I'm also confused here. What is the purpose of this function?

"""
Function to make calls to specific tools specified by the Planning LLM
Args:
LLMStepResponse: Processed prompts to breakdown specific tasks so that no other MCP knows about each other
Owner:

I'm confused what this means. Is this one execution LLM or?

Comment on lines 2 to 5
This file handles the breaking down of a task into single or
multiple MCP tools based on suggested tools in the sub prompt
given by the LLM.
"""
Owner:

Isn't this a single execution LLM? Why would this file do the breakdown?

Collaborator Author:

For example, if the LLM's response for a subprompt suggests two tools, this file handles both of them, because we process the subprompt as a whole.
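A rough, self-contained sketch of that flow; SubPrompt's fields and execute_tool are illustrative stand-ins, not the PR's real code. Two suggested tools for one subprompt produce two tool executions.

```python
# Illustrative only: SubPrompt and execute_tool are stand-ins for the PR's real code.
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SubPrompt:
    content: str
    suggested_tools: List[str] = field(default_factory=list)


def execute_tool(tool: str, content: str) -> Dict[str, Any]:
    """Stand-in for the real tool invocation."""
    return {"tool": tool, "input": content, "status": "ok"}


def run_subprompt(sub: SubPrompt) -> List[Dict[str, Any]]:
    # The subprompt is handled as a whole: every suggested tool is executed in turn.
    return [execute_tool(tool, sub.content) for tool in sub.suggested_tools]


print(run_subprompt(SubPrompt("summarize the report", ["reader.read", "summarizer.summarize"])))
```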

@dataclass
class LLMStepResponse:
    """
    This contains the list of ToolCall, as multiple calls might be needed to make a specific sub prompt run.
Owner:

What is a ToolCall?

Collaborator Author:

ToolCall is referenced above in the same file, so I did not elaborate on it again here.
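For readers of the thread, a hypothetical sketch of what a ToolCall-style dataclass might look like; the field names here are assumptions, and the real definition lives in the PR's file.

```python
# Hypothetical illustration of a ToolCall-style dataclass; the real fields live in the PR's file.
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ToolCall:
    server: str           # which MCP server to contact
    tool: str             # which tool on that server to invoke
    args: Dict[str, Any]  # arguments, possibly still obfuscated at this stage
```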

def __init__(self) -> None:
    pass

async def run_step(self, step: Any) -> LLMStepResponse:
Owner:

What is a "step"?

@Naman2701B Naman2701B (Collaborator Author) left a comment

  • Updated the execution LLM: added a Gemini provider similar to the one used by the Planning LLM.
  • Updated the Planning LLM prompts to not include tools, as including them makes us lose the essence of the Execution LLM.
  • Added validation schemas for both the Planning LLM and tool execution.

@Naman2701B Naman2701B changed the title Feat: Execution Implementation WIP: Feat: Execution Implementation Oct 13, 2025
@Naman2701B Naman2701B changed the title WIP: Feat: Execution Implementation Feat: Execution Implementation Nov 3, 2025
"""
This contains only the server and tool required which would be returned,
args would not be part of this as the execution LLM is assumed to not be
trusted
@annmalavet annmalavet (Collaborator) Nov 3, 2025:

Can you explain this some more? My understanding of the MCP API is that the MCP responds to a tools/list request and provides all the tools

The MCP server must list all tools for capabilities

https://modelcontextprotocol.io/specification/2025-06-18/server/tools

Is the tool list coming from using the MCP API to request the tools from the MCP Server?

Collaborator Author:

So this is part of the ExecutionLLM: the tools have already been decided by the Planning LLM, and this file only handles executing the specific tools the Planning LLM suggested. Hence, the MCP tool call API is not relevant for this section; those tool call APIs are handled in the planning section of ShardGuard.

Comment on lines +64 to +70
if (len(suggested_tools)!=0):
    for tool in suggested_tools:
        if tool in tools:
            return True
        else:
            return False
return True
Owner:

What are you trying to do here? I'm almost certain this isn't what you mean to check. There is no need for that for loop because you exit after looking at suggested_tools[0]. But I think you want to make sure each tool in suggested_tools is in the tools list.

Collaborator Author:

Oh yes, sorry. I hadn't encountered any scenario with multiple tool calls, so I did not realize the check was faulty; it has been taken care of in the upcoming commit.
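For reference, a minimal sketch of the corrected check, assuming the same suggested_tools and tools lists as in the snippet above: every suggested tool must be present in the available tools.

```python
# Minimal sketch of the corrected membership check.
from typing import List


def all_suggested_tools_available(suggested_tools: List[str], tools: List[str]) -> bool:
    # An empty suggestion list is trivially valid; otherwise every suggested tool must be known.
    return all(tool in tools for tool in suggested_tools)
```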

# Instantiating a new ExecutionLLM for each task so that none of them share each other's context
exec_llm = make_execution_llm(provider, detected_model, api_key=api_key)
executor = StepExecutor(exec_llm)
task = self._to_dict(task)
Owner:

what is this doing? Why?

@Naman2701B Naman2701B (Collaborator Author) Nov 10, 2025:

As shown in our block diagram, each subtask gets its own execution LLM at every step; this code instantiates a new ExecutionLLM each time a subtask is about to be executed.
The last line, _to_dict, normalizes the data structure so the rest of the system can read it.
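A self-contained sketch of that isolation pattern; make_execution_llm and StepExecutor below are trivial stubs, not the PR's real classes, and are only meant to show that each subtask gets a fresh LLM instance.

```python
# Illustrative only: make_execution_llm and StepExecutor are trivial stubs, not the PR's classes.
import asyncio
from typing import Any, Dict, List


def make_execution_llm(provider: str, model: str, api_key: str) -> Dict[str, str]:
    return {"provider": provider, "model": model, "api_key": api_key}


class StepExecutor:
    def __init__(self, llm: Dict[str, str]) -> None:
        self.llm = llm  # this executor only ever sees its own LLM instance

    async def run_step(self, task: Dict[str, Any]) -> Dict[str, Any]:
        return {"task": task, "handled_by": id(self.llm)}


async def run_all(subtasks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    results = []
    for task in subtasks:
        # A fresh ExecutionLLM per subtask, so no subtask shares context with another.
        exec_llm = make_execution_llm("gemini", "gemini-1.5-flash", api_key="dummy")
        results.append(await StepExecutor(exec_llm).run_step(task))
    return results


print(asyncio.run(run_all([{"id": 1}, {"id": 2}])))
```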

@Naman2701B (Collaborator Author)

UPDATED THE PR. This PR now contains only:

  1. The execution implementation.
  2. Changes to the coordination service so that it calls the execution system.

PS: I had to force push to revert changes so that I could go back to a previous commit containing just the implementation section of the code.

3. **Decompose** the redacted prompt into clear, numbered subtasks.
4. **Consider available MCP tools** when breaking down tasks - if a task can be accomplished using
an available tool, mention the relevant tool in the subtask description.
an available tool, mention the relevant tool in the **suggested_tools** section of the subprompt as provided in the output schema.
Collaborator:

This is a fine way to get structured outputs, but we could also use model-side constrained decoding to get even better results. Many APIs support this (ollama and openai for sure). Example: https://openai.com/index/introducing-structured-outputs-in-the-api/

Collaborator Author:

In my view, this approach is better: here we stay in control and can use our own schemas to validate the outputs, without depending on the model to guarantee that its output is correct and will not hinder further processing of the subprompts.
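A minimal sketch of the kind of schema-side validation being described, using the jsonschema library; the schema itself is illustrative, not the PR's actual schema.

```python
# Illustrative schema validation of an LLM response; the schema is an example, not the PR's.
from jsonschema import ValidationError, validate

SUBPROMPT_SCHEMA = {
    "type": "object",
    "properties": {
        "content": {"type": "string"},
        "suggested_tools": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["content", "suggested_tools"],
}

llm_output = {"content": "Send the email", "suggested_tools": ["email.send"]}

try:
    validate(instance=llm_output, schema=SUBPROMPT_SCHEMA)
except ValidationError as err:
    print(f"Rejecting malformed plan: {err.message}")
```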

def _to_dict(self, obj: Any) -> Dict[str, Any]:
    """
    Normalize SubPrompt into a real dict.
    When the prompt goes to the LLM, it returns a Pydantic model,
Collaborator:

What is a Pydantic model? Presumably this is a python-specific thing, because I've worked with all the major llm APIs and never heard of this.

Collaborator Author:

Yes, it is Python-specific. You can think of it as something similar to a struct in C: Pydantic internally manages the types of the parameters included in the specific model. You can look at this file, which defines these models.
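For anyone unfamiliar with it, a minimal Pydantic example (the fields are illustrative, not ShardGuard's actual model): typed fields are validated on construction and can be dumped back to a plain dict.

```python
# Minimal Pydantic illustration; the fields are examples, not ShardGuard's actual model.
from typing import List

from pydantic import BaseModel


class SubPrompt(BaseModel):
    content: str
    suggested_tools: List[str] = []


sp = SubPrompt(content="Send the report", suggested_tools=["email.send"])
print(sp.model_dump())  # plain dict in Pydantic v2; use sp.dict() on v1
```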

Comment on lines 88 to 90
if(self.retryCount<=5):
    self.retryCount+=1
    logger.warning(f"Retrying Planning LLM due to invalid tool suggestion!")
Collaborator:

Is the error handling because of things like network errors, or something more fundamental? Also, can you pull out the 5 here into a constant MAX_RETRIES or something similar?

Collaborator Author:

I will make it a global constant, yes.
The reason I have this max-retries limit:

  1. If the Planning LLM does not perform well and ends up hallucinating a tool, we do not want it stuck in an indefinite loop of retries.
     For now I set it to 5 specifically because the free tier of Gemini has a rate limiter and I was not able to get the model to perform while testing.
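A small, self-contained sketch of the suggested change; the wrapper class is illustrative, and only the MAX_RETRIES constant and the bounded retry are the point.

```python
# Self-contained sketch of the suggested change; the wrapper class is illustrative.
import logging

logger = logging.getLogger(__name__)

MAX_RETRIES = 5  # bounded so a hallucinating Planning LLM cannot retry forever


class PlanningRetry:
    def __init__(self) -> None:
        self.retry_count = 0

    def should_retry(self) -> bool:
        if self.retry_count <= MAX_RETRIES:
            self.retry_count += 1
            logger.warning("Retrying Planning LLM due to invalid tool suggestion!")
            return True
        return False
```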

# Validating the result from the tool call with the expected schema
_validate_output(result, output_schema, where="Tool Call")

logger.warning(f"{call.server}: {call.tool} was called with the parameters: {per_tool_args}")
Collaborator:

Should be a logger.info or logger.debug, not a warning, as nothing is wrong about making a tool call.

Collaborator Author:

I explicitly kept it as logger.warning because, for some reason, when I keep it as logger.info I do not get any output, and then there is no way for a user to tell whether the call actually worked.
In my view, we should keep this while we are using stub MCP servers and tools. Once we are working with actual real-world servers, we can switch it to logger.info or remove it altogether, since at that point the effects will show up in the real application.
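If the missing output is just the default log level, a standard-library configuration along these lines would make logger.info visible; this is an assumption about the cause, and the logger name below is illustrative.

```python
import logging

# The root logger defaults to WARNING, so INFO records are dropped unless the level is lowered.
logging.basicConfig(level=logging.INFO)

logger = logging.getLogger("shardguard.execution")  # illustrative logger name
logger.info("tool call succeeded")                  # now visible on the console
```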

Comment on lines +48 to +51
RULES (hard constraints):
- Use **ONLY tools** listed in "suggested_tools". Zero exceptions.
- **Do NOT** invent tools, servers, steps, or intermediate IDs.
- All outputs must be pure JSON. No prose. No code fences.
Collaborator:

Do we also have a "hard" constraint on tools that the execution llm can use? For example, if it violated this rule, what would happen?

Collaborator Author:

We do not want the execution LLM to hallucinate tools under any circumstances, hence the hard check.
Additionally, let's say it still hallucinates and produces a server-tool pair that our coordination service is not aware of; the call will then fail with an error, because that tool does not exist in our MCP context.
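A sketch of that guard as concrete code (the registry contents and the error type are assumptions): an unknown server-tool pair is rejected before any call is attempted.

```python
# Illustrative guard: reject server-tool pairs the coordination service does not know about.
KNOWN_TOOLS = {
    "email-server": {"send_email", "list_inbox"},
    "calendar-server": {"create_event"},
}


def ensure_known_tool(server: str, tool: str) -> None:
    if tool not in KNOWN_TOOLS.get(server, set()):
        raise ValueError(f"Unknown tool '{tool}' on server '{server}'; refusing hallucinated call")


ensure_known_tool("email-server", "send_email")    # passes silently
# ensure_known_tool("email-server", "wire_money")  # would raise ValueError
```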

@Naman2701B Naman2701B requested a review from enjhnsn2 December 10, 2025 01:33