Feat: Execution Implementation #10
base: main
Conversation
JustinCappos left a comment:
I'm still pretty confused by the code comments and overall design in places. Can you try to clarify things more?
| """ | ||
| Normalize SubPrompt into a real dict. | ||
| """ |
I don't know what the purpose of this is. Can you explain what you're doing and why here?
| """ | ||
| Extracting arguments from the prompt for both cases: | ||
| 1. Getting both key-value pairs for the system as a whole | ||
| 2. Getting only the key for the parameter to be obfuscated | ||
| """ |
I'm also confused here. What is the purpose of this function?
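For illustration only, here is a rough sketch of the two cases the docstring seems to describe, using hypothetical names (`extract_args`, an `args` field on the sub-prompt); this is not the actual ShardGuard helper:

```python
from typing import Any

def extract_args(sub_prompt: dict[str, Any], keys_only: bool = False) -> dict[str, Any] | list[str]:
    """Hypothetical helper: pull arguments out of a sub-prompt.

    Case 1 (keys_only=False): return the full key-value mapping for the system as a whole.
    Case 2 (keys_only=True):  return only the keys, e.g. for parameters that should
                              stay obfuscated.
    """
    args = sub_prompt.get("args", {})
    return list(args.keys()) if keys_only else dict(args)

# Example:
sp = {"args": {"recipient": "[P1]", "subject": "Hello"}}
print(extract_args(sp))                  # {'recipient': '[P1]', 'subject': 'Hello'}
print(extract_args(sp, keys_only=True))  # ['recipient', 'subject']
```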
src/shardguard/core/coordination.py
Outdated
| """ | ||
| Function to make calls to specific tools specified by the Planning LLM | ||
| Args: | ||
| LLMStepResponse: Processed prompts to breakdown specific tasks so that no other MCP knows about each other |
I'm confused what this means. Is this one execution LLM, or something else?
src/shardguard/core/execution.py
Outdated
| This file handles the breaking down of a task into single or
| multiple MCP tools based on suggested tools in the sub prompt
| given by the LLM.
| """
Isn't this a single execution LLM? Why would this file do the breakdown?
For example, suppose a subprompt's suggested_tools lists 2 tools in the response generated by the LLM. This file would handle both tools, because we take the subprompt as a whole.
src/shardguard/core/execution.py
Outdated
| @dataclass
| class LLMStepResponse:
|     """
|     This contains the list of ToolCall, as multiple calls might be needed to make a specific sub prompt run.
What is a ToolCall?
I have referenced this above in the same file, hence did not elaborate on it again here.
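For context, a minimal sketch of the kind of dataclasses being discussed (field names are illustrative; the real definitions live in src/shardguard/core/execution.py):

```python
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    """One call proposed for a sub-prompt: which MCP server and which tool.
    Arguments are deliberately not carried here, since the execution LLM is
    assumed to be untrusted (see the later comment in this file)."""
    server: str
    tool: str

@dataclass
class LLMStepResponse:
    """The execution LLM's answer for a single sub-prompt: a list of ToolCalls,
    since multiple calls might be needed to make one sub-prompt run."""
    calls: list[ToolCall] = field(default_factory=list)
```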
src/shardguard/core/execution.py
Outdated
|     def __init__(self) -> None:
|         pass
|
|     async def run_step(self, step: Any) -> LLMStepResponse:
What is a "step"?
- Updated the execution LLM: added a Gemini provider, similar to the one for the Planning LLM.
- Updated the prompts for the PlanningLLM to not include tools, as including them makes us lose the essence of the ExecutionLLM.
- Added validation schemas for both the Planning LLM and tool execution.
… do not make sense - suggested by Evan
…see the LLM does not hallucinate
…cks for tool hallucination by the PlanningLLM
src/shardguard/core/execution.py
Outdated
| """ | ||
| This contains only the server and tool required which would be returned, | ||
| args would not be part of this as the execution LLM is assumed to not be | ||
| trusted |
Can you explain this some more? My understanding of the MCP API is that the MCP server responds to a tools/list request and provides all of the tools.
The MCP server must list all of its tools in its capabilities:
https://modelcontextprotocol.io/specification/2025-06-18/server/tools
Is the tool list coming from using the MCP API to request the tools from the MCP server?
So this is part of the ExecutionLLM, where the tools have already been decided by the Planning LLM. This file just handles executing the specific tools that have been suggested by the PlanningLLM, so the MCP tool-call API is not relevant for this section. The tool-call APIs are handled in the planning section of ShardGuard.
| if (len(suggested_tools)!=0):
|     for tool in suggested_tools:
|         if tool in tools:
|             return True
|         else:
|             return False
| return True
What are you trying to do here? I'm almost certain this isn't what you mean to check. There is no need for that for loop because you exit after looking at suggested_tools[0]. But I think you want to make sure each tool in suggested_tools is in the tools list.
Oh yes, sorry, I hadn't encountered any scenario with multiple tool calls, so I did not realize the case was faulty. It has been taken care of in the upcoming commit.
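A minimal sketch of the corrected check, making sure every entry in suggested_tools is a known tool (the helper name is hypothetical):

```python
def tools_are_valid(suggested_tools: list[str], tools: list[str]) -> bool:
    """Return True only if every suggested tool is in the known tools list.
    An empty suggestion list is treated as valid, as in the original code."""
    return all(tool in tools for tool in suggested_tools)

# Examples:
print(tools_are_valid([], ["read_file", "send_email"]))                           # True
print(tools_are_valid(["read_file", "send_email"], ["read_file", "send_email"]))  # True
print(tools_are_valid(["read_file", "delete_all"], ["read_file", "send_email"]))  # False
```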
| # Instantiating a new ExecutionLLM for each task so that none of them have each others context
| exec_llm = make_execution_llm(provider, detected_model, api_key=api_key)
| executor = StepExecutor(exec_llm)
| task = self._to_dict(task)
what is this doing? Why?
As part of our block diagram, we have a new execution LLM at each step for each subtask; this instantiates a new ExecutionLLM each time a subtask is to be executed.
The last line, _to_dict, normalizes the data structure so it is readable by the rest of the system.
UPDATED THE PR. This PR now contains only the execution implementation.
PS: I had to force push to revert changes so that I could go back to the previous commit, which included just the implementation section of the code.
| 3. **Decompose** the redacted prompt into clear, numbered subtasks.
| 4. **Consider available MCP tools** when breaking down tasks - if a task can be accomplished using
| -    an available tool, mention the relevant tool in the subtask description.
| +    an available tool, mention the relevant tool in the **suggested_tools** section of the subprompt as provided in the output schema.
This is a fine way to get structured outputs, but we could also use model-side constrained decoding to get even better results. Many APIs support this (ollama and openai for sure). Example: https://openai.com/index/introducing-structured-outputs-in-the-api/
In my view, this is better because here we have control and can use our own schemas to validate the outputs, without being dependent on the model to make sure the output is correct and will not hinder further processing of the subprompts.
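As an illustration of that approach, a sketch with a hypothetical schema (the real schemas live in the PR's validation code): the planner's raw JSON is validated against our own Pydantic model, independent of whatever the provider guarantees.

```python
from pydantic import BaseModel, ValidationError

class SubPromptSchema(BaseModel):
    """Hypothetical output schema for one subprompt from the Planning LLM."""
    id: int
    description: str
    suggested_tools: list[str] = []

raw = '{"id": 1, "description": "Read the report file", "suggested_tools": ["read_file"]}'

try:
    sub_prompt = SubPromptSchema.model_validate_json(raw)
    print(sub_prompt.suggested_tools)  # ['read_file']
except ValidationError as err:
    # Malformed planner output is caught on our side, regardless of the model or provider.
    print(f"Planner output rejected: {err}")
```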
src/shardguard/core/coordination.py
Outdated
| def _to_dict(self, obj: Any) -> Dict[str, Any]:
|     """
|     Normalize SubPrompt into a real dict.
|     When the prompt goes to the LLM, it returns a Pydantic model,
What is a Pydantic model? Presumably this is a Python-specific thing, because I've worked with all the major LLM APIs and never heard of it.
Yes, it is Python-specific. You can think of it as something similar to a struct in C, where Pydantic internally manages the types of the parameters included in the specific model. You can look at this file, which creates this model framework.
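To make the struct analogy concrete (field names here are illustrative, not ShardGuard's actual SubPrompt): a Pydantic model declares typed fields much like a C struct declares typed members, and model_dump() turns an instance back into a plain dict, which is what the _to_dict normalization relies on.

```python
from pydantic import BaseModel

class SubPrompt(BaseModel):
    # Typed fields, conceptually like the members of a C struct;
    # Pydantic validates the types when an instance is created.
    id: int
    content: str
    suggested_tools: list[str] = []

sp = SubPrompt(id=1, content="Send the report", suggested_tools=["send_email"])
print(sp.model_dump())
# {'id': 1, 'content': 'Send the report', 'suggested_tools': ['send_email']}
```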
src/shardguard/core/coordination.py
Outdated
| if(self.retryCount<=5):
|     self.retryCount+=1
|     logger.warning(f"Retrying Planning LLM due to invalid tool suggestion!")
Is the error handling because of things like network errors, or something more fundamental? Also, can you pull out the 5 here into a constant MAX_RETRIES or something similar?
I will make it a global constant, yes.
The reason I have this max-retries limit implemented:
- If the Planning LLM does not perform well and ends up hallucinating a tool, we do not want it to be stuck in an indefinite loop of continuous retries.
For now I set it specifically to 5, because the free version of Gemini has a rate limiter and I was not able to get the model to perform while testing.
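A minimal sketch of how the literal could be pulled out into a module-level constant, as suggested (the replan/validate callables are placeholders, not ShardGuard functions):

```python
import logging

logger = logging.getLogger(__name__)

MAX_RETRIES = 5  # Bounded so a hallucinating planner cannot retry forever.

def plan_with_retries(replan, validate):
    """Hypothetical retry wrapper: `replan` produces a plan, `validate` checks
    that every suggested tool actually exists."""
    for attempt in range(1, MAX_RETRIES + 1):
        plan = replan()
        if validate(plan):
            return plan
        logger.warning(
            "Retrying Planning LLM due to invalid tool suggestion (attempt %d/%d)",
            attempt, MAX_RETRIES,
        )
    return None  # Give up after MAX_RETRIES instead of looping indefinitely.
```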
| # Validating the result from the tool call with the expected schema
| _validate_output(result, output_schema, where="Tool Call")
|
| logger.warning(f"{call.server}: {call.tool} was called with the parameters: {per_tool_args}")
Should be a logger.info or logger.debug, not a warning, as nothing is wrong about making a tool call.
I explicitly kept it as logger.warning because, for some reason, if I keep it as logger.info I do not get any output, and then there is no way for a user to tell whether this actually worked.
In my view, we should keep this while we are using stub MCP servers and tools. Once we are working with actual real-world servers, we can change it to logger.info or get rid of it altogether, since then we will see the changes in the real, working application.
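One possible explanation, offered as an assumption rather than a diagnosis of the ShardGuard code: Python's root logger drops anything below WARNING unless logging is configured, so logger.info output disappears by default. If that is the cause, the message could stay at info level and the application could simply configure the log level once at startup:

```python
import logging

# Configure logging once at application startup; without this, the default
# WARNING threshold silently discards INFO and DEBUG records.
logging.basicConfig(level=logging.INFO)

logger = logging.getLogger("shardguard.core.execution")  # illustrative logger name
logger.info("stub-server: stub-tool was called with the parameters: %s", {"path": "[P1]"})
```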
| RULES (hard constraints):
| - Use **ONLY tools** listed in "suggested_tools". Zero exceptions.
| - **Do NOT** invent tools, servers, steps, or intermediate IDs.
| - All outputs must be pure JSON. No prose. No code fences.
Do we also have a "hard" constraint on the tools that the execution LLM can use? For example, if it violated this rule, what would happen?
We do not want the execution LLM to hallucinate tools in any instance, hence we need to have a hard check on that.
Additionally, let's say it still ends up hallucinating and gives a server-tool pair that our coordination service is not aware of; it will end up throwing an error, as that tool does not exist in our MCP context.
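A small sketch of that hard check (names are hypothetical; the real check lives in the coordination/execution code): any server-tool pair returned by the execution LLM that is not in suggested_tools is rejected outright.

```python
def enforce_suggested_tools(calls: list[tuple[str, str]], suggested_tools: set[str]) -> None:
    """Raise if the execution LLM asks for any tool outside suggested_tools.

    `calls` holds (server, tool) pairs proposed by the execution LLM;
    `suggested_tools` holds the tool names the Planning LLM allowed."""
    for server, tool in calls:
        if tool not in suggested_tools:
            raise ValueError(f"Hallucinated tool rejected: {server}:{tool} is not in suggested_tools")

# Example: the first call passes, the second would raise ValueError.
enforce_suggested_tools([("files", "read_file")], {"read_file"})
# enforce_suggested_tools([("files", "delete_all")], {"read_file"})
```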
Implemented the execution logic where the args for the server stay obfuscated until the specific server actually requires the value in order to run.
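A rough sketch of that idea (the placeholder format and names are assumptions, not the actual implementation): argument values keep their opaque placeholders, and the real values are substituted only when a specific server genuinely needs them to run.

```python
from typing import Any

def resolve_args(per_tool_args: dict[str, Any], secrets: dict[str, Any],
                 server_needs_values: bool) -> dict[str, Any]:
    """Hypothetical: keep placeholders such as "[P1]" until the target server
    requires the real value; only then substitute from the coordination
    layer's private mapping."""
    if not server_needs_values:
        return dict(per_tool_args)  # still obfuscated
    return {key: secrets.get(value, value) if isinstance(value, str) else value
            for key, value in per_tool_args.items()}

args = {"recipient": "[P1]", "subject": "Quarterly report"}
secrets = {"[P1]": "alice@example.com"}
print(resolve_args(args, secrets, server_needs_values=False))  # placeholders kept
print(resolve_args(args, secrets, server_needs_values=True))   # real value substituted
```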