

@sebastien-rosset
Contributor

@sebastien-rosset sebastien-rosset commented Mar 17, 2025

Description

  1. Fix #896 (Confusing search_files function description leads to misunderstanding of its purpose): "search files by name" enhancements
    1. Rename the existing search_files function to search_files_by_name
    2. Improve performance for deeply nested directories by processing directories in parallel batches.
    3. Improve description to make it clear it's a file/directory name matcher, not a file content search.
    4. Improve error handling
  2. Add a new search_file_contents tool to search file contents.
    1. Plain text or regex pattern matching
    2. Case-sensitive/insensitive search
    3. Configurable result limits
    4. Context lines around matches
    5. File glob filtering (include/exclude)
  3. Better parameter documentation throughout
  4. Fix #735 ("search_files" tool of Filesystem example MCP server does not work as expected) and #1067 (File server tool not finding files): the LLM was sometimes unsure of the exact syntax for file name matching. Is it a substring match, a regex match, a glob pattern, or something else? Neither the tool name nor its description made this clear.
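Item 1.2 can be pictured with the following minimal sketch, which is not the code from this PR: a breadth-first walk that lists up to `batchSize` directories concurrently with `Promise.all`, instead of awaiting each `readdir` in turn. The function name `walkInBatches`, the `matches` callback and the default batch size of 8 are illustrative assumptions; the only APIs used are Node's `fs.promises.readdir` (with `withFileTypes`) and `path.join`.

```typescript
import { promises as fs, Dirent } from "node:fs";
import * as path from "node:path";

// Hypothetical sketch: breadth-first search that lists up to `batchSize`
// directories concurrently instead of awaiting each readdir in turn.
async function walkInBatches(
  root: string,
  matches: (name: string) => boolean,
  batchSize = 8,
): Promise<string[]> {
  const results: string[] = [];
  let level: string[] = [root]; // directories still to be listed

  while (level.length > 0) {
    const nextLevel: string[] = [];
    for (let i = 0; i < level.length; i += batchSize) {
      const batch = level.slice(i, i + batchSize);
      // List every directory in this batch in parallel.
      const listings = await Promise.all(
        batch.map(async (dir) => {
          try {
            return { dir, entries: await fs.readdir(dir, { withFileTypes: true }) };
          } catch {
            // Unreadable directory: skip it rather than failing the whole search.
            return { dir, entries: [] as Dirent[] };
          }
        }),
      );
      for (const { dir, entries } of listings) {
        for (const entry of entries) {
          const fullPath = path.join(dir, entry.name);
          if (matches(entry.name)) results.push(fullPath);
          // Dirent.isDirectory() is false for symlinks, so links are not followed.
          if (entry.isDirectory()) nextLevel.push(fullPath);
        }
      }
    }
    level = nextLevel;
  }
  return results;
}
```

Capping concurrency per batch bounds how many directories are being read at once on very wide trees, while still overlapping I/O on deeply nested ones. Because `Dirent.isDirectory()` describes the entry itself, symbolic links to directories are not followed, which also avoids symlink cycles.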

Server Details

  • Server: filesystem
  • Changes to: tools

Motivation and Context

Both Claude and human users may misinterpret the search_files function as a content search tool (like grep) rather than a file/directory name matcher. The suggested description makes the distinction clear from the beginning and explicitly notes that it doesn't search file contents.

How Has This Been Tested?

Tested with Claude Desktop with the following prompts, before and after the change.

  1. The "Before" column shows which function was selected by the LLM before this PR.
  2. The "After" column shows which function was selected by the LLM with this PR, for the exact same prompt.

| # | Prompt | Before | After |
|---|--------|--------|-------|
| 1 | Find all references to "XYZ" under /Users/my_userid/my_project/ | search_files | search_file_contents |
| 2 | Find files whose names contain "color" under /Users/my_userid/my_project/ | search_files | search_files_by_name |
| 3 | Search files that contain "XYZ" under /Users/my_userid/my_project using filesystem MCP. | search_files | search_file_contents |
| 4 | Search "XYZ" under /Users/my_userid/my_project using filesystem MCP. | search_files | search_file_contents |
| 5 | Search "xyz" in file a/b/c/foo.cpp | search_files | search_file_contents |

For prompt 1, the LLM can easily infer it should be searching for file contents, not matching file names. Previously, the LLM mistakenly invoked the search_files function, which typically would return no matching results.

Prompts 3 and 4 are intentionally ambiguous, to test how the LLM decides whether to use search_files_by_name or search_file_contents based on the function names and descriptions.

Breaking Changes

Will users need to update their MCP client configurations?

No. However, I'm not sure what counts as a breaking change here: is renaming a tool acceptable, or is it considered a breaking change?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Protocol Documentation
  • My changes follow MCP security best practices
  • I have updated the server's README accordingly
  • I have tested this with an LLM client
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have documented all environment variables and configuration options

Additional context

Tests for the search_file_contents function

  • Basic text search returns correct results for common terms
  • Case-sensitive search correctly differentiates between uppercase and lowercase patterns
  • Regex pattern search correctly identifies patterns like function declarations (`async function \w+\(`)
  • Context lines display works correctly with different values (0, 2, 4 lines)
  • Max results parameter properly limits the number of results returned
  • File extension filtering correctly includes only the specified file types (.ts, .js, etc.)
  • File extension filtering handles files without extensions correctly
  • Exclude patterns with wildcards (`**/*.test.ts`) correctly filter out matching files
  • Exclude patterns with simple strings (`*config*`) correctly filter content containing those strings
  • Combined search with regex + case sensitivity + file types works correctly
  • Search handles large files efficiently without excessive memory usage
  • Directory recursion works correctly for nested file structures
  • Error handling for inaccessible files or directories works as expected
  • Race conditions are avoided when searching directories in parallel
  • Files with non-UTF8 encoding are handled gracefully without crashing
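The matching behaviours these tests cover (regex matching, case sensitivity, context lines, result limits) can be illustrated with a small in-memory sketch. It is a hypothetical illustration, not the PR's search_file_contents implementation: `searchLines`, `LineMatch` and the parameter names are assumptions, case sensitivity is expressed through the RegExp's `i` flag, and file I/O, glob include/exclude filtering and encoding handling are left out.

```typescript
// Hypothetical result shape for one matching line.
interface LineMatch {
  line: number;     // 1-based line number
  text: string;     // the matching line itself
  before: string[]; // up to `contextLines` preceding lines
  after: string[];  // up to `contextLines` following lines
}

function searchLines(
  content: string,
  matcher: RegExp,
  contextLines = 0,
  maxResults = 100,
): LineMatch[] {
  // Drop the stateful "g"/"y" flags so RegExp.test() doesn't depend on lastIndex.
  const re = new RegExp(matcher.source, matcher.flags.replace(/[gy]/g, ""));
  const lines = content.split(/\r?\n/);
  const results: LineMatch[] = [];
  // Stop scanning as soon as the result limit is reached.
  for (let i = 0; i < lines.length && results.length < maxResults; i++) {
    if (!re.test(lines[i])) continue;
    results.push({
      line: i + 1,
      text: lines[i],
      before: lines.slice(Math.max(0, i - contextLines), i),
      after: lines.slice(i + 1, i + 1 + contextLines),
    });
  }
  return results;
}
```

For example, `searchLines(source, /async function \w+\(/, 2)` would return each async function declaration with two lines of context on either side, and passing `/xyz/i` instead of `/xyz/` makes the search case-insensitive.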

@sebastien-rosset sebastien-rosset marked this pull request as draft March 17, 2025 21:08
@sebastien-rosset sebastien-rosset marked this pull request as ready for review March 17, 2025 21:37
@sebastien-rosset
Contributor Author

sebastien-rosset commented Mar 17, 2025

Another typical test scenario is refactoring code. I've noticed that Claude typically tries to read a file when the user has provided the file name, as in the example below:

I'll help you rewrite the message so that the configuration parameter can be properly translated. Let me first examine the file to understand the context better.
View result from read_file from filesystem (local)
Result too long, truncated to 100000 characters

If the file is too large, the LLM then tries to search the contents of the file(s). Previously, it invoked the search_files function, which always returned empty results because Claude misunderstood that function's purpose. With this PR, the LLM correctly invokes search_file_contents:

Now let me look for any references to translating the parameter name:
View result from search_file_contents from filesystem (local)

This enables better discovery of a code base, e.g.:

View result from list_allowed_directories from filesystem (local)
Let me look at the xxxx.cpp file to understand the current constructor implementation:
View result from read_file from filesystem (local)
Now I'll look at the xxxx.h file to understand the class definitions better:
View result from read_file from filesystem (local)
Let's look for all header files related to the xxxx plugin:
View result from search_files_by_name from filesystem (local)
Let's try another approach and see what files are in the src directory
View result from list_directory from filesystem (local)
Now let's look for the corresponding header files:
View result from list_directory from filesystem (local)
Great! Now let's look at the zzzz.h file to understand how the xxxx class is defined:
View result from read_file from filesystem (local)
Now let's examine the xxxx.cpp file to see how the constructor is implemented:
View result from read_file from filesystem (local)
Now that I have a good understanding of the codebase, I can modify xxxx ...

@sebastien-rosset
Contributor Author

sebastien-rosset commented Mar 19, 2025

I've noticed sometimes Claude invokes search_files_by_name with the following input:

{
  "path": "/Users/xxxx/myproject/src",
  "pattern": ".h"
}
[{"type": "text", "text": "No matches found"}]

I improved the description of the pattern argument to make it clear this is a substring match, not a glob and not a regex.

@olaservo olaservo added server-filesystem Reference implementation for the Filesystem MCP server - src/filesystem enhancement New feature or request labels Mar 27, 2025
@sebastien-rosset
Contributor Author

sebastien-rosset commented Apr 7, 2025

I've noticed that Claude AI often tries to invoke the search functions with a regex, most likely because the input argument is named "pattern". I've added a useRegex argument to make the matching mode explicit.
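A minimal sketch of how a `useRegex` flag can make the matching mode explicit, assuming the text argument is matched literally by default: regex metacharacters are escaped unless the caller opts in, and an invalid expression produces an error message the model can act on. `buildMatcher` and its parameter order are hypothetical, not the PR's actual code.

```typescript
// Hypothetical helper: turns the tool's text argument into a RegExp,
// treating it literally unless the caller opts in with useRegex.
function buildMatcher(searchText: string, useRegex = false, caseSensitive = false): RegExp {
  const source = useRegex
    ? searchText
    : searchText.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
  try {
    return new RegExp(source, caseSensitive ? "" : "i");
  } catch (err) {
    // Only reachable when useRegex is true: escaped text is always a valid pattern.
    throw new Error(`Invalid regular expression "${searchText}": ${(err as Error).message}`);
  }
}
```

Defaulting to literal matching means that a regex the model sends by accident is searched for as plain text rather than silently changing the meaning of the query.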

@FalacerSelene

Thanks for this; I came here with the same issue that you've solved. I was going to suggest find_files and grep_files as the tool names, to match the command-line tools they behave like, but if by_name and contents are enough to make it clear to Claude what to do, then I'm happy with it.

@sebastien-rosset sebastien-rosset force-pushed the search_files branch 3 times, most recently from af8476e to 3259d90 on April 29, 2025 03:27
@sebastien-rosset
Copy link
Contributor Author

> Thanks for this; I came here with the same issue that you've solved. I was going to suggest find_files and grep_files as the tool names, to match the command-line tools they behave like, but if by_name and contents are enough to make it clear to Claude what to do, then I'm happy with it.

I've used this for over a month and it generally works well. I don't mind changing to "find_files" and "grep_files", though I'm not sure whether it would change the behavior.
Based on my use of Claude over the past several weeks, I have renamed the pattern argument to searchText. I had noticed that Claude sometimes invoked the tool with a regex, probably because the word "pattern" is strongly associated with regular expressions.

@sebastien-rosset sebastien-rosset force-pushed the search_files branch 2 times, most recently from a940af0 to 12c56bd on May 6, 2025 21:28
@rakelkar

I tried your change and my LLM was confused, so I asked it to figure out what was wrong and suggest a change. Here is what it suggested:

The original instructions were a bit open-ended, which led to some confusion about parameter names. Here are a couple of improvements that could make it clearer:

  • Explicitly mention parameter names: The prompt could have stated that search_file_contents uses a parameter named searchText (not pattern), and search_files_by_name uses pattern. This hint would directly alert the user to use the correct keys without having to infer from code or error messages.
  • Provide a usage example: Including a short example of each function’s usage in the prompt would serve as a template. For instance, showing a JSON snippet for each case (similar to the examples above) would remove ambiguity.
  • Clarify the goal: If the goal was to teach the assistant how to use the tool properly, the prompt might explicitly say, "Find out the correct parameters by reading the tool’s source code or documentation, then demonstrate a correct usage." This way, it’s clear that simply guessing (as happened initially) might not be enough, and that examining the definitions is required.

By refining the prompt with these details, the assistant (or developer) would be less likely to misuse the functions and run into errors.

@sebastien-rosset sebastien-rosset force-pushed the search_files branch 2 times, most recently from 9488896 to 9f55bd4 on June 9, 2025 15:26
@sebastien-rosset sebastien-rosset force-pushed the search_files branch 4 times, most recently from 51e5b72 to a0e94e8 on June 18, 2025 21:20
@sebastien-rosset
Contributor Author

@olaservo, is there anything I need to do for this PR to be considered for review?

@Symbolk

Symbolk commented Jul 6, 2025

The official MCP servers can't guarantee correctness even for such simple tools, so it's better to unit test them or consider implementing these basic ones ourselves.

@turabisaas

I think this needs to be a tool in that server.

@davidpfarrell

Not sure what's holding up this PR, but the tool failing to search for files by extension seems like it should be fixed ...

mcp

search_files:
Arguments: {"path":"/tmp/work","pattern":"*.png"}
Result: [{"type":"text","text":"No matches found"}]

check results

find /tmp/work -name '*.png' | wc -l
       4
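One plausible reading of this result, assuming the tool compared names by substring as described earlier in this thread: no file name literally contains the characters `*.png`, so nothing matches, whereas `find -name` treats `*` as a wildcard. The sketch below contrasts the two semantics; `substringMatch` and `globToRegExp` are hypothetical helpers, and the glob conversion deliberately supports only `*` and `?` within a single file name (no `**`, character classes, or braces).

```typescript
// Substring semantics: the pattern must appear verbatim in the name.
function substringMatch(name: string, pattern: string): boolean {
  return name.includes(pattern);
}

// Minimal glob semantics for a single file name, in the spirit of `find -name`:
// "*" matches any run of characters, "?" exactly one, everything else is literal.
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (const ch of glob) {
    if (ch === "*") source += ".*";
    else if (ch === "?") source += ".";
    else source += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  }
  return new RegExp(`^${source}$`); // anchored: the whole name must match
}

console.log(substringMatch("photo.png", "*.png"));    // false
console.log(globToRegExp("*.png").test("photo.png")); // true
```

Either contract can work; the failure above comes from the model assuming one while the server implements the other.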

@sebastien-rosset
Contributor Author

> Not sure what's holding up this PR, but the tool failing to search for files by extension seems like it should be fixed ...

According to CONTRIBUTING.md, improvements to existing servers, such as this filesystem enhancement, are welcome, but I have no visibility into review timelines and don't know whether a repo maintainer will look at this PR.

I'm happy to address any concerns or make adjustments. The fix has been working well in my testing environment for months, and I believe it would benefit the broader MCP community.

@sebastien-rosset sebastien-rosset force-pushed the search_files branch 4 times, most recently from becf720 to fb5ad1d on August 22, 2025 16:15
@sebastien-rosset
Contributor Author

Resolved merge conflicts after #543 was merged.

@domdomegg
Member

Awesome, thanks for the contribution! Sorry we've been very slow getting to this - we're a small team, most of whom are volunteer contributors.

The poor tool description definitely seems like a bug in the filesystem server, and it makes a lot of sense that models are confused by it. I also think the patterns were quite confusing for models previously.

After #623 and #745, I think this should be quite a bit better. Additionally, I've taken a stab at fixing the problem with a more minor description change in #2606 - somewhat inspired by the suggestions in the issue you raised, so thank you.

I think we're trying to keep these servers small and avoid adding net new functionality so I'm hesitant to accept a new tool search_file_contents right now (as you have probably realized from your experiences raising this PR, we're stretched for maintainer time so more things to maintain is scary!). The filesystem MCP (and other reference servers) are meant to be a developer reference for how you can build servers, rather than a primary server for people to use - I think maybe we need to do a better job communicating this. We'd be very happy to accept a PR to add links to different filesystem MCPs into the README that explore things like having a search_file_contents tool :)

However, I would be happy to approve and merge the parts around improving testing and nested folder traversal performance, if you'd be open to reworking this PR to target that / opening a new PR for that and tagging me? Possibly it'd be worth waiting until #2606 is merged, given I think that's a small one which can go in easily soon - and save you having to resolve yet another set of merge conflicts 😅

Thanks again for putting up with us here, and for thoughtfully raising the issue and PR! And sorry filesystem MCP has been in such a bad state this whole time.

@sebastien-rosset sebastien-rosset force-pushed the search_files branch 2 times, most recently from 4d210d2 to fc1841b on August 25, 2025 18:51
@sebastien-rosset
Contributor Author

sebastien-rosset commented Aug 25, 2025

> Awesome, thanks for the contribution! Sorry we've been very slow getting to this - we're a small team, most of whom are volunteer contributors.

> The poor tool description definitely seems like a bug in the filesystem server, and it makes a lot of sense that models are confused by it. I also think the patterns were quite confusing for models previously.

> After #623 and #745, I think this should be quite a bit better. Additionally, I've taken a stab at fixing the problem with a more minor description change in #2606 - somewhat inspired by the suggestions in the issue you raised, so thank you.

> I think we're trying to keep these servers small and avoid adding net new functionality so I'm hesitant to accept a new tool search_file_contents right now (as you have probably realized from your experiences raising this PR, we're stretched for maintainer time so more things to maintain is scary!).

I've removed the search_file_contents action in this PR, and I've retained that new action in a separate branch.

> The filesystem MCP (and other reference servers) are meant to be a developer reference for how you can build servers, rather than a primary server for people to use -

Hmm, the filesystem MCP server is used in production by many applications, including Claude Desktop, VS Code, Cursor, and many others. The "filesystem" server is not "just a reference implementation".

> I think maybe we need to do a better job communicating this. We'd be very happy to accept a PR to add links to different filesystem MCPs into the README that explore things like having a search_file_contents tool :)

Interesting, I would have thought it makes sense to extend the existing MCP tool rather than create a new one. This is what I had in mind:

  1. Applications that already integrate the filesystem MCP server for file operations would immediately benefit from the search content capability, without having to install yet another MCP tool.
  2. File content search is fundamentally a filesystem operation - the LLM is querying the filesystem about what's inside files, just as it can query about file names and metadata.
  3. Content search often needs to be combined with file metadata (size, modification time, permissions) and path operations. Having everything in one server eliminates round-trips and enables more efficient combined operations.
  4. Mental model: developers think of content search as a filesystem operation (like grep or IDE search). Splitting it into a separate tool creates cognitive overhead - users would need to remember which server handles which file-related operations.

> However, I would be happy to approve and merge the parts around improving testing and nested folder traversal performance, if you'd be open to reworking this PR to target that / opening a new PR for that and tagging me? Possibly it'd be worth waiting until #2606 is merged, given I think that's a small one which can go in easily soon - and save you having to resolve yet another set of merge conflicts 😅

I'm working on it, starting by resolving merge conflicts and removing the search_file_contents action.

> Thanks again for putting up with us here, and for thoughtfully raising the issue and PR! And sorry filesystem MCP has been in such a bad state this whole time.

}

// Legacy function for backward compatibility
export async function searchFilesWithValidation(
Member

Is this used anywhere? I think we can delete it.

Contributor Author

You are right. This is a byproduct of repeatedly resolving merge conflicts.

I'm removing searchFilesWithValidation entirely and using searchFilesByName, which is less ambiguous: searchFilesWithValidation could mean searching files/directories by name OR searching file contents.

This is in 56a4962 for now. I will squash everything when all feedback has been addressed.

const caseSensitive = /[A-Z]/.test(pattern); // Check if pattern has uppercase characters

// Determine if the pattern is a glob pattern or a simple substring
const isGlobPattern = pattern.includes('*') || pattern.includes('?') || pattern.includes('[') || pattern.includes('{');
Member

Potentially breaking change (in the same way as discussed here)

Member

I think if we're going to make breaking changes, we should probably just simplify to only doing glob matching
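To make the ambiguity raised in this review concrete, the two heuristic lines from the diff can be wrapped in a hypothetical helper (`classifyPattern` is an illustrative name, not code from the PR). It shows how the interpretation of the same argument shifts with its characters: an uppercase letter switches on case sensitivity, and a literal bracket in a file name silently turns a substring search into a glob.

```typescript
type NameMatchMode = "glob" | "substring";

// Wraps the two heuristics from the diff: smart-case and a character-based glob check.
function classifyPattern(pattern: string): { mode: NameMatchMode; caseSensitive: boolean } {
  const caseSensitive = /[A-Z]/.test(pattern); // any uppercase letter => case-sensitive
  const isGlobPattern =
    pattern.includes("*") || pattern.includes("?") || pattern.includes("[") || pattern.includes("{");
  return { mode: isGlobPattern ? "glob" : "substring", caseSensitive };
}

console.log(classifyPattern("color"));         // { mode: 'substring', caseSensitive: false }
console.log(classifyPattern("Color"));         // { mode: 'substring', caseSensitive: true }
console.log(classifyPattern("*.png"));         // { mode: 'glob', caseSensitive: false }
console.log(classifyPattern("report[final]")); // { mode: 'glob', caseSensitive: false }
```

A glob-only contract removes this guessing: the same string always means the same thing, and the rule fits in one sentence of the tool description.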

- Add descriptions to help the LLM understand the fields.
- Support single file search.
- Support glob pattern for search by file name.
- Rename 'pattern' argument to 'searchText'. Sometimes the name 'pattern' caused
  Claude to pass a regex even though it was processed as a simple substring match.
- Add unit tests.
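The "Support single file search" item can be sketched as follows, under the assumption that the search root may be either a directory or a regular file: the root is `stat`ed first, and a file root is tested by its own base name instead of being passed to `readdir`, which fails with `ENOTDIR` on a regular file. `searchByName` and its `matches` callback are hypothetical; this synchronous, recursive version favours brevity over the batched traversal described in the PR.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical: accept either a directory or a single file as the search root.
function searchByName(root: string, matches: (name: string) => boolean): string[] {
  const stat = fs.statSync(root);
  if (stat.isFile()) {
    // Single-file search: test the file's own name instead of calling readdir,
    // which would throw ENOTDIR on a regular file.
    return matches(path.basename(root)) ? [root] : [];
  }
  const results: string[] = [];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const full = path.join(root, entry.name);
    if (matches(entry.name)) results.push(full);
    // Recurse into real directories only; symlinks are not followed.
    if (entry.isDirectory()) results.push(...searchByName(full, matches));
  }
  return results;
}
```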
@sebastien-rosset sebastien-rosset changed the title Rename "search_files" to "search_files_by_name". Add "search_file_contents" Rename "search_files" to "search_files_by_name" Aug 26, 2025

Labels

enhancement New feature or request server-filesystem Reference implementation for the Filesystem MCP server - src/filesystem


8 participants