Mcp auto expand datasets #25

BurhanAbdullah · 2026-01-04T11:24:46Z

Summary

Auto-expand dataset_ids in the MCP retrieval tool when the field is
omitted or an empty list is provided.

Solution Description

When dataset_ids is omitted or supplied as an empty list, the MCP
server now automatically retrieves all available datasets and uses
their IDs for the retrieval request.

This aligns the runtime behaviour with the tool description, improves
developer experience, and removes the need for clients to manually
fetch dataset IDs before querying.
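
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `resolve_dataset_ids` and `fetch_all_dataset_ids` are hypothetical names, and the real server obtains the dataset list through its own connector.

```python
from typing import Any, Callable, Dict, List


def resolve_dataset_ids(
    arguments: Dict[str, Any],
    fetch_all_dataset_ids: Callable[[], List[str]],
) -> List[str]:
    """Return the dataset IDs to search, expanding to all datasets if none were given."""
    # `or []` treats a missing key, an explicit None, and an empty list the same way.
    requested = arguments.get("dataset_ids") or []
    if requested:
        return list(requested)
    # Nothing requested: fall back to every dataset the server can list.
    return list(fetch_all_dataset_ids())


def fake_fetch_all() -> List[str]:
    # Hypothetical stand-in for the real dataset listing call.
    return ["ds-1", "ds-2"]
```

With this shape, a client that sends only a question would still search every dataset, while a client that names specific IDs keeps full control over the scope.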

CLAassistant · 2026-01-04T11:24:52Z

All committers have signed the CLA.

Copilot

Pull request overview

This PR enhances the MCP retrieval tool by auto-expanding dataset_ids when omitted or empty, adds a new dataset summary tool, and removes duplicate code while improving formatting consistency.

Key Changes:

Automatic expansion of dataset_ids to include all available datasets when the field is not provided or empty
Introduction of ragflow_dataset_summary tool for quick dataset metadata retrieval
Removal of duplicate dataset auto-expansion logic from the retrieval method

Copilot · 2026-01-05T02:36:06Z

mcp/server/server.py

    if name == "ragflow_retrieval":
        document_ids = arguments.get("document_ids", [])
-        dataset_ids = arguments.get("dataset_ids", [])
+        dataset_ids = arguments.get("dataset_ids") or []


The expression arguments.get("dataset_ids") or [] will treat an empty list as falsy and replace it with a new empty list. This is redundant but harmless. However, for consistency with document_ids on line 445 which uses arguments.get("document_ids", []), consider using the same pattern: arguments.get("dataset_ids", []).

Suggested change

-        dataset_ids = arguments.get("dataset_ids") or []
+        dataset_ids = arguments.get("dataset_ids", [])
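
For reference, the two patterns behave the same for a missing key and for an empty list, but they may differ when the key is present with an explicit None. The snippet below is a small standalone comparison; whether a client can actually send a null value here depends on how the arguments are validated upstream, which is an assumption outside this diff. The helper names are hypothetical.

```python
def with_default(arguments):
    # The default is used only when the key is absent.
    return arguments.get("dataset_ids", [])


def with_or(arguments):
    # Any falsy value (missing, None, empty list) becomes an empty list.
    return arguments.get("dataset_ids") or []


cases = {
    "missing": {},
    "empty": {"dataset_ids": []},
    "null": {"dataset_ids": None},
}

# Map each case name to (result of default pattern, result of `or` pattern).
results = {name: (with_default(args), with_or(args)) for name, args in cases.items()}
```

Only the "null" case differs: the default pattern returns None, while the `or` pattern returns an empty list.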

Copilot · 2026-01-05T02:36:06Z

mcp/server/server.py

-        dataset_ids = arguments.get("dataset_ids", [])
+        dataset_ids = arguments.get("dataset_ids") or []
+
+


There are two consecutive blank lines here. Consider removing one to maintain consistent spacing throughout the file, as single blank lines are used elsewhere (e.g., between the comment and the if statement on lines 459-460).

Copilot · 2026-01-05T02:36:06Z

mcp/server/server.py

+                    try:
+                        dataset_ids.append(json.loads(line)["id"])
+                    except Exception:
+                        continue


The broad exception handler except Exception: silently ignores all parsing errors, including potentially serious issues like network errors or unexpected JSON structure. Consider at least logging these exceptions for debugging purposes, or being more specific about which exceptions to catch (e.g., json.JSONDecodeError, KeyError).
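
One possible shape for the narrower handler is sketched below. It assumes the dataset list arrives as one JSON object per line, as the diff suggests; `parse_dataset_ids` is a hypothetical helper name, and the logger setup is illustrative only.

```python
import json
import logging
from typing import Iterable, List

logger = logging.getLogger(__name__)


def parse_dataset_ids(lines: Iterable[str]) -> List[str]:
    """Extract the "id" field from each JSON line, logging lines that cannot be used."""
    dataset_ids: List[str] = []
    for line in lines:
        if not line.strip():
            continue  # Blank lines are expected and not worth logging.
        try:
            dataset_ids.append(json.loads(line)["id"])
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            # JSONDecodeError: malformed JSON; KeyError: object without "id";
            # TypeError: valid JSON that is not an object (e.g. a list).
            logger.warning("Skipping unparsable dataset line %r: %s", line, exc)
    return dataset_ids
```

Anything outside these three exception types would propagate to the caller instead of being swallowed, which makes unexpected failures visible.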

Copilot · 2026-01-05T02:36:07Z

mcp/server/server.py

+                try:
+                    datasets.append(json.loads(line))
+                except Exception:
+                    continue


The broad exception handler except Exception: silently ignores all parsing errors. Similar to the issue in the ragflow_retrieval handler, consider logging these exceptions or being more specific about which exceptions to catch (e.g., json.JSONDecodeError).

Copilot · 2026-01-05T02:36:07Z

mcp/server/server.py

                    "dataset_ids": {
                        "type": "array",
                        "items": {"type": "string"},
-                        "description": "Optional array of dataset IDs to search. If not provided or empty, all datasets will be searched."
                    },
                    "document_ids": {
                        "type": "array",
                        "items": {"type": "string"},
-                        "description": "Optional array of document IDs to search within."
                    },
                    "question": {
                        "type": "string",
-                        "description": "The question or query to search for."
                    },
                    "page": {
                        "type": "integer",
-                        "description": "Page number for pagination",
                        "default": 1,
                        "minimum": 1,
                    },
                    "page_size": {
                        "type": "integer",
-                        "description": "Number of results to return per page (default: 10, max recommended: 50 to avoid token limits)",
                        "default": 10,
                        "minimum": 1,
                        "maximum": 100,
                    },
                    "similarity_threshold": {
                        "type": "number",
-                        "description": "Minimum similarity threshold for results",
                        "default": 0.2,
                        "minimum": 0.0,
                        "maximum": 1.0,
                    },
                    "vector_similarity_weight": {
                        "type": "number",
-                        "description": "Weight for vector similarity vs term similarity",
                        "default": 0.3,
                        "minimum": 0.0,
                        "maximum": 1.0,
                    },
                    "keyword": {
                        "type": "boolean",
-                        "description": "Enable keyword-based search",
                        "default": False,
                    },
                    "top_k": {
                        "type": "integer",
-                        "description": "Maximum results to consider before ranking",
                        "default": 1024,
                        "minimum": 1,
                        "maximum": 1024,
                    },
                    "rerank_id": {
                        "type": "string",
-                        "description": "Optional reranking model identifier",
                    },
                    "force_refresh": {
                        "type": "boolean",
-                        "description": "Set to true only if fresh dataset and document metadata is explicitly required. Otherwise, cached metadata is used (default: false).",
                        "default": False,
                    },


The inputSchema properties are missing their "description" fields. While the properties have appropriate defaults and constraints, adding descriptions would improve the developer experience by providing inline documentation about what each parameter does. This is especially important for parameters like "similarity_threshold", "vector_similarity_weight", and "top_k" that require domain knowledge to use effectively.
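
A lightweight check along these lines could catch missing descriptions before they ship. This is a sketch under the assumption that the schema is a plain dict shaped like the one in the diff; `properties_missing_description` is a hypothetical helper, not part of the server.

```python
from typing import Any, Dict, List


def properties_missing_description(input_schema: Dict[str, Any]) -> List[str]:
    """Return the names of schema properties with no non-empty "description"."""
    properties = input_schema.get("properties", {})
    return [
        name
        for name, spec in properties.items()
        if not str(spec.get("description", "")).strip()
    ]


# Trimmed example schema: one documented property, one undocumented.
example_schema = {
    "type": "object",
    "properties": {
        "question": {
            "type": "string",
            "description": "The question or query to search for.",
        },
        "top_k": {"type": "integer", "default": 1024, "minimum": 1, "maximum": 1024},
    },
}

missing = properties_missing_description(example_schema)
```

Run against the schema in this diff, such a check would flag every property whose description line was removed.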

Copilot · 2026-01-05T02:36:07Z

mcp/server/server.py

            dataset_ids = []
-
-            # Parse the dataset list to extract IDs
+


Empty line 462 immediately followed by another empty line on 463 creates inconsistent spacing. Consider removing one of these blank lines to maintain uniform code style.

BurhanAbdullah added 2 commits January 4, 2026 15:55

git commit -m "mcp: auto-expand datasets when dataset_ids not provided"

71fe00b

git commit -m "mcp: add dataset summary tool and safely auto-expand d…

da4486b

…atasets"

whhe requested a review from Copilot January 5, 2026 02:33

Copilot started reviewing on behalf of whhe January 5, 2026 02:33

Copilot AI reviewed Jan 5, 2026
