
Agent skills for Databricks Assistant #1025

Open

grusin-db wants to merge 10 commits into main from f/agent_skills

Conversation

grusin-db (Collaborator) commented Feb 6, 2026

PR: Add install_skills() -- deploy DQX agent skills to Databricks Assistant

Branch: f/agent_skills -> main | 3 commits | +2109 lines


Key Changes

1. New install_skills() command (src/databricks/labs/dqx/skills.py)

One-liner to deploy DQX agent skills into a Databricks workspace so that Databricks Assistant can use them.

from databricks.labs import dqx

dqx.install_skills()                        # → /Users/{me}/.assistant/skills/dqx/
dqx.install_skills("/Shared/skills/dqx")    # custom folder

Reads bundled SKILL.md files and example scripts from package resources via importlib.resources, then uploads them with WorkspaceClient.workspace.upload().
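As a rough sketch (not the PR's actual code; the default path, resource package name, and exact SDK calls below are assumptions based on the description above), the helper could look like this:

```python
# Hypothetical sketch of install_skills(); the real implementation in
# src/databricks/labs/dqx/skills.py may differ in structure and naming.
import io
from importlib.resources import as_file, files
from pathlib import Path

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ImportFormat


def install_skills(target_folder: str | None = None) -> None:
    ws = WorkspaceClient()
    if target_folder is None:
        me = ws.current_user.me().user_name
        target_folder = f"/Users/{me}/.assistant/skills/dqx"

    skills_root = files("databricks.labs.dqx.resources") / "skills" / "dqx"
    with as_file(skills_root) as root:  # materialize packaged resources on disk if needed
        for src in Path(root).rglob("*"):
            if src.is_dir():
                continue
            rel = src.relative_to(root).as_posix()
            target = f"{target_folder}/{rel}"
            ws.workspace.mkdirs(str(Path(target).parent))
            ws.workspace.upload(target, io.BytesIO(src.read_bytes()),
                                format=ImportFormat.AUTO, overwrite=True)
```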

2. Bundled skill content (src/databricks/labs/dqx/resources/skills/)

Six SKILL.md files covering the full DQX surface, structured for the assistant's skill discovery:

| Skill | Scope |
| --- | --- |
| dqx/SKILL.md | Root: install, quick-start, routing to sub-skills |
| checks/SKILL.md | Check authoring, apply, split, reference DataFrames |
| checks/row-level/SKILL.md | 30+ row-level functions with full parameter reference (required vs optional) |
| checks/dataset-level/SKILL.md | 10+ dataset-level functions with full parameter reference |
| checks/custom/SKILL.md | SQL expressions, Python row/dataset checks |
| profiling/SKILL.md | DQProfiler + manual SQL profiling |

Each SKILL.md includes a parameter table at the top that explicitly marks required vs optional arguments to prevent the assistant from generating invalid check definitions (e.g. calling is_in_range without both min_limit and max_limit).
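For instance, a valid metadata-style definition of that check carries both limits. The argument names below ("column", the criticality value) follow the typical DQX check shape but should be treated as illustrative; min_limit and max_limit come from the parameter tables described above:

```python
# Illustrative check definition (dict/metadata form) with both required limits set.
checks = [
    {
        "criticality": "error",
        "check": {
            "function": "is_in_range",
            "arguments": {"column": "age", "min_limit": 0, "max_limit": 120},
        },
    }
]
```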

3. Runnable examples (checks/examples/)

21 self-contained Python scripts (one per check category) bundled as package resources and deployed alongside skills:

  • 01-09: row-level checks (null, list, comparison, range, regex, datetime, network, SQL, complex types)
  • 10-16: dataset-level checks (unique, aggregation, FK, compare, freshness, schema, SQL query)
  • 17-19: custom checks (SQL window, Python row, Python dataset)
  • 20-21: profiling

All SKILL.md files cross-reference relevant examples.
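The PR description doesn't show the scripts themselves, but a self-contained example of this kind presumably boils down to something like the following. DQEngine, apply_checks_by_metadata, and the split variant are standard DQX entry points; the table and column names are placeholders:

```python
# Hypothetical shape of one bundled example script (row-level checks).
from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.engine import DQEngine

checks = [
    {"criticality": "error",
     "check": {"function": "is_not_null", "arguments": {"column": "customer_id"}}},
]

df = spark.table("catalog.schema.customers")  # placeholder table; assumes a running Spark session
dq_engine = DQEngine(WorkspaceClient())

# Annotate every row with check result columns...
annotated_df = dq_engine.apply_checks_by_metadata(df, checks)
# ...or split into passing and failing rows.
good_df, bad_df = dq_engine.apply_checks_by_metadata_and_split(df, checks)
```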

4. Package wiring (__init__.py, resources/__init__.py)

install_skills is exported from databricks.labs.dqx.__init__, so both import styles work:

from databricks.labs.dqx import install_skills
# or
from databricks.labs import dqx; dqx.install_skills()

No pyproject.toml build changes needed -- hatchling already includes non-Python files under src/.
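The wiring itself is presumably just a re-export plus the __all__ entry that the review below comments on; a minimal sketch:

```python
# Sketch of the package-root wiring in src/databricks/labs/dqx/__init__.py
# (actual file contents may differ).
from databricks.labs.dqx.skills import install_skills

__all__ = ["install_skills"]
```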


⚠️ Breaking Changes

None. This is purely additive.

Examples

Databricks AI Assistant with skills loaded

[image]

...

[image]

Unskilled Databricks AI Assistant

[image]

github-actions bot commented Feb 6, 2026

All commits in PR should be signed ('git commit -S ...'). See https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits

github-actions bot commented Feb 6, 2026

✅ 547/547 passed, 61 skipped, 4h59m2s total

Running from acceptance #3892

codecov bot commented Feb 6, 2026

Codecov Report

❌ Patch coverage is 0% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.81%. Comparing base (e5d2701) to head (b1f6ca9).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/databricks/labs/dqx/skills.py | 0.00% | 26 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1025      +/-   ##
==========================================
- Coverage   91.15%   88.81%   -2.34%     
==========================================
  Files          64       65       +1     
  Lines        6672     6698      +26     
==========================================
- Hits         6082     5949     -133     
- Misses        590      749     +159     

☔ View full report in Codecov by Sentry.

Copilot AI (Contributor) left a comment

Pull request overview

Adds a new install_skills() helper to deploy bundled DQX “skills” (SKILL.md + runnable examples) into a Databricks workspace folder for Databricks Assistant discovery.

Changes:

  • Introduces src/databricks/labs/dqx/skills.py with install_skills() that uploads package-bundled skill content to the workspace.
  • Bundles a skills tree under src/databricks/labs/dqx/resources/skills/ including SKILL.md docs and runnable example scripts.
  • Updates exports/docs/perf artifacts to reflect new skills/examples and benchmark additions.

Reviewed changes

Copilot reviewed 32 out of 33 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/perf/.benchmarks/baseline.json | Updates stored benchmark baselines to include new benchmark cases. |
| src/databricks/labs/dqx/skills.py | Adds install_skills() implementation that discovers packaged resources and uploads them to the workspace. |
| src/databricks/labs/dqx/resources/skills/dqx/SKILL.md | Root skill entrypoint for DQX assistant discovery and routing to sub-skills. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/SKILL.md | Documents authoring/applying checks and references examples. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/row-level/SKILL.md | Adds row-level checks reference + examples index. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/dataset-level/SKILL.md | Adds dataset-level checks reference + examples index. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/custom/SKILL.md | Adds custom checks guidance + examples index. |
| src/databricks/labs/dqx/resources/skills/dqx/profiling/SKILL.md | Adds profiling skill content and runnable example references. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/examples/*.py | Adds runnable example scripts to be deployed alongside skills. |
| src/databricks/labs/dqx/__init__.py | Exports install_skills from the package root. |
| docs/dqx/docs/reference/benchmarks.mdx | Updates benchmark documentation tables with new entries. |
| demo.ipynb | Adds an executable demo notebook showing dqx.install_skills() usage. |
Comments suppressed due to low confidence (6)

src/databricks/labs/dqx/skills.py:1

  • Converting importlib.resources.files(...) to a filesystem Path via Path(str(...)) will break when the package is installed as a zip/pex/egg (resources aren't guaranteed to exist on disk), and rglob/read_bytes won't work reliably. Keep the object as an importlib.resources.abc.Traversable and either (a) recursively walk it via iterdir() and use Traversable.read_bytes(), or (b) use importlib.resources.as_file() around the directory Traversable to obtain a temporary real path before doing rglob(). (See the sketch after this list.)
  • ws.workspace.mkdirs(target_dir) is invoked for every file, which can add a lot of redundant API calls for larger skill trees. Track which directories have already been created (e.g., a set[str]) and only call mkdirs the first time each directory is encountered.
  • install_skills() adds new behavior that is sensitive to environment (current user lookup, resource discovery, workspace upload). Please add unit tests that mock WorkspaceClient to verify: (1) default folder resolution uses the current user, (2) the correct target paths are computed for nested resources, and (3) uploads are invoked with overwrite enabled.
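A hedged sketch of what the first two suggestions could look like together, i.e. walking the packaged Traversable directly and calling mkdirs once per directory; the names and structure are illustrative, not the PR's code:

```python
# Illustrative rework of the upload loop per the suggestions above; the PR's
# actual skills.py differs (it converts the resource root to a filesystem Path).
import io
from importlib.resources import files
from importlib.resources.abc import Traversable

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ImportFormat


def _upload_tree(ws: WorkspaceClient, node: Traversable, target_dir: str, created: set[str]) -> None:
    if target_dir not in created:
        ws.workspace.mkdirs(target_dir)  # one mkdirs call per directory
        created.add(target_dir)
    for child in node.iterdir():
        if child.is_dir():
            _upload_tree(ws, child, f"{target_dir}/{child.name}", created)
        else:
            ws.workspace.upload(
                f"{target_dir}/{child.name}",
                io.BytesIO(child.read_bytes()),  # Traversable reads work for zip installs too
                format=ImportFormat.AUTO,
                overwrite=True,
            )


# Usage (assumed resource location and target folder):
# root = files("databricks.labs.dqx.resources") / "skills" / "dqx"
# _upload_tree(WorkspaceClient(), root, "/Users/me@example.com/.assistant/skills/dqx", set())
```

And a possible unit test along the lines of the third suggestion, assuming install_skills() constructs its WorkspaceClient inside databricks.labs.dqx.skills so it can be patched:

```python
# Hypothetical unit test; adjust the patch target to the actual skills.py layout.
from unittest.mock import MagicMock, patch

from databricks.labs.dqx.skills import install_skills


def test_install_skills_defaults_to_current_user_and_overwrites():
    ws = MagicMock()
    ws.current_user.me.return_value.user_name = "user@example.com"
    with patch("databricks.labs.dqx.skills.WorkspaceClient", return_value=ws):
        install_skills()
    uploads = ws.workspace.upload.call_args_list
    assert uploads, "expected at least one uploaded skill file"
    # (1) default folder resolution uses the current user
    assert all(c.args[0].startswith("/Users/user@example.com/.assistant/skills/dqx") for c in uploads)
    # (3) uploads overwrite existing content
    assert all(c.kwargs.get("overwrite") for c in uploads)
```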


Comment on lines 42 to 43

__all__ = ["install_skills"]
Copilot AI commented Feb 6, 2026

Defining __all__ as ["install_skills"] changes from databricks.labs.dqx import * behavior to only export install_skills, which can be an unintended breaking change for users relying on star-imports. Consider removing __all__ entirely, or appending to an existing/exported list (e.g., include previously intended public names) rather than restricting exports to only this symbol.

Suggested change
-__all__ = ["install_skills"]

demo.ipynb Outdated
Comment on lines 18 to 30
"execution_count": 2,
"id": "c234d4f6",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/grzegorz.rusin/repos/dqx/.venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
}
],
Copilot AI commented Feb 6, 2026

The committed notebook output includes a local filesystem path and a user email address, which are sensitive and typically should not be checked in. Clear notebook outputs before committing (and consider removing machine-specific setup like sys.path.insert(...) or replacing it with a more portable approach) so the repo doesn’t leak developer environment details.

Suggested change
-"execution_count": 2,
-"id": "c234d4f6",
-"metadata": {},
-"outputs": [
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"/Users/grzegorz.rusin/repos/dqx/.venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
-" from .autonotebook import tqdm as notebook_tqdm\n"
-]
-}
-],
+"execution_count": null,
+"id": "c234d4f6",
+"metadata": {},
+"outputs": [],

demo.ipynb Outdated
"output_type": "stream",
"text": [
"\u001b[90m12:38:26\u001b[0m \u001b[1m\u001b[32m INFO\u001b[0m \u001b[1m[databricks.sdk] Using Databricks Metadata Service authentication\u001b[0m\n",
"\u001b[90m12:38:27\u001b[0m \u001b[1m\u001b[32m INFO\u001b[0m \u001b[1m[d.l.dqx.skills] Deploying skills to: /Users/grzegorz.rusin@databricks.com/.assistant/skills/dqx\u001b[0m\n",
Copilot AI commented Feb 6, 2026

The committed notebook output includes a local filesystem path and a user email address, which are sensitive and typically should not be checked in. Clear notebook outputs before committing (and consider removing machine-specific setup like sys.path.insert(...) or replacing it with a more portable approach) so the repo doesn’t leak developer environment details.

mwojtyczka (Contributor) left a comment

This is really cool!

Feedback:

  • Please sign the commits.
  • We need to reduce the maintenance effort. All the examples should already be available in the docs, so we need a better way to manage this and avoid duplication, e.g. store the examples in one place and source them for both docs and skills from there, or document the examples in each check function. The examples of how to use DQX are also already in the docs. We need to centralize this.

mwojtyczka changed the title from "F/agent skills" to "Agent skills for Databricks Assistant" on Feb 8, 2026