Conversation
All commits in PR should be signed (`git commit -S ...`). See https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits
✅ 547/547 passed, 61 skipped, 4h59m2s total. Running from acceptance #3892
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1025      +/-   ##
==========================================
- Coverage   91.15%   88.81%    -2.34%
==========================================
  Files          64       65       +1
  Lines        6672     6698      +26
==========================================
- Hits         6082     5949     -133
- Misses        590      749     +159
==========================================
```

☔ View full report in Codecov by Sentry.
Pull request overview
Adds a new `install_skills()` helper to deploy bundled DQX "skills" (SKILL.md + runnable examples) into a Databricks workspace folder for Databricks Assistant discovery.
Changes:
- Introduces `src/databricks/labs/dqx/skills.py` with `install_skills()`, which uploads package-bundled skill content to the workspace.
- Bundles a skills tree under `src/databricks/labs/dqx/resources/skills/`, including SKILL.md docs and runnable example scripts.
- Updates exports/docs/perf artifacts to reflect the new skills/examples and benchmark additions.
Reviewed changes
Copilot reviewed 32 out of 33 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/perf/.benchmarks/baseline.json | Updates stored benchmark baselines to include new benchmark cases. |
| src/databricks/labs/dqx/skills.py | Adds install_skills() implementation that discovers packaged resources and uploads them to Workspace. |
| src/databricks/labs/dqx/resources/skills/dqx/SKILL.md | Root skill entrypoint for DQX assistant discovery and routing to sub-skills. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/SKILL.md | Documents authoring/applying checks and references examples. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/row-level/SKILL.md | Adds row-level checks reference + examples index. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/dataset-level/SKILL.md | Adds dataset-level checks reference + examples index. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/custom/SKILL.md | Adds custom checks guidance + examples index. |
| src/databricks/labs/dqx/resources/skills/dqx/profiling/SKILL.md | Adds profiling skill content and runnable example references. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/examples/*.py | Adds runnable example scripts to be deployed alongside skills. |
| src/databricks/labs/dqx/__init__.py | Exports install_skills from the package root. |
| docs/dqx/docs/reference/benchmarks.mdx | Updates benchmark documentation tables with new entries. |
| demo.ipynb | Adds an executable demo notebook showing dqx.install_skills() usage. |
Comments suppressed due to low confidence (6)
src/databricks/labs/dqx/skills.py:1
- Converting `importlib.resources.files(...)` to a filesystem `Path` via `Path(str(...))` will break when the package is installed as a zip/pex/egg (resources aren't guaranteed to exist on disk), and `rglob`/`read_bytes` won't work reliably. Keep the object as an `importlib.resources.abc.Traversable` and either (a) recursively walk it via `iterdir()` and use `Traversable.read_bytes()`, or (b) use `importlib.resources.as_file()` around the directory Traversable to obtain a temporary real path before doing `rglob()`.
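A minimal sketch of option (a): walking the `Traversable` directly so the code works whether or not the package is unpacked on disk. The `skills_root`, `ws`, and `target` names in the usage comment are assumptions for illustration, not the PR's actual code.

```python
from importlib.resources import files

def walk_resources(root, prefix=""):
    """Recursively yield (relative_path, content_bytes) pairs from a
    Traversable, without assuming the package lives on a real filesystem."""
    for entry in root.iterdir():
        rel = f"{prefix}{entry.name}"
        if entry.is_dir():
            yield from walk_resources(entry, prefix=f"{rel}/")
        else:
            yield rel, entry.read_bytes()

# Hypothetical usage (names assumed, not from the PR):
# skills_root = files("databricks.labs.dqx.resources") / "skills"
# for rel, data in walk_resources(skills_root):
#     ws.workspace.upload(f"{target}/{rel}", data, overwrite=True)
```

Because `iterdir()` and `read_bytes()` are part of the `Traversable` API, this keeps working from zip/pex/egg installs where `rglob()` on a stringified path would fail.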
src/databricks/labs/dqx/skills.py:1
- `ws.workspace.mkdirs(target_dir)` is invoked for every file, which can add a lot of redundant API calls for larger skill trees. Track which directories have already been created (e.g., a `set[str]`) and only call `mkdirs` the first time each directory is encountered.
src/databricks/labs/dqx/skills.py:1
- `install_skills()` adds new behavior that is sensitive to environment (current user lookup, resource discovery, workspace upload). Please add unit tests that mock `WorkspaceClient` to verify: (1) default folder resolution uses the current user, (2) the correct target paths are computed for nested resources, and (3) uploads are invoked with overwrite enabled.
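A sketch of the mocking pattern such a test could use. The stand-in below only mirrors the behavior the review describes; its signature and default path are assumptions, not the PR's actual code.

```python
from unittest.mock import MagicMock

def install_skills_standin(ws, folder=None):
    """Stand-in with the shape the review describes (hypothetical)."""
    if folder is None:
        user = ws.current_user.me().user_name
        folder = f"/Users/{user}/.assistant/skills/dqx"
    ws.workspace.mkdirs(folder)
    ws.workspace.upload(f"{folder}/SKILL.md", b"# DQX", overwrite=True)

def test_default_folder_and_overwrite():
    # Mock the WorkspaceClient so no real workspace is needed.
    ws = MagicMock()
    ws.current_user.me.return_value.user_name = "user@example.com"
    install_skills_standin(ws)
    path = ws.workspace.upload.call_args.args[0]
    # (1) default folder derived from the current user
    assert path == "/Users/user@example.com/.assistant/skills/dqx/SKILL.md"
    # (3) uploads always pass overwrite=True
    assert ws.workspace.upload.call_args.kwargs["overwrite"] is True
```

The same pattern extends to point (2): feed the stand-in nested resources and assert on every `upload` call's path.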
src/databricks/labs/dqx/__init__.py
Outdated
__all__ = ["install_skills"]
Defining `__all__` as `["install_skills"]` changes `from databricks.labs.dqx import *` behavior to export only `install_skills`, which can be an unintended breaking change for users relying on star-imports. Consider removing `__all__` entirely, or appending to an existing export list (i.e., include the previously intended public names) rather than restricting exports to only this symbol.
demo.ipynb
Outdated
| "execution_count": 2, | ||
| "id": "c234d4f6", | ||
| "metadata": {}, | ||
| "outputs": [ | ||
| { | ||
| "name": "stderr", | ||
| "output_type": "stream", | ||
| "text": [ | ||
| "/Users/grzegorz.rusin/repos/dqx/.venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", | ||
| " from .autonotebook import tqdm as notebook_tqdm\n" | ||
| ] | ||
| } | ||
| ], |
The committed notebook output includes a local filesystem path and a user email address, which are sensitive and typically should not be checked in. Clear notebook outputs before committing (and consider removing machine-specific setup like `sys.path.insert(...)`, or replacing it with a more portable approach) so the repo doesn't leak developer environment details.
| "execution_count": 2, | |
| "id": "c234d4f6", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stderr", | |
| "output_type": "stream", | |
| "text": [ | |
| "/Users/grzegorz.rusin/repos/dqx/.venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", | |
| " from .autonotebook import tqdm as notebook_tqdm\n" | |
| ] | |
| } | |
| ], | |
| "execution_count": null, | |
| "id": "c234d4f6", | |
| "metadata": {}, | |
| "outputs": [], |
demo.ipynb
Outdated
| "output_type": "stream", | ||
| "text": [ | ||
| "\u001b[90m12:38:26\u001b[0m \u001b[1m\u001b[32m INFO\u001b[0m \u001b[1m[databricks.sdk] Using Databricks Metadata Service authentication\u001b[0m\n", | ||
| "\u001b[90m12:38:27\u001b[0m \u001b[1m\u001b[32m INFO\u001b[0m \u001b[1m[d.l.dqx.skills] Deploying skills to: /Users/grzegorz.rusin@databricks.com/.assistant/skills/dqx\u001b[0m\n", |
Same concern as above: this committed output embeds the user's email address in the deployment path and should be cleared before committing.
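Clearing outputs can be automated; tools such as `nbstripout` or `jupyter nbconvert --clear-output` do this, and a minimal stdlib sketch (file name illustrative) looks like:

```python
import json

def clear_notebook_outputs(nb):
    """Blank out outputs and execution counts of every code cell
    in a notebook loaded as a plain JSON dict."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

# Illustrative usage:
# with open("demo.ipynb") as f:
#     nb = json.load(f)
# with open("demo.ipynb", "w") as f:
#     json.dump(clear_notebook_outputs(nb), f, indent=1)
```

Wiring a step like this into a pre-commit hook prevents local paths and email addresses from ever reaching the repo.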
mwojtyczka
left a comment
This is really cool!
Feedback:
- Please sign the commits.
- We need to reduce the maintenance effort. All of these examples are already available in the docs, so we need a better way to manage this and avoid duplication -- e.g., store the examples in one place and source both the docs and the skills from there, or document the examples in each check function. The usage examples for DQX are likewise already in the docs; we should centralize this.
PR: Add `install_skills()` -- deploy DQX agent skills to Databricks Assistant
Branch: `f/agent_skills` → `main` | 3 commits | +2109 lines

Key Changes

1. New `install_skills()` command (`src/databricks/labs/dqx/skills.py`)

One-liner to deploy DQX agent skills into a Databricks workspace so that Databricks Assistant can use them. Reads bundled SKILL.md files and example scripts from package resources via `importlib.resources`, then uploads them with `WorkspaceClient.workspace.upload()`.

2. Bundled skill content (`src/databricks/labs/dqx/resources/skills/`)

Six SKILL.md files covering the full DQX surface, structured for the assistant's skill discovery:
- `dqx/SKILL.md`
- `checks/SKILL.md`
- `checks/row-level/SKILL.md`
- `checks/dataset-level/SKILL.md`
- `checks/custom/SKILL.md`
- `profiling/SKILL.md`

Each SKILL.md includes a parameter table at the top that explicitly marks required vs optional arguments, to prevent the assistant from generating invalid check definitions (e.g. calling `is_in_range` without both `min_limit` and `max_limit`).

3. Runnable examples (`checks/examples/`)

21 self-contained Python scripts (one per check category) bundled as package resources and deployed alongside the skills:
- 01–09: row-level checks (null, list, comparison, range, regex, datetime, network, SQL, complex types)
- 10–16: dataset-level checks (unique, aggregation, FK, compare, freshness, schema, SQL query)
- 17–19: custom checks (SQL window, Python row, Python dataset)
- 20–21: profiling

All SKILL.md files cross-reference the relevant examples.

4. Package wiring (`__init__.py`, `resources/__init__.py`)

`install_skills` is exported from `databricks.labs.dqx.__init__`, so both import styles work (`from databricks.labs.dqx import install_skills` and `from databricks.labs.dqx.skills import install_skills`). No `pyproject.toml` build changes needed -- hatchling already includes non-Python files under `src/`.

Breaking changes: None. This is purely additive.
Examples
Databricks AI Assistant with skills loaded
...
Unskilled Databricks AI Assistant