Conversation
All commits in PR should be signed (`git commit -S ...`). See https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits
✅ 547/547 passed, 61 skipped, 4h59m2s total. Running from acceptance #3892
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1025      +/-   ##
==========================================
- Coverage   91.15%   88.81%    -2.34%
==========================================
  Files          64       65       +1
  Lines        6672     6698      +26
==========================================
- Hits         6082     5949     -133
- Misses        590      749     +159
==========================================
```

☔ View full report in Codecov by Sentry.
Pull request overview
Adds a new `install_skills()` helper to deploy bundled DQX "skills" (SKILL.md + runnable examples) into a Databricks workspace folder for Databricks Assistant discovery.
Changes:
- Introduces `src/databricks/labs/dqx/skills.py` with `install_skills()`, which uploads package-bundled skill content to the workspace.
- Bundles a skills tree under `src/databricks/labs/dqx/resources/skills/`, including SKILL.md docs and runnable example scripts.
- Updates exports/docs/perf artifacts to reflect the new skills/examples and benchmark additions.
Reviewed changes
Copilot reviewed 32 out of 33 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/perf/.benchmarks/baseline.json | Updates stored benchmark baselines to include new benchmark cases. |
| src/databricks/labs/dqx/skills.py | Adds install_skills() implementation that discovers packaged resources and uploads them to Workspace. |
| src/databricks/labs/dqx/resources/skills/dqx/SKILL.md | Root skill entrypoint for DQX assistant discovery and routing to sub-skills. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/SKILL.md | Documents authoring/applying checks and references examples. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/row-level/SKILL.md | Adds row-level checks reference + examples index. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/dataset-level/SKILL.md | Adds dataset-level checks reference + examples index. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/custom/SKILL.md | Adds custom checks guidance + examples index. |
| src/databricks/labs/dqx/resources/skills/dqx/profiling/SKILL.md | Adds profiling skill content and runnable example references. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/examples/*.py | Adds runnable example scripts to be deployed alongside skills. |
| src/databricks/labs/dqx/__init__.py | Exports install_skills from the package root. |
| docs/dqx/docs/reference/benchmarks.mdx | Updates benchmark documentation tables with new entries. |
| demo.ipynb | Adds an executable demo notebook showing dqx.install_skills() usage. |
Comments suppressed due to low confidence (6)
src/databricks/labs/dqx/skills.py:1
- Converting `importlib.resources.files(...)` to a filesystem `Path` via `Path(str(...))` will break when the package is installed as a zip/pex/egg (resources aren't guaranteed to exist on disk), and `rglob`/`read_bytes` won't work reliably. Keep the object as an `importlib.resources.abc.Traversable` and either (a) recursively walk it via `iterdir()` and use `Traversable.read_bytes()`, or (b) use `importlib.resources.as_file()` around the directory Traversable to obtain a temporary real path before doing `rglob()`.
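A minimal sketch of option (a): walking the `Traversable` directly so the code works whether or not the package is unpacked on disk. The `skills_root`, `ws`, and `target` names in the usage comment are assumptions for illustration, not the PR's actual code.

```python
from importlib.resources import files

def walk_resources(root, prefix=""):
    """Recursively yield (relative_path, content_bytes) pairs from a
    Traversable, without assuming the package lives on a real filesystem."""
    for entry in root.iterdir():
        rel = f"{prefix}{entry.name}"
        if entry.is_dir():
            yield from walk_resources(entry, prefix=f"{rel}/")
        else:
            yield rel, entry.read_bytes()

# Hypothetical usage (names assumed, not from the PR):
# skills_root = files("databricks.labs.dqx.resources") / "skills"
# for rel, data in walk_resources(skills_root):
#     ws.workspace.upload(f"{target}/{rel}", data, overwrite=True)
```

Because `iterdir()` and `read_bytes()` are part of the `Traversable` API, this keeps working from zip/pex/egg installs where `rglob()` on a stringified path would fail.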
src/databricks/labs/dqx/skills.py:1
- `ws.workspace.mkdirs(target_dir)` is invoked for every file, which can add a lot of redundant API calls for larger skill trees. Track which directories have already been created (e.g., a `set[str]`) and only call `mkdirs` the first time each directory is encountered.
src/databricks/labs/dqx/skills.py:1
- `install_skills()` adds new behavior that is sensitive to environment (current user lookup, resource discovery, workspace upload). Please add unit tests that mock `WorkspaceClient` to verify: (1) default folder resolution uses the current user, (2) the correct target paths are computed for nested resources, and (3) uploads are invoked with overwrite enabled.
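A sketch of the mocking pattern such a test could use. The stand-in below only mirrors the behavior the review describes; its signature and default path are assumptions, not the PR's actual code.

```python
from unittest.mock import MagicMock

def install_skills_standin(ws, folder=None):
    """Stand-in with the shape the review describes (hypothetical)."""
    if folder is None:
        user = ws.current_user.me().user_name
        folder = f"/Users/{user}/.assistant/skills/dqx"
    ws.workspace.mkdirs(folder)
    ws.workspace.upload(f"{folder}/SKILL.md", b"# DQX", overwrite=True)

def test_default_folder_and_overwrite():
    # Mock the WorkspaceClient so no real workspace is needed.
    ws = MagicMock()
    ws.current_user.me.return_value.user_name = "user@example.com"
    install_skills_standin(ws)
    path = ws.workspace.upload.call_args.args[0]
    # (1) default folder derived from the current user
    assert path == "/Users/user@example.com/.assistant/skills/dqx/SKILL.md"
    # (3) uploads always pass overwrite=True
    assert ws.workspace.upload.call_args.kwargs["overwrite"] is True
```

The same pattern extends to point (2): feed the stand-in nested resources and assert on every `upload` call's path.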
src/databricks/labs/dqx/__init__.py
Outdated
__all__ = ["install_skills"]
Defining `__all__` as `["install_skills"]` changes `from databricks.labs.dqx import *` behavior to export only `install_skills`, which can be an unintended breaking change for users relying on star-imports. Consider removing `__all__` entirely, or appending to an existing export list (i.e., include the previously intended public names) rather than restricting exports to only this symbol.
demo.ipynb
Outdated
| "execution_count": 2, | ||
| "id": "c234d4f6", | ||
| "metadata": {}, | ||
| "outputs": [ | ||
| { | ||
| "name": "stderr", | ||
| "output_type": "stream", | ||
| "text": [ | ||
| "/Users/grzegorz.rusin/repos/dqx/.venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", | ||
| " from .autonotebook import tqdm as notebook_tqdm\n" | ||
| ] | ||
| } | ||
| ], |
The committed notebook output includes a local filesystem path and a user email address, which are sensitive and typically should not be checked in. Clear notebook outputs before committing (and consider removing machine-specific setup like `sys.path.insert(...)`, or replacing it with a more portable approach) so the repo doesn't leak developer environment details.
| "execution_count": 2, | |
| "id": "c234d4f6", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stderr", | |
| "output_type": "stream", | |
| "text": [ | |
| "/Users/grzegorz.rusin/repos/dqx/.venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", | |
| " from .autonotebook import tqdm as notebook_tqdm\n" | |
| ] | |
| } | |
| ], | |
| "execution_count": null, | |
| "id": "c234d4f6", | |
| "metadata": {}, | |
| "outputs": [], |
demo.ipynb
Outdated
| "output_type": "stream", | ||
| "text": [ | ||
| "\u001b[90m12:38:26\u001b[0m \u001b[1m\u001b[32m INFO\u001b[0m \u001b[1m[databricks.sdk] Using Databricks Metadata Service authentication\u001b[0m\n", | ||
| "\u001b[90m12:38:27\u001b[0m \u001b[1m\u001b[32m INFO\u001b[0m \u001b[1m[d.l.dqx.skills] Deploying skills to: /Users/grzegorz.rusin@databricks.com/.assistant/skills/dqx\u001b[0m\n", |
Same concern as above: this committed output embeds the user's email address in the deployment path and should be cleared before committing.
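Clearing outputs can be automated; tools such as `nbstripout` or `jupyter nbconvert --clear-output` do this, and a minimal stdlib sketch (file name illustrative) looks like:

```python
import json

def clear_notebook_outputs(nb):
    """Blank out outputs and execution counts of every code cell
    in a notebook loaded as a plain JSON dict."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

# Illustrative usage:
# with open("demo.ipynb") as f:
#     nb = json.load(f)
# with open("demo.ipynb", "w") as f:
#     json.dump(clear_notebook_outputs(nb), f, indent=1)
```

Wiring a step like this into a pre-commit hook prevents local paths and email addresses from ever reaching the repo.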
mwojtyczka
left a comment
This is really cool!
Feedback:
- Please sign the commits.
- We need to reduce the maintenance effort. All of these examples are already available in the docs, so we need a better way to manage this and avoid duplication -- e.g., store the examples in one place and source both the docs and the skills from there, or document the examples in each check function. The usage examples for DQX are likewise already in the docs; we should centralize this.
PR: Add `install_skills()` -- deploy DQX agent skills to Databricks Assistant
Branch: `f/agent_skills` → `main` | 3 commits | +2109 lines

Key Changes

1. New `install_skills()` command (`src/databricks/labs/dqx/skills.py`)

One-liner to deploy DQX agent skills into a Databricks workspace so that Databricks Assistant can use them. Reads bundled SKILL.md files and example scripts from package resources via `importlib.resources`, then uploads them with `WorkspaceClient.workspace.upload()`.

2. Bundled skill content (`src/databricks/labs/dqx/resources/skills/`)

Six SKILL.md files covering the full DQX surface, structured for the assistant's skill discovery:
- `dqx/SKILL.md`
- `checks/SKILL.md`
- `checks/row-level/SKILL.md`
- `checks/dataset-level/SKILL.md`
- `checks/custom/SKILL.md`
- `profiling/SKILL.md`

Each SKILL.md includes a parameter table at the top that explicitly marks required vs optional arguments, to prevent the assistant from generating invalid check definitions (e.g. calling `is_in_range` without both `min_limit` and `max_limit`).

3. Runnable examples (`checks/examples/`)

21 self-contained Python scripts (one per check category) bundled as package resources and deployed alongside the skills:
- 01–09: row-level checks (null, list, comparison, range, regex, datetime, network, SQL, complex types)
- 10–16: dataset-level checks (unique, aggregation, FK, compare, freshness, schema, SQL query)
- 17–19: custom checks (SQL window, Python row, Python dataset)
- 20–21: profiling

All SKILL.md files cross-reference the relevant examples.

4. Package wiring (`__init__.py`, `resources/__init__.py`)

`install_skills` is exported from `databricks.labs.dqx.__init__`, so both import styles work (`from databricks.labs.dqx import install_skills` and `from databricks.labs.dqx.skills import install_skills`). No `pyproject.toml` build changes needed -- hatchling already includes non-Python files under `src/`.

Breaking changes: None. This is purely additive.
Examples
Databricks AI Assistant with skills loaded
...
Unskilled Databricks AI Assistant