QoL: Destructive schema sync after manual column dropping #2909
base: devel
Conversation
(force-pushed: c280794 → c37c422 → 598ca5b)
rudolfix left a comment:
This is a very good idea but we need to approach it in a more systematic way:
- (almost) all of our destinations have `def get_storage_table(self, table_name: str) -> Tuple[str, TTableSchemaColumns]:` and/or `def get_storage_tables(self, table_names: Iterable[str]) -> Iterable[Tuple[str, TTableSchemaColumns]]:` implemented. This reflects the storage to get the table schema out of it; you can use it to compare with the pipeline schema.
- let's formalize it: add a mixin class like `WithTableReflection`, in the same manner `WithStateSync` is done. `get_storage_tables` is the more general method, so you can add only this one to the mixin (a sketch follows this list)
- now add this mixin to all `JobClientBase` implementations for which you want to support our new schema sync
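A minimal sketch of such a mixin, assuming the signatures quoted above (the class name `WithTableReflection` is the reviewer's proposal, not an existing dlt class):

```py
from abc import ABC, abstractmethod
from typing import Iterable, Tuple

from dlt.common.schema.typing import TTableSchemaColumns


class WithTableReflection(ABC):
    """Marker mixin, analogous to WithStateSync, for job clients that can
    reflect table schemas from the destination storage."""

    @abstractmethod
    def get_storage_tables(
        self, table_names: Iterable[str]
    ) -> Iterable[Tuple[str, TTableSchemaColumns]]:
        """Yield (table_name, columns) pairs reflected from storage; this is
        the more general of the two methods, so it is the only one required."""
        ...
```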
When the above is done we are able to actually compute the schema diff.
Top-level interface:
- we have `sync_schema` that does a regular schema migration (adds missing columns and tables in the destination) - we need another method which is the reverse: it deletes columns and tables from the pipeline schema that are not present in the destination and then does the schema sync above (sketched after this comment)
- the method above should have a dry-run mode, where we do not change the pipeline schema and we do not sync it
- it should make sure that `destination_client()` implements `WithTableReflection` before continuing
- it should allow selecting the tables to be affected

when this is done we can think about extending the CLI, i.e. a `dlt pipeline <name> schema` command
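A sketch of what that top-level method could look like. The function name, the diff object, and the helpers `compute_schema_diff` / `apply_diff` are hypothetical; `destination_client()`, `sync_schema()`, and the `WithTableReflection` check come from the comment above:

```py
from typing import Iterable, Optional

import dlt


def sync_schema_from_destination(
    pipeline: dlt.Pipeline,
    table_names: Optional[Iterable[str]] = None,  # None means all tables
    dry_run: bool = False,
):
    """Reverse of sync_schema: drop columns and tables from the pipeline
    schema that no longer exist in the destination, then run a regular sync."""
    client = pipeline.destination_client()
    if not isinstance(client, WithTableReflection):
        raise NotImplementedError(
            f"{type(client).__name__} does not support table reflection"
        )
    # compute_schema_diff is hypothetical: compare the pipeline schema with
    # what client.get_storage_tables() reports for the selected tables
    diff = compute_schema_diff(pipeline.default_schema, client, table_names)
    if dry_run:
        return diff  # report only: schema untouched, nothing synced
    apply_diff(pipeline.default_schema, diff)  # hypothetical helper
    pipeline.sync_schema()  # the existing forward migration
    return diff
```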
(force-pushed: c256d51 → d4aadb2 → ab715ff → 17651e2)
rudolfix left a comment:
some changes needed
(force-pushed: 354680c → 85448dc; b1c3f3f → d2ad80e)
rudolfix left a comment:
this is really good! you need to resolve conflicts and maybe simplify the code that deals with nested tables.
maybe we could document this? not sure where in the documentation it should go
(force-pushed: d2ad80e → b340618)
Deploying with

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! | docs | fee497d | Commit Preview URL / Branch Preview URL | Nov 25 2025, 12:41 PM |
(force-pushed: 739025e → 2f85143)
rudolfix left a comment:
this is really good and is doing a lot of refactoring which we need. but we need even more :)
also, when this is finished we need two new CLI commands that will use it:
- `dlt pipeline fruitshop schema [foo] sync-to-destination` (attach to the pipeline, use the old sync method)
- `dlt pipeline fruitshop schema [foo] sync-from-destination` (the reverse operation, with an optional sync back to the destination)
- `dlt pipeline fruitshop schema [foo]` should still show the schema (you can add "show" as well)

heh, a lot of work... sorry for that
is a dry run possible for the destructive sync?
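On the dry-run question: with an interface like the earlier sketch, the destructive sync could preview the diff before applying anything (the function and the `"fruits"` table are the hypothetical names from above):

```py
# Preview what the destructive sync would remove, without touching the schema
diff = sync_schema_from_destination(pipeline, dry_run=True)
print(diff)

# Apply it for selected tables only, once the diff looks right
sync_schema_from_destination(pipeline, table_names=["fruits"])
```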
(force-pushed: 0504401 → 9471bc4; 934764a → cbd5a93 → fee497d)
Description
This PR adds a new pipeline function that syncs the dlt schema with the destination (not vice versa) by removing a column from the schema if that column has been manually deleted in the destination.
Related PRs:
#2754
Further:
This should be extended to table drop syncs as well.
Note:
This essentially solves the problem where the user manually drops things in the destination and the dlt pipeline breaks.
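For illustration, the failure mode and the recovery could look like this (the `duckdb` destination, the sample data, and the function name are assumptions; the actual name of the new pipeline function may differ):

```py
import dlt

pipeline = dlt.pipeline("fruitshop", destination="duckdb")
pipeline.run(
    [{"id": 1, "name": "apple", "color": "red"}],
    table_name="fruits",
)

# The user manually drops the "color" column in the destination database.
# The dlt schema still expects it, so subsequent loads can break.

# Recovery with the new sync direction: remove the dropped column from the
# dlt schema so it matches the destination again (hypothetical name).
sync_schema_from_destination(pipeline, table_names=["fruits"])
```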