Conversation

@anuunchin
Contributor

@anuunchin anuunchin commented Jul 21, 2025

Description

This PR adds a new pipeline function that syncs the dlt schema with the destination (not vice versa) by removing a column from the schema if that column has been manually deleted in the destination.

The motivation is this: a CLI command that drops columns would need separate drop_columns functions because of dialect differences, which adds maintenance overhead. Instead, we delegate the dropping to the user and let them sync the dlt schema in those scenarios.

Related PRs:

#2754

Further:

This should be extended to table drop syncs as well.

Note:

This essentially addresses the case where the user manually drops things in the destination and the dlt pipeline then breaks.

@netlify

netlify bot commented Jul 21, 2025

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit d2ad80e
🔍 Latest deploy log https://app.netlify.com/projects/dlt-hub-docs/deploys/68bac554550c830008d938d7

@anuunchin anuunchin force-pushed the feat/1153-drop-column-sync branch 2 times, most recently from c280794 to c37c422 Compare July 21, 2025 07:56
@anuunchin anuunchin force-pushed the feat/1153-drop-column-sync branch from c37c422 to 598ca5b Compare July 21, 2025 08:54
@anuunchin anuunchin self-assigned this Jul 21, 2025
@anuunchin anuunchin requested a review from rudolfix July 21, 2025 09:58
Collaborator

@rudolfix rudolfix left a comment

This is a very good idea, but we need to approach it in a more systematic way:

  1. (almost) all of our destinations have
def get_storage_table(self, table_name: str) -> Tuple[str, TTableSchemaColumns]:

and/or

def get_storage_tables(
        self, table_names: Iterable[str]
    ) -> Iterable[Tuple[str, TTableSchemaColumns]]: 

implemented. These methods reflect the storage to get the table schema out of it. You can use that to compare with the pipeline schema.

  2. let's formalize it: add a mixin class like WithTableReflection, in the same manner as WithStateSync is done. get_storage_tables is the more general method, so you can add only this one to the mixin
  3. Now add this mixin to all JobClientBase implementations for which you want to support our new sync schema.

When the above is done we are able to actually compute schema diff.
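
The mixin and the diff described above could look roughly like the following. This is a minimal sketch, not the PR's code: TColumns is a simplified stand-in for TTableSchemaColumns, InMemoryClient and dropped_columns are hypothetical names, and the sketch assumes that a table missing from storage is reported with an empty column set.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List, Tuple

# Simplified stand-in for TTableSchemaColumns: column name -> column properties.
TColumns = Dict[str, Dict[str, Any]]


class WithTableReflection(ABC):
    """Mixin for clients that can reflect table schemas from storage."""

    @abstractmethod
    def get_storage_tables(
        self, table_names: Iterable[str]
    ) -> Iterable[Tuple[str, TColumns]]:
        """Yield (table_name, columns) as found in the destination."""


class InMemoryClient(WithTableReflection):
    """Hypothetical toy client backed by a dict, used only to exercise the diff."""

    def __init__(self, storage: Dict[str, TColumns]) -> None:
        self._storage = storage

    def get_storage_tables(
        self, table_names: Iterable[str]
    ) -> Iterable[Tuple[str, TColumns]]:
        for name in table_names:
            # Assumption: a table missing from storage yields no columns.
            yield name, self._storage.get(name, {})


def dropped_columns(
    client: WithTableReflection, schema_tables: Dict[str, TColumns]
) -> Dict[str, List[str]]:
    """Return columns present in the pipeline schema but absent in storage."""
    diff: Dict[str, List[str]] = {}
    for name, storage_columns in client.get_storage_tables(schema_tables.keys()):
        missing = [c for c in schema_tables[name] if c not in storage_columns]
        if missing:
            diff[name] = missing
    return diff
```

Only the one-way difference is computed here (schema minus storage), since the regular sync_schema already covers the other direction.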

Top level interface:

  1. we have sync_schema that will do a regular schema migration (add missing columns and tables in the destination)
  2. we need another method which is the reverse: it will delete columns and tables in the schema that are not present in the destination and then do the schema sync above
  3. the method above should have a dry run mode, where we do not change the pipeline schema and we do not sync it
  4. it should check that destination_client() implements WithTableReflection before continuing
  5. it should allow to select tables to be affected
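
A rough sketch of how points 3 to 5 could fit together. The function name sync_schema_from_destination, its parameters, and the toy WithTableReflection class are all hypothetical, schemas are plain dicts rather than dlt objects, and the final sync back to the destination (point 2) as well as table drops are left out.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

TColumns = Dict[str, Dict[str, Any]]


class WithTableReflection:
    """Hypothetical toy stand-in for a client that can reflect destination tables."""

    def __init__(self, storage: Dict[str, TColumns]) -> None:
        self._storage = storage

    def get_storage_tables(
        self, table_names: Iterable[str]
    ) -> Iterable[Tuple[str, TColumns]]:
        for name in table_names:
            yield name, self._storage.get(name, {})


def sync_schema_from_destination(
    client: object,
    schema_tables: Dict[str, TColumns],
    table_names: Optional[Iterable[str]] = None,
    dry_run: bool = False,
) -> Dict[str, List[str]]:
    """Drop from `schema_tables` the columns that are missing in the destination.

    Returns the dropped (or, in dry run, would-be dropped) columns per table.
    """
    # Point 4: refuse to continue if the client cannot reflect tables.
    if not isinstance(client, WithTableReflection):
        raise TypeError("destination client does not support table reflection")
    # Point 5: restrict the operation to the selected tables, if any were given.
    if table_names is None:
        selected = list(schema_tables)
    else:
        selected = [name for name in table_names if name in schema_tables]
    dropped: Dict[str, List[str]] = {}
    for name, storage_columns in client.get_storage_tables(selected):
        missing = [c for c in schema_tables[name] if c not in storage_columns]
        if not missing:
            continue
        dropped[name] = missing
        # Point 3: in dry run mode the pipeline schema is left untouched.
        if not dry_run:
            for column in missing:
                del schema_tables[name][column]
    return dropped
```

Returning the same mapping in both modes lets a caller preview the result with dry_run=True and then apply it with an otherwise identical call.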

when this is done we can think about extending the CLI, i.e. the dlt pipeline <name> schema command

@anuunchin anuunchin force-pushed the feat/1153-drop-column-sync branch 3 times, most recently from c256d51 to d4aadb2 Compare July 28, 2025 07:53
@anuunchin anuunchin requested a review from rudolfix July 28, 2025 09:07
@anuunchin anuunchin force-pushed the feat/1153-drop-column-sync branch from d4aadb2 to ab715ff Compare July 28, 2025 09:28
@anuunchin anuunchin force-pushed the feat/1153-drop-column-sync branch from ab715ff to 17651e2 Compare August 5, 2025 07:08
Collaborator

@rudolfix rudolfix left a comment

some changes needed

@anuunchin anuunchin force-pushed the feat/1153-drop-column-sync branch 3 times, most recently from 354680c to 85448dc Compare September 2, 2025 07:43
@anuunchin anuunchin force-pushed the feat/1153-drop-column-sync branch 2 times, most recently from b1c3f3f to d2ad80e Compare September 5, 2025 11:11
@anuunchin anuunchin requested a review from rudolfix September 9, 2025 11:23
@anuunchin anuunchin mentioned this pull request Sep 16, 2025
Collaborator

@rudolfix rudolfix left a comment

this is really good! you need to resolve conflicts and maybe simplify code that deals with nested tables.

maybe we could document this? not sure where in the documentation it should go

@anuunchin anuunchin force-pushed the feat/1153-drop-column-sync branch from d2ad80e to b340618 Compare October 21, 2025 10:02
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Oct 21, 2025

Deploying with Cloudflare Workers

Status: ✅ Deployment successful!
Name: docs
Latest commit: fee497d
Updated (UTC): Nov 25 2025, 12:41 PM

@anuunchin anuunchin force-pushed the feat/1153-drop-column-sync branch from 739025e to 2f85143 Compare October 23, 2025 07:20
@anuunchin anuunchin requested a review from rudolfix October 28, 2025 09:11
Collaborator

@rudolfix rudolfix left a comment

this is really good and is doing a lot of refactoring which we need. but we need even more :)

also when this is finished we need two new cli commands that will use it:
dlt pipeline fruitshop schema [foo] sync-to-destination (attach to pipeline, use old sync method)
dlt pipeline fruitshop schema [foo] sync-from-destination (the reverse operation, with optional sync back to destination)

dlt pipeline fruitshop schema [foo] should still show schema (you can add "show" as well)
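
One way the tokens after `schema` could be interpreted, so that the schema name stays optional and a bare `schema` still shows the schema. This is only a sketch: parse_schema_args and ACTIONS are hypothetical and not part of the dlt CLI, and the rule that a trailing token matching an action is always read as the action is an assumption (it means a schema literally named after an action could not be addressed without one).

```python
from typing import List, Optional, Tuple

# Actions proposed in the review; "show" is the default when none is given.
ACTIONS = ("show", "sync-to-destination", "sync-from-destination")


def parse_schema_args(args: List[str]) -> Tuple[Optional[str], str]:
    """Split the tokens that follow `schema` into (schema_name, action)."""
    if len(args) > 2:
        raise ValueError(f"expected at most 2 arguments, got {len(args)}")
    if args and args[-1] in ACTIONS:
        # A trailing token that matches an action is read as the action.
        action, rest = args[-1], args[:-1]
    else:
        action, rest = "show", list(args)
    if len(rest) > 1:
        raise ValueError(f"unknown action: {rest[-1]!r}")
    schema_name = rest[0] if rest else None
    return schema_name, action
```

With this rule, `schema foo` and `schema foo show` both resolve to showing schema foo, and `schema sync-from-destination` applies to the default schema.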

heh a lot of work... sorry for that
is dry run possible for the destructive sync?

@anuunchin anuunchin force-pushed the feat/1153-drop-column-sync branch 2 times, most recently from 0504401 to 9471bc4 Compare November 25, 2025 10:52
@anuunchin anuunchin force-pushed the feat/1153-drop-column-sync branch 6 times, most recently from 934764a to cbd5a93 Compare November 25, 2025 12:30
@anuunchin anuunchin force-pushed the feat/1153-drop-column-sync branch from cbd5a93 to fee497d Compare November 25, 2025 12:35
@anuunchin anuunchin requested a review from rudolfix November 25, 2025 12:39
3 participants