@kumarpritam863
Contributor

Problem

When performing schema evolution (adding a new field), the code currently calls addColumn() → addInternalColumn(), which always marks the new column as OPTIONAL (i.e. nullable = true), regardless of what the Connect schema says.

This behaviour is inconsistent with table creation:

  • On initial table creation we correctly respect schema.isOptional() (and the related configuration) to decide whether a column should be REQUIRED or OPTIONAL.
  • On schema evolution we ignore that and force the new column to be OPTIONAL.

As a result:

  • If you create the table directly with schema v2 (which already contains the new field), the column is created with the correct nullability.
  • If you create the table with v1 and later evolve to v2, the same column is added as OPTIONAL → different physical schema.

Solution

This PR aligns schema-evolution nullability handling with the logic used during initial table creation.

Both cases now use the same code path, which respects ConnectSchema.isOptional() and the configured default for required fields. This removes the dual behaviour and ensures that the resulting Parquet files have identical column nullability, whether the field was present at creation time or added later via schema evolution.
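
To illustrate the idea of a single decision point, here is a minimal, self-contained sketch. It is not the code from this PR: the class `NullabilitySketch`, its methods, and the `forceOptional` flag (standing in for the related configuration) are all hypothetical, and schemas are modelled as plain maps rather than real Connect or table schema objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NullabilitySketch {

    // Hypothetical single decision point: a column is REQUIRED only when the
    // source field is not optional and the (assumed) override flag is off.
    static boolean isRequired(boolean sourceOptional, boolean forceOptional) {
        return !forceOptional && !sourceOptional;
    }

    // Creation path. Input maps field name -> isOptional (source schema);
    // output maps column name -> isRequired (table schema).
    static Map<String, Boolean> createTable(Map<String, Boolean> sourceFields,
                                            boolean forceOptional) {
        Map<String, Boolean> columns = new LinkedHashMap<>();
        for (Map.Entry<String, Boolean> field : sourceFields.entrySet()) {
            columns.put(field.getKey(), isRequired(field.getValue(), forceOptional));
        }
        return columns;
    }

    // Evolution path. Adds only the missing columns, using the same
    // isRequired() check as createTable() instead of forcing OPTIONAL.
    static Map<String, Boolean> evolve(Map<String, Boolean> table,
                                       Map<String, Boolean> sourceFields,
                                       boolean forceOptional) {
        Map<String, Boolean> columns = new LinkedHashMap<>(table);
        for (Map.Entry<String, Boolean> field : sourceFields.entrySet()) {
            columns.putIfAbsent(field.getKey(),
                    isRequired(field.getValue(), forceOptional));
        }
        return columns;
    }

    public static void main(String[] args) {
        // v1 has one non-optional field; v2 adds a second non-optional field.
        Map<String, Boolean> v1 = new LinkedHashMap<>();
        v1.put("id", false);
        Map<String, Boolean> v2 = new LinkedHashMap<>(v1);
        v2.put("email", false);

        Map<String, Boolean> direct = createTable(v2, false);
        Map<String, Boolean> evolved = evolve(createTable(v1, false), v2, false);

        // Both paths should yield the same column nullability.
        System.out.println(direct.equals(evolved)); // prints "true"
    }
}
```

Because both paths go through the same `isRequired()` check, creating the table from v2 directly and evolving it from v1 to v2 produce the same set of REQUIRED and OPTIONAL columns in this model.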

Result

  • Consistent Parquet schema across creation and evolution paths
  • New columns respect the nullability declared in the Connect schema
  • No more surprise OPTIONAL columns when the source schema says the field is required
