Avoid always adding columns as optional on schema evolution #14827
Problem
When performing schema evolution (adding a new field), the code currently calls `addColumn()` → `addInternalColumn()`, which always marks the new column as `OPTIONAL` (i.e. `nullable = true`), regardless of what the Connect schema says. This behaviour is inconsistent with table creation:
- Table creation uses `schema.isOptional()` (and the related configuration) to decide whether a column should be `REQUIRED` or `OPTIONAL`.
- Schema evolution always adds the column as `OPTIONAL`.

As a result:
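- A field that the Connect schema marks as required becomes `REQUIRED` when it exists at table creation, but `OPTIONAL` when it is added later via schema evolution → different physical schema.

Solution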
This PR aligns schema-evolution nullability handling with the logic used during initial table creation.
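A minimal sketch of that alignment, assuming a single helper that both the creation path and the evolution path call. `NullabilityPolicy`, `SchemaEvolver`, `ColumnAdder` and `RecordingAdder` are hypothetical names, not the classes touched by this PR. `ColumnAdder` only mirrors the shape of Iceberg's `UpdateSchema.addColumn(parent, name, type)` and `addRequiredColumn(parent, name, type)`, with a `String` standing in for the Iceberg `Type`, and `forceOptional` is an assumed stand-in for the related configuration mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one nullability rule shared by table creation and
// schema evolution. Illustrative names only, not the PR's actual code.
final class NullabilityPolicy {
  private NullabilityPolicy() {}

  // Required only if the Connect field is non-optional and the (assumed)
  // force-optional setting is off. Both code paths call this.
  static boolean isRequired(boolean connectOptional, boolean forceOptional) {
    return !connectOptional && !forceOptional;
  }
}

// Stand-in mirroring Iceberg's UpdateSchema.addColumn / addRequiredColumn;
// the column type is a plain String here instead of an Iceberg Type.
interface ColumnAdder {
  void addColumn(String parent, String name, String type);

  void addRequiredColumn(String parent, String name, String type);
}

final class SchemaEvolver {
  private final ColumnAdder update;
  private final boolean forceOptional;

  SchemaEvolver(ColumnAdder update, boolean forceOptional) {
    this.update = update;
    this.forceOptional = forceOptional;
  }

  // Evolution routes through the shared rule instead of always calling addColumn().
  void addField(String parent, String name, String type, boolean connectOptional) {
    if (NullabilityPolicy.isRequired(connectOptional, forceOptional)) {
      update.addRequiredColumn(parent, name, type);
    } else {
      update.addColumn(parent, name, type);
    }
  }

  public static void main(String[] args) {
    RecordingAdder adder = new RecordingAdder();
    SchemaEvolver evolver = new SchemaEvolver(adder, false);
    // A null parent means a top-level column in this sketch.
    evolver.addField(null, "id", "long", false);    // required in Connect
    evolver.addField(null, "note", "string", true); // optional in Connect
    System.out.println(adder.calls); // prints [required id, optional note]
  }
}

// Records which method was called, so the sketch runs without Iceberg.
final class RecordingAdder implements ColumnAdder {
  final List<String> calls = new ArrayList<>();

  @Override
  public void addColumn(String parent, String name, String type) {
    calls.add("optional " + name);
  }

  @Override
  public void addRequiredColumn(String parent, String name, String type) {
    calls.add("required " + name);
  }
}
```

Keeping the decision in one function is what stops the two paths from drifting apart again. One point a real implementation has to account for: as far as we know, Iceberg's `UpdateSchema` treats adding a required column as an incompatible change (existing rows have no value for it), so `addRequiredColumn` is rejected unless `allowIncompatibleChanges()` has been called on the update first.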
Now the same code path (respecting `ConnectSchema.isOptional()` and the configured default for required fields) is used in both cases. This eliminates the dual behaviour and guarantees that the resulting Parquet files have identical column nullability, whether the field was present at creation time or added later via schema evolution.

Result
- Schema evolution no longer produces `OPTIONAL` columns when the source schema says the field is required.
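As a quick sanity check of that result, the sketch below models creation and evolution as plain functions and counts the input combinations where they disagree. All names are hypothetical, and `forceOptional` is an assumed stand-in for the related configuration. Before the change, the only mismatch is a required Connect field with no forcing configuration; after it, there are none.

```java
// Hypothetical model, not the connector's code: compares the nullability a
// field gets at table creation with the one it gets via schema evolution,
// before and after this change, for every input combination.
final class NullabilityCheck {
  private NullabilityCheck() {}

  // Creation path: required only if the Connect field is non-optional and
  // the (assumed) force-optional setting is off.
  static boolean requiredOnCreate(boolean connectOptional, boolean forceOptional) {
    return !connectOptional && !forceOptional;
  }

  // Evolution before this PR: addColumn() always produced an optional column,
  // so both inputs are ignored.
  static boolean requiredOnEvolveBefore(boolean connectOptional, boolean forceOptional) {
    return false;
  }

  // Evolution after this PR: same rule as creation.
  static boolean requiredOnEvolveAfter(boolean connectOptional, boolean forceOptional) {
    return requiredOnCreate(connectOptional, forceOptional);
  }

  // Counts the combinations where the evolved column's nullability differs
  // from the one the same field would get at table creation.
  static int mismatches(boolean afterChange) {
    int count = 0;
    for (boolean connectOptional : new boolean[] {false, true}) {
      for (boolean forceOptional : new boolean[] {false, true}) {
        boolean created = requiredOnCreate(connectOptional, forceOptional);
        boolean evolved = afterChange
            ? requiredOnEvolveAfter(connectOptional, forceOptional)
            : requiredOnEvolveBefore(connectOptional, forceOptional);
        if (created != evolved) {
          count++;
        }
      }
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println("mismatches before: " + mismatches(false)); // prints "mismatches before: 1"
    System.out.println("mismatches after: " + mismatches(true));   // prints "mismatches after: 0"
  }
}
```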