feat: Recover from reading incompatible schema metadata if validation is allowed #231
…rquet error handling Co-authored-by: MoritzPotthoffQC <[email protected]>
Co-authored-by: MoritzPotthoffQC <[email protected]>
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff            @@
##              main      #231   +/-  ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           53        53
  Lines         3019      3053   +34
=========================================
+ Hits          3019      3053   +34
AndreasAlbertQC
left a comment
Thanks @MoritzPotthoffQC! Small questions.
…torage backends Co-authored-by: MoritzPotthoffQC <[email protected]>
…rquet_schema_json_fallback_corrupt Co-authored-by: MoritzPotthoffQC <[email protected]>
Motivation
When serialization formats change (e.g., `primary_keys` vs `primary_key`), previously serialized schemas/collections become unreadable. Schema deserialization already handles this via a `strict` parameter, but collection deserialization did not. Additionally, when reading serialized data, using non-strict schema deserialization in every case where validation is allowed to run makes it possible to recover from unreadable metadata in situations where the metadata is not needed.

Fixes #230
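A minimal sketch of the failure mode described in the motivation, assuming the metadata is JSON and that a key was renamed from `primary_key` to `primary_keys`. The function `deserialize_schema_sketch`, the payload layout, and this stand-in `DeserializationError` class are written for illustration only and are not the library's actual implementation.

```python
import json
from typing import Optional


class DeserializationError(Exception):
    """Stand-in for the exception described in this PR (illustrative only)."""


def deserialize_schema_sketch(data: str, strict: bool = True) -> Optional[dict]:
    """Hypothetical helper: parse schema metadata written in the current format."""
    try:
        payload = json.loads(data)
        # Assumption: the current format stores the key under "primary_keys".
        return {"primary_keys": list(payload["primary_keys"])}
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        if strict:
            raise DeserializationError("schema metadata is unreadable") from exc
        # Non-strict mode: report "no usable metadata" instead of failing.
        return None


old_metadata = json.dumps({"primary_key": ["id"]})   # older, incompatible format
new_metadata = json.dumps({"primary_keys": ["id"]})  # current format
```

With `strict=False`, a caller can treat a `None` result for `old_metadata` as "metadata missing" and carry on, whereas the default strict call raises.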
Changes
- `strict` parameter to `deserialize_collection`: Mirrors existing `deserialize_schema` behavior; when `strict=False`, returns `None` on deserialization errors instead of raising exceptions
- `strict` through deserialization chain: Updated `_deserialize_types` to accept and forward the parameter
- `scan_parquet`/`read_parquet` validation modes: Both `Collection._read` and `Schema._validate_if_needed` now pass `strict=False` when `validation` is "allow", "skip" or "warn", allowing automatic fallback to validation when old formats are detected
- `DeserializationError` exception: Created a new exception class that is raised when deserialization fails with `strict=True`, providing a clear and consistent error type for both schema and collection deserialization failures
- `test_read_write_old_metadata_contents` for collections to use `TESTERS` parametrization instead of being Parquet-specific
- `set_metadata` to `CollectionStorageTester`: Implemented abstract method with Parquet and Delta backend implementations to support testing metadata manipulation across storage backends
- `set_metadata` pattern: Updated `test_read_write_parquet_schema_json_fallback_corrupt` to use `write_untyped` + `set_metadata` instead of passing a `metadata` kwarg, which was being ignored
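The link between the validation mode and strictness can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `read_stored_schema` and `fake_deserialize` are hypothetical names, and the sketch assumes that any mode other than "allow", "skip" or "warn" (called "forbid" in the usage note) stays strict.

```python
from typing import Callable, Optional

# Modes named in the description above as using non-strict deserialization.
NON_STRICT_MODES = frozenset({"allow", "skip", "warn"})


def read_stored_schema(
    raw: str,
    validation: str,
    deserialize: Callable[[str, bool], Optional[dict]],
) -> Optional[dict]:
    """Hypothetical helper: choose strictness from the validation mode.

    A return value of None means the stored metadata is unusable; the caller
    is then expected to fall back to whatever the validation mode prescribes.
    """
    strict = validation not in NON_STRICT_MODES
    return deserialize(raw, strict)


def fake_deserialize(raw: str, strict: bool) -> Optional[dict]:
    """Toy deserializer: only the literal string "current" is readable."""
    if raw == "current":
        return {"primary_keys": ["id"]}
    if strict:
        raise ValueError("unreadable metadata")
    return None
```

Under these assumptions, `read_stored_schema("old", "allow", fake_deserialize)` yields `None` and leaves the decision to the caller, while the same call with the assumed "forbid" mode propagates the error.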