Contributor

Copilot AI commented Nov 21, 2025

Motivation

When serialization formats change (e.g., primary_keys vs. primary_key), previously serialized schemas and collections become unreadable. Schema deserialization already handles this via a strict parameter, but collection deserialization did not. Additionally, when reading serialized data, deserialization is now non-strict in every case where validation may be run, so that reads can recover from unreadable metadata whenever that metadata is not strictly needed.

Fixes #230

Changes

  • Added strict parameter to deserialize_collection: Mirrors the existing deserialize_schema behavior. When strict=False, it returns None on deserialization errors instead of raising an exception
  • Propagated strict through deserialization chain: Updated _deserialize_types to accept and forward the parameter
  • Connected to scan_parquet/read_parquet validation modes: Both Collection._read and Schema._validate_if_needed now pass strict=False when validation is "allow", "skip" or "warn", allowing automatic fallback to validation when old formats are detected
  • Added DeserializationError exception: Created a new exception class that is raised when deserialization fails with strict=True, providing a clear and consistent error type for both schema and collection deserialization failures
  • Parametrized tests over storage backends: Updated test_read_write_old_metadata_contents for collections to use TESTERS parametrization instead of being Parquet-specific
  • Added set_metadata to CollectionStorageTester: Implemented abstract method with Parquet and Delta backend implementations to support testing metadata manipulation across storage backends
  • Updated tests to use set_metadata pattern: Updated test_read_write_parquet_schema_json_fallback_corrupt to use write_untyped + set_metadata instead of passing a metadata kwarg, which was being ignored
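
To make the strict/non-strict behavior described above concrete, here is a minimal, self-contained sketch. It is not the library's implementation: the `*_sketch` functions, `load_stored_collection`, the JSON layout with a `members` key, and the choice of `primary_key` as the current spelling are all assumptions made for illustration. Only the `strict` parameter, the `None` return on failure, the `DeserializationError` name, and the three validation modes come from the description. `"forbid"` is used below as an assumed example of a mode outside that set.

```python
import json
from typing import Optional


class DeserializationError(Exception):
    """Stand-in for the exception type added in this PR."""


def deserialize_schema_sketch(data: str, strict: bool = True) -> Optional[dict]:
    """Decode one schema; raise if strict, otherwise return None on failure."""
    try:
        payload = json.loads(data)
        # Assumption: the current format spells the key "primary_key".
        return {"primary_key": list(payload["primary_key"])}
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        if strict:
            raise DeserializationError("unreadable schema metadata") from exc
        return None


def deserialize_collection_sketch(data: str, strict: bool = True) -> Optional[dict]:
    """Decode a collection, forwarding `strict` to every member schema."""
    try:
        raw_members = json.loads(data)["members"]
        members = {}
        for name, raw in raw_members.items():
            schema = deserialize_schema_sketch(json.dumps(raw), strict=strict)
            if schema is None:
                # Non-strict mode: one unreadable member makes the whole
                # collection metadata unusable, so signal that with None.
                return None
            members[name] = schema
        return members
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError) as exc:
        if strict:
            raise DeserializationError("unreadable collection metadata") from exc
        return None


# Validation modes for which the description says strict=False is passed.
NON_STRICT_VALIDATION_MODES = frozenset({"allow", "skip", "warn"})


def load_stored_collection(metadata: str, validation: str) -> Optional[dict]:
    """Hypothetical read-path helper: pick `strict` from the validation mode."""
    strict = validation not in NON_STRICT_VALIDATION_MODES
    return deserialize_collection_sketch(metadata, strict=strict)


# Two payloads that differ only in the spelling of the key.
other_format = json.dumps({"members": {"users": {"primary_keys": ["id"]}}})
current_format = json.dumps({"members": {"users": {"primary_key": ["id"]}}})
```

In this sketch, a caller that receives `None` from `load_stored_collection` would treat the stored metadata as absent and fall back to validating the data, while a mode outside the non-strict set surfaces the failure as a `DeserializationError`.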


Copilot AI and others added 2 commits November 21, 2025 14:49
Copilot AI changed the title from "[WIP] Catch errors when reading serialized schemas for collections" to "Add strict parameter to collection deserialization for backward compatibility" Nov 21, 2025
Copilot finished work on behalf of MoritzPotthoffQC November 21, 2025 14:54

codecov bot commented Nov 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (3b82fdc) to head (f21c4b8).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #231   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           53        53           
  Lines         3019      3053   +34     
=========================================
+ Hits          3019      3053   +34     


@MoritzPotthoffQC MoritzPotthoffQC changed the title from "Add strict parameter to collection deserialization for backward compatibility" to "feat: Recover from reading incompatible schema metadata if validation can be used" Nov 24, 2025
@github-actions github-actions bot added the enhancement New feature or request label Nov 24, 2025
@MoritzPotthoffQC MoritzPotthoffQC changed the title from "feat: Recover from reading incompatible schema metadata if validation can be used" to "feat: Recover from reading incompatible schema metadata if validation is allowed" Nov 24, 2025
@MoritzPotthoffQC MoritzPotthoffQC marked this pull request as ready for review November 24, 2025 14:04
Collaborator

@AndreasAlbertQC AndreasAlbertQC left a comment

Thanks @MoritzPotthoffQC ! Small questions

…rquet_schema_json_fallback_corrupt

Co-authored-by: MoritzPotthoffQC <[email protected]>
Copilot finished work on behalf of MoritzPotthoffQC November 28, 2025 18:00
@MoritzPotthoffQC MoritzPotthoffQC marked this pull request as ready for review December 4, 2025 14:47
@MoritzPotthoffQC MoritzPotthoffQC enabled auto-merge (squash) December 5, 2025 12:04
@MoritzPotthoffQC MoritzPotthoffQC merged commit 90dd8e5 into main Dec 5, 2025
31 checks passed
@MoritzPotthoffQC MoritzPotthoffQC deleted the copilot/catch-errors-when-reading-collections branch December 5, 2025 12:54

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Catch errors when reading serialized schemas also for collections

3 participants