Copilot AI commented Nov 4, 2025

Schema.to_pyarrow_schema() was not preserving nullability constraints for nested column types. Inner fields were always marked nullable regardless of their nullable parameter.

Changes

  • Struct: Use pa.struct([fields]) instead of pa.struct({name: dtype}) to preserve per-field nullability
  • List: Use pa.large_list(field) instead of pa.large_list(dtype) to preserve inner type nullability
  • Array: Modified _pyarrow_dtype_of_shape() to use pa.list_(field, size) at the base case to preserve inner type nullability
  • Tests: Updated parametrization to use _nullable() helper and added tests for nested nullability preservation

Example

import dataframely as dy

# Before: inner fields always nullable
class Schema(dy.Schema):
    struct_col = dy.Struct({
        "required": dy.String(nullable=False),
        "optional": dy.String(nullable=True),
    })

Schema.to_pyarrow_schema()
# struct_col: struct<required: large_string, optional: large_string>
#                              ^^^^^^^^^^^^ incorrectly nullable

# After: nullability preserved
# struct_col: struct<required: large_string not null, optional: large_string>
#                              ^^^^^^^^^^^^^^^^^^^^^^ correct

Original prompt

This section details the original issue you should resolve

<issue_title>Schema.to_pyarrow_schema(cls) ignores nested nullability</issue_title>
<issue_description>Column.pyarrow_field(self, name: str) does not recurse over subfields, so the output pyarrow fields in nested types are always nullable. For example, a dataframely struct with one optional field and one required field will result in a pyarrow structure with two optional fields.</issue_description>

<agent_instructions>Fix this issue by doing the following:

  • For all nested types (list, struct, array), adjust the pyarrow_dtype method to use initializers based on pyarrow fields. Instead of simply accessing pyarrow_dtype of the nested types, construct a field by using the respective method on the column
  • Add new tests that make sure the issue is fixed, ideally extending existing @pytest.mark.parametrize decorators</agent_instructions>

Comments on the Issue (you are @copilot in this section)


Copilot AI changed the title [WIP] Fix nested nullability in pyarrow schema conversion Fix nested nullability in PyArrow schema generation for struct, list, and array columns Nov 4, 2025
Copilot AI requested a review from borchero November 4, 2025 20:32
Copilot finished work on behalf of borchero November 4, 2025 20:32

codecov bot commented Nov 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (e102cde) to head (f15f6fa).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #217   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           53        53           
  Lines         3005      3006    +1     
=========================================
+ Hits          3005      3006    +1     

@borchero borchero marked this pull request as ready for review November 5, 2025 01:12
@borchero borchero changed the title Fix nested nullability in PyArrow schema generation for struct, list, and array columns fix: Properly set nested nullability when converting to pyarrwo Nov 5, 2025
@borchero borchero changed the title fix: Properly set nested nullability when converting to pyarrwo fix: Properly set nested nullability when converting to pyarrow Nov 5, 2025
@github-actions github-actions bot added the fix label Nov 5, 2025

Development

Successfully merging this pull request may close these issues.

Schema.to_pyarrow_schema(cls) ignores nested nullability
