-
Notifications
You must be signed in to change notification settings - Fork 506
refactor(java): arrow schema should be returned by Fragment.mergeColumns #5471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
|
ACTION NEEDED The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error please inspect the "PR Title Check" action. |
|
@jackye1995 This PR is ready, could you please review it? Thank you. |
| public class FragmentMergeResult { | ||
| private final FragmentMetadata fragmentMetadata; | ||
| private final LanceSchema schema; | ||
| private final Schema schema; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this.
I wonder if we should fix the LanceSchema converting issue since the problem occurs there. Is there any blocking pointing that we could not use LanceSchema here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for comments.
I think LanceSchema should be fixed. It is another thing, we could submit another PR to fix it.
For Fragment.mergeColumns, it is sensible to return an Arrow schema, for three reasons:
- The conversion from Lance schema to Arrow schema is already correctly implemented in Rust, so we can just reuse it like the Dataset.getSchema.
- The Merge transaction itself expects an Arrow schema.
- Dataset's public methods use Arrow for data operation (read/write) . So Arrow schema can keep consistency with other methods.
majin1102
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this. I wonder if we could fix LanceSchema convering issue there?
a25c1c9 to
5024ee2
Compare
|
This PR is duplicated with #5509 , close it. |
The Fragment.mergeColumns's returned schema will be used in org.lance.operation.Merge which needs a Arrow Schema. So, Fragment.mergeColumn returns a Arrow Schema can avoid the convert from Lance Schema to Arrow Schema in Java. Like Dataset.getSchema, the convert is implemented in Rust.
Fix the issue Bug: Failed to add column with backfill for FixedSizeList lance-spark#138