@fangbo
Contributor

@fangbo fangbo commented Dec 15, 2025

  1. The schema returned by Fragment.mergeColumns is used in org.lance.operation.Merge, which needs an Arrow Schema. Having Fragment.mergeColumns return an Arrow Schema avoids converting from Lance Schema to Arrow Schema in Java. As with Dataset.getSchema, the conversion is implemented in Rust.

  2. Fixes the issue "Bug: Failed to add column with backfill for FixedSizeList" (lance-spark#138).

@github-actions github-actions bot added the java label Dec 15, 2025
@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@fangbo fangbo changed the title from "refactor: Fragment.mergeColumns returns Arrow Schema" to "refactor: Fragment.mergeColumns should returns arrow Schema" Dec 15, 2025
@fangbo fangbo changed the title from "refactor: Fragment.mergeColumns should returns arrow Schema" to "refactor: Fragment.mergeColumns should return arrow Schema" Dec 15, 2025
@fangbo fangbo changed the title from "refactor: Fragment.mergeColumns should return arrow Schema" to "refactor: arrow schema should be returned by Fragment.mergeColumns" Dec 15, 2025
@fangbo
Contributor Author

fangbo commented Dec 15, 2025

@jackye1995 This PR is ready. Could you please review it? Thank you.

  public class FragmentMergeResult {
    private final FragmentMetadata fragmentMetadata;
-   private final LanceSchema schema;
+   private final Schema schema;
Contributor

Thanks for catching this.

I wonder if we should fix the LanceSchema conversion issue instead, since that is where the problem occurs. Is there any blocker that prevents us from using LanceSchema here?

Contributor Author

Thank you for the comments.

I agree that LanceSchema should be fixed, but that is a separate issue; we could submit another PR for it.

For Fragment.mergeColumns, it is sensible to return an Arrow schema, for three reasons:

  1. The conversion from Lance schema to Arrow schema is already correctly implemented in Rust, so we can reuse it, as Dataset.getSchema does.
  2. The Merge transaction itself expects an Arrow schema.
  3. Dataset's public methods use Arrow for data operations (read/write), so returning an Arrow schema keeps this method consistent with the others.
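
To illustrate the second point, here is a minimal sketch of how the result could flow into the merge step without any schema conversion on the Java side. It is not the actual Lance code: `ArrowSchema`, `ArrowField`, `MergeOperation`, and `MergeSketch` are hypothetical stand-ins (for Arrow's `org.apache.arrow.vector.types.pojo.Schema` and for `org.lance.operation.Merge`), and the `FragmentMetadata` shape is assumed, so that the example compiles on its own.

```java
import java.util.List;

// Hypothetical stand-ins so the sketch compiles without the Arrow or Lance jars.
// The real types would be Arrow's Schema and Lance's FragmentMetadata.
record ArrowField(String name, String type) {}
record ArrowSchema(List<ArrowField> fields) {}
record FragmentMetadata(int id) {}

// Mirrors the changed class: the result carries an Arrow schema directly.
final class FragmentMergeResult {
  private final FragmentMetadata fragmentMetadata;
  private final ArrowSchema schema;

  FragmentMergeResult(FragmentMetadata fragmentMetadata, ArrowSchema schema) {
    this.fragmentMetadata = fragmentMetadata;
    this.schema = schema;
  }

  FragmentMetadata getFragmentMetadata() {
    return fragmentMetadata;
  }

  ArrowSchema getSchema() {
    return schema;
  }
}

// Stand-in for the Merge operation, which expects an Arrow schema.
record MergeOperation(List<FragmentMetadata> fragments, ArrowSchema schema) {}

final class MergeSketch {
  // The schema is handed over as-is; no Lance-to-Arrow conversion happens in Java.
  static MergeOperation toMerge(FragmentMergeResult result) {
    return new MergeOperation(List.of(result.getFragmentMetadata()), result.getSchema());
  }
}
```

Under these assumptions, the caller only forwards what mergeColumns returned, so a type such as FixedSizeList never passes through a Java-side schema conversion.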

Contributor

@majin1102 majin1102 left a comment

Thanks for working on this. I wonder if we could fix the LanceSchema conversion issue there instead?

@fangbo fangbo changed the title from "refactor: arrow schema should be returned by Fragment.mergeColumns" to "refactor(java): arrow schema should be returned by Fragment.mergeColumns" Dec 17, 2025
@fangbo
Contributor Author

fangbo commented Dec 18, 2025

This PR duplicates #5509, so I am closing it.

@fangbo fangbo closed this Dec 18, 2025