@jonded94
Contributor

As far as I can see, the only reason lance-ray depends on numpy is that it uses numpy's array_split function.

IMHO, that's far too large a dependency to take on for a single function. In our specific case, we're unfortunately still limited to numpy < 2.0, so lance-ray's requirement of numpy >= 2.0.0 is a problem for us.

I think numpy < 2.0 also provided array_split, so in principle one could have relaxed the version restriction to fix our problem, but I refactored the code so that it no longer depends on numpy at all.

@jonded94 changed the title from "Replace array_split method from numpy with an [more_]itertools one to make lance-ray not directly depend on ´numpy` anymore" to "Replace array_split method from numpy with an [more_]itertools one to make lance-ray not directly depend on numpy anymore" on Dec 15, 2025
@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@jonded94 changed the title from "Replace array_split method from numpy with an [more_]itertools one to make lance-ray not directly depend on numpy anymore" to "refactor: remove numpy dependency by using itertools.batched/more_itertools.divide" on Dec 15, 2025
@chenghao-guo
Collaborator

Thanks for your work, @jonded94. This is useful because numpy actually causes version mismatches with ray (ray on Python 3.11 uses numpy<2 and pandas<2, while newer Python 3.12 uses numpy>2). I believe your work makes lance-ray more robust by no longer depending directly on a numpy version.

I have just approved the CI run for this PR. There were a few small issues, such as
ruff format (you can run uv run ruff format locally to fix this)

and this error:

    def array_split(iterable: Iterable[T], n: int) -> list[Sequence[T]]:
E   NameError: name 'T' is not defined

Once CI passes, I will merge this.
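
For illustration, a NameError like the one above typically means the type variable was used in the signature before being defined; on Python versions that evaluate annotations eagerly, the `def` statement itself fails. The following is a minimal sketch, not the code merged in this PR: it reuses the signature from the traceback, defines `T` first, and fills in a body (an assumption) that mimics the chunk sizes of numpy's array_split, where the first `len % n` parts get one extra element.

```python
from collections.abc import Iterable, Sequence
from typing import TypeVar

# T must exist before the function is defined, because the annotations
# below are evaluated when the def statement runs (unless evaluation is
# deferred, e.g. via `from __future__ import annotations`).
T = TypeVar("T")


def array_split(iterable: Iterable[T], n: int) -> list[Sequence[T]]:
    """Split `iterable` into `n` contiguous parts of near-equal length.

    Illustrative body (assumption): the first len % n parts receive one
    extra element, and trailing parts may be empty.
    """
    if n < 1:
        raise ValueError("n must be at least one")
    items = list(iterable)
    base, extra = divmod(len(items), n)
    # Boundary i is where part i starts; part i ends at boundary i + 1.
    bounds = [i * base + min(i, extra) for i in range(n + 1)]
    return [items[bounds[i]:bounds[i + 1]] for i in range(n)]


print(array_split(range(10), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```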

@chenghao-guo self-requested a review on December 16, 2025 03:43


if sys.version_info >= (3, 12):
    from itertools import batched
Collaborator

Why is there a difference when python>=3.12? Could we use more_itertools for all Python versions?

Contributor Author

Python 3.12 introduced itertools.batched; that's why there is this split here.

I wanted to use stdlib tools wherever they are available, and only introduce a dependency on more_itertools for people using lance-ray with Python < 3.12.

I don't have a strong opinion on whether this is nice or overkill; the way I implemented it is at least the one with the fewest external dependencies.
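
As a rough sketch of the version split described above (not the PR's actual code): on Python 3.12+ the stdlib `itertools.batched` is imported, and older interpreters get a fallback. The PR takes its fallback from more_itertools; to keep this example free of third-party imports, the `else` branch instead defines a small stand-in with `islice`, which is an assumption made purely for illustration.

```python
import sys
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")

if sys.version_info >= (3, 12):
    # Stdlib implementation, available since Python 3.12.
    from itertools import batched
else:
    # Illustrative stand-in (assumption); the PR relies on more_itertools
    # for interpreters older than 3.12.
    def batched(iterable: Iterable[T], n: int) -> Iterator[tuple[T, ...]]:
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        # Pull up to n items at a time until the iterator is exhausted.
        while chunk := tuple(islice(it, n)):
            yield chunk


chunks = list(batched(range(7), 3))
print(chunks)  # [(0, 1, 2), (3, 4, 5), (6,)]
```

Either branch exposes the same name, so calling code does not need to know which Python version it is running on.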

@jonded94
Contributor Author

Sorry for the couple of small issues, wasn't actually feeling 100% when I opened this PR 😅 Should be all fixed now.

@chenghao-guo merged commit c6ba979 into lance-format:main on Dec 17, 2025
4 checks passed