daily merge: master → main 2025-10-27 #661
base: main
…roject#57675) out of `util.py`. also adding its own `py_library` Signed-off-by: Lonnie Liu <[email protected]>
## Why are these changes needed?
I've encountered an issue where Ray sends SIGKILL to child processes launched by a Ray actor (grandchild processes do not receive the signal). As a result, the subprocess cannot catch the signal to gracefully clean up its own child processes, so the actor's grandchild processes leak.
I'm glad to see ray-project#56476 by @codope, and I also built a similar solution myself. This PR adds the case I ran into.
@codope why not enable this feature by default?
## Related issue number
## Checks
- [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting))
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(
---------
Signed-off-by: Kai-Hsun Chen <[email protected]>
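The leak described above can be illustrated with a small sketch. It assumes a POSIX system and shows one common mitigation (signalling the whole process group), which is not necessarily what this PR or ray-project#56476 implements. The helper names `spawn_in_new_group` and `kill_process_tree` are hypothetical.

```python
import os
import signal
import subprocess
import sys


def spawn_in_new_group(cmd):
    """Start cmd as the leader of a new session and process group (POSIX only)."""
    return subprocess.Popen(cmd, start_new_session=True)


def kill_process_tree(proc, sig=signal.SIGKILL):
    """Hypothetical helper: signal the whole group, not just the direct child."""
    try:
        os.killpg(os.getpgid(proc.pid), sig)
    except ProcessLookupError:
        pass  # the process (and its group) is already gone
    proc.wait()


if __name__ == "__main__":
    # The child starts a grandchild that would outlive a plain proc.kill().
    child_code = (
        "import subprocess, sys, time;"
        "subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(60)']);"
        "time.sleep(60)"
    )
    proc = spawn_in_new_group([sys.executable, "-c", child_code])
    kill_process_tree(proc)
    print("child exit code:", proc.returncode)
```

Because the grandchild inherits the child's process group, a single `os.killpg` call reaches it even when the signal is SIGKILL and the child gets no chance to clean up.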
into `anyscale_job_runner`. it is only used in `anyscale_job_runner` now Signed-off-by: Lonnie Liu <[email protected]>
…ject#57681) it is only used in glue.py Signed-off-by: Lonnie Liu <[email protected]>
…project#57682) should all be using hermetic python with python 3.8 or above now Signed-off-by: Lonnie Liu <[email protected]>
make that `test_in_docker` does not depend on the entire `ray_release` library, but only depends on python files that are required for the test db to work. this removes the dependency of `cryptography` library from `ray_ci`, so that windows wheels can be built and windows tests can run again. Signed-off-by: Lonnie Liu <[email protected]>
…ystem reserved resources (ray-project#57653) Signed-off-by: irabbani <[email protected]> Signed-off-by: israbbani <[email protected]> Signed-off-by: Ibrahim Rabbani <[email protected]> Signed-off-by: Ibrahim Rabbani <[email protected]> Co-authored-by: Edward Oakes <[email protected]>
Cleaning out plasma and the plasma client and its neighbors. The plasma client had a pimpl implementation, even though we didn't really need anything that would come with pimpl out of plasma. So just killing the separate impl class and just having the plasma client and its interface. One note about this is that it needs `shared_from_this` and the old plasma client would always contain a shared ptr to the impl, so had to refactor the raylet to use a shared ptr to the plasma client so we could keep using the `shared_from_this`. Other cleanup: - a lot of the ipc functions always returned status::ok so changed to void - some extra reserving of vectors and moving - unnecessary consts in pull manager that would prevent moves etc. --------- Signed-off-by: dayshah <[email protected]>
ray-project#57626) The test times out frequently in CI. Before this change, the test took `~40s` to run on my laptop. After the change, the test took `~15s` to run on my laptop. There also seems to be hanging related to in-order execution semantics, so for now flipping to `allow_out_of_order_execution=True`. @dayshah will add the `False` variant when he fixes the underlying issue. --------- Signed-off-by: Edward Oakes <[email protected]>
…oject#56853) This PR refactors the `TaskExecutionEvent` proto in two ways:
- Rename the file to `events_task_lifecycle_event.proto`
- Refactor the task_state from a map to an array of TaskState and timestamp. Also rename the field to `state_transitions` for consistency.

This PR depends on the upstream to update their logic to consume this new schema.
Test:
- CI

---
> [!NOTE]
> Renames task execution event to task lifecycle event and changes its schema from a state map to an ordered state_transitions list, updating core, GCS, dashboard, builds, tests, and docs.
>
> - **Proto/API changes (breaking)**
>   - Rename `TaskExecutionEvent` → `TaskLifecycleEvent` and update `RayEvent.EventType` (`TASK_EXECUTION_EVENT` → `TASK_LIFECYCLE_EVENT`).
>   - Replace `task_state` map with `state_transitions` (list of `{state, timestamp}`) in `events_task_lifecycle_event.proto`.
>   - Update `events_base_event.proto` field from `task_execution_event` → `task_lifecycle_event` and imports/BUILD deps accordingly.
> - **Core worker**
>   - Update buffer/conversion logic in `src/ray/core_worker/task_event_buffer.{h,cc}` to populate/emit `TaskLifecycleEvent` with `state_transitions`.
> - **GCS**
>   - Update `GcsRayEventConverter` to consume `TASK_LIFECYCLE_EVENT` and convert `state_transitions` to `state_ts_ns`.
> - **Dashboard/Aggregator**
>   - Switch exposable type defaults/env to `TASK_LIFECYCLE_EVENT` in `python/.../aggregator_agent.py`.
> - **Tests**
>   - Adjust unit tests to new event/type and schema across core worker, GCS, and dashboard.
> - **Docs**
>   - Update event export guide references to new lifecycle event proto.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 61507e8. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>

Signed-off-by: Cuong Nguyen <[email protected]>
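The schema change above (a state map replaced by an ordered list of state/timestamp pairs) can be sketched in plain Python. This is an illustration only, not the generated proto classes: `StateTransition`, `timestamp_ns`, `to_state_transitions` and `to_state_ts_ns` are hypothetical names, and the "last timestamp wins" rule in the reverse conversion is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StateTransition:
    # Hypothetical stand-in for one {state, timestamp} entry.
    state: str
    timestamp_ns: int


def to_state_transitions(task_state: Dict[str, int]) -> List[StateTransition]:
    """Convert a {state: timestamp} map into a list ordered by timestamp."""
    return [
        StateTransition(state, ts)
        for state, ts in sorted(task_state.items(), key=lambda kv: kv[1])
    ]


def to_state_ts_ns(transitions: List[StateTransition]) -> Dict[str, int]:
    """Collapse transitions into a map; assumes the last timestamp per state wins."""
    result: Dict[str, int] = {}
    for transition in transitions:
        result[transition.state] = transition.timestamp_ns
    return result
```

Unlike the map, the list keeps the order of transitions explicit and can represent a state that is entered more than once.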
- removing the capability enum as it is not being used, for more details: ray-project#56707 (comment) --------- Signed-off-by: harshit <[email protected]>
… default (ray-project#57623) Previously we were using `DeprecationWarning` which is silenced by default. Now this is printed:
```
>>> ray.init(local_mode=True)
/Users/eoakes/code/ray/python/ray/_private/client_mode_hook.py:104: FutureWarning: `local_mode` is an experimental feature that is no longer maintained and will be removed in the near future. For debugging consider using the Ray distributed debugger.
  return func(*args, **kwargs)
```
---------
Signed-off-by: Edward Oakes <[email protected]>
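The difference between the two warning categories can be reproduced with the standard `warnings` module. The sketch below is a simplification: it installs a blanket `ignore` filter for `DeprecationWarning` to mimic Python's default filters (which in reality still show deprecation warnings triggered directly in `__main__`). The function name `visible_categories` is hypothetical.

```python
import warnings


def visible_categories():
    """Return the names of the warning categories that get through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.resetwarnings()
        # Simplified stand-in for Python's default DeprecationWarning filter.
        warnings.simplefilter("ignore", DeprecationWarning)
        warnings.warn("`local_mode` is deprecated", DeprecationWarning)
        warnings.warn("`local_mode` is deprecated", FutureWarning)
    return [w.category.__name__ for w in caught]


if __name__ == "__main__":
    print(visible_categories())
```

Only the `FutureWarning` is recorded, which is why switching categories makes the message visible to end users without any filter configuration.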
- adding a new note about using filesystem as a broker in celery --------- Signed-off-by: harshit <[email protected]>
## Description
Improved the Ray pull request template to make it less overwhelming for contributors while giving maintainers better information for reviews and release notes. The new template has clearer sections and organized checklists that are much easier to fill out. This should encourage more contributions while making the review process smoother and release note generation more straightforward.
## Related issues
## Types of change
- [ ] Bug fix
- [ ] New feature
- [x] Enhancement
- [ ] Code refactoring
- [ ] Documentation update
- [ ] Chore
- [ ] Style
## Checklist
**Does this PR introduce breaking changes?**
- [ ] Yes
- [x] No

**Testing:**
- [ ] Added/updated tests for my changes
- [x] Tested the changes manually
- [ ] This PR is not tested _(please explain why)_

**Code Quality:**
- [x] Signed off every commit (`git commit -s`)
- [x] Ran pre-commit hooks ([setup guide](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting))

**Documentation:**
- [ ] Updated documentation (if applicable) ([contribution guide](https://docs.ray.io/en/latest/ray-contribute/docs.html))
- [ ] Added new APIs to `doc/source/` (if applicable)
## Additional context
---------
Signed-off-by: Matthew Deng <[email protected]> Signed-off-by: matthewdeng <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…epseek support) (ray-project#56906) Signed-off-by: Jiang Wu <[email protected]> Signed-off-by: Jiang Wu <[email protected]> Co-authored-by: Nikhil G <[email protected]>
…ject#57702) This PR refactors the operator metrics logging tests in `test_stats.py` to improve clarity, reliability, and maintainability.
- Replaced `test_op_metrics_logging` and `test_op_state_logging` with a single, more focused test: `test_executor_logs_metrics_on_operator_completion`
- Uses pytest's `caplog` fixture instead of mocking the logger (more idiomatic)
- Tests the core behavior (operator completion metrics logged exactly once) without depending on exact log message formatting
- Eliminates reliance on helper functions and complex string matching
- More descriptive test name following unit testing best practices
- Reduced test code complexity while maintaining coverage of critical logging behavior

Signed-off-by: Balaji Veeramani <[email protected]>
## Why are these changes needed? as titled Signed-off-by: iamjustinhsu <[email protected]>
## Why are these changes needed? This PR speeds up the Data CI pipeline by increasing parallelism and improving test distribution: 1. **Increased parallelism for parallel tests**: Bumped from 2 to 8 workers for both `data9test` and `dataltest` jobs that handle tests tagged with `data` (but not `data_non_parallel`) 2. **Added parallelism for non-parallel tests**: Added 3-way parallelism to `data9test_non_parallel` and `dataltest_non_parallel` jobs with proper worker distribution flags (`--workers` and `--worker-id`) These changes should significantly reduce CI runtime for Data tests by better utilizing available resources. ## Related issue number N/A ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [x] This PR is not tested :( --------- Signed-off-by: Balaji Veeramani <[email protected]>
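How a `--workers` / `--worker-id` pair can split one test list across parallel jobs is sketched below. The round-robin assignment over a sorted list is an assumption made for illustration; the actual CI tool may distribute tests differently. `shard_tests` is a hypothetical name.

```python
from typing import List


def shard_tests(tests: List[str], workers: int, worker_id: int) -> List[str]:
    """Return the subset of tests that worker `worker_id` (0-based) should run."""
    if not 0 <= worker_id < workers:
        raise ValueError("worker_id must be in [0, workers)")
    ordered = sorted(tests)  # every worker must see the same order
    return ordered[worker_id::workers]


if __name__ == "__main__":
    all_tests = [f"test_{i}" for i in range(10)]
    for wid in range(3):
        print(wid, shard_tests(all_tests, workers=3, worker_id=wid))
```

Each test lands in exactly one shard, so running all worker ids covers the full suite without overlap.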
temporarily soft failing on llm dependency compilation Signed-off-by: elliot-barn <[email protected]>
…c chaos… (ray-project#57288) Signed-off-by: joshlee <[email protected]>
Including config_name in depsets Remove build_arg_sets from config class --------- Signed-off-by: elliot-barn <[email protected]> Signed-off-by: Elliot Barnwell <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Lonnie Liu <[email protected]>
…f format.sh (ray-project#57703) Update Ray documentation and pre-push hooks to standardize on pre-commit for linting and formatting.
## Summary
- Updated `ci/lint/pre-push` hook to use `pre-commit run` instead of `ci/lint/format.sh`
- Updated development documentation to reference `pre-commit` instead of `format.sh` for linting
- Removed language suggesting pre-commit is "opt-in" or "planned for the future" since it's now the standard approach
- Updated installation instructions to use `pre-commit install`
## Test plan
- Verified documentation changes are accurate
- Confirmed pre-commit configuration works correctly
---------
Signed-off-by: Balaji Veeramani <[email protected]> Co-authored-by: angelinalg <[email protected]>
ray-project#57548) Part 2 of ray-project#56149; a significant portion of the code is taken from the original PR. This PR does not introduce any change in functionality. Autoscaling is still performed at the deployment level. This will help us make the transition towards application-level autoscaling. The only changes in this PR are: 1. moving the autoscaling control loop from the deployment state to the application state; 2. adding an application autoscaling state class. In the new design, the autoscaling state manager manages a list of application autoscaling states, and each application autoscaling state manages a list of deployment autoscaling states. Signed-off-by: abrar <[email protected]>
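The ownership hierarchy described above (manager, then application states, then deployment states) might look roughly like the following. This is a toy model, not the Serve code: the class names mirror the prose, while every field and method (`target_replicas`, `register`, `decide`, `decide_all`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DeploymentAutoscalingState:
    target_replicas: int


@dataclass
class ApplicationAutoscalingState:
    deployments: Dict[str, DeploymentAutoscalingState] = field(default_factory=dict)

    def decide(self) -> Dict[str, int]:
        # Decisions are still made per deployment; the application only groups them.
        return {name: s.target_replicas for name, s in self.deployments.items()}


@dataclass
class AutoscalingStateManager:
    applications: Dict[str, ApplicationAutoscalingState] = field(default_factory=dict)

    def register(self, app: str, deployment: str, target_replicas: int) -> None:
        app_state = self.applications.setdefault(app, ApplicationAutoscalingState())
        app_state.deployments[deployment] = DeploymentAutoscalingState(target_replicas)

    def decide_all(self) -> Dict[str, Dict[str, int]]:
        # The control loop now iterates applications rather than deployments.
        return {app: s.decide() for app, s in self.applications.items()}
```

Grouping deployment states under an application leaves room for a later application-level policy without changing what each deployment reports today.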
…n actor (ray-project#57688)
## Summary
This change disables Ray Core's streaming generator backpressure for the partition actor used in download operations. The partition actor is a lightweight, fast operation that batches URIs before they're sent to download tasks. When backpressure was enabled, Ray Core would throttle the partition actor's output, which starved the downstream download tasks of work and reduced parallelism.
## Changes
- Set `_generator_backpressure_num_objects` to -1 for the partition actor
- Use dedicated `ray_remote_args` for the partition actor instead of the user-provided args (which should only affect download tasks, not internal partitioning logic)
## Test plan
- [ ] Verify download operations complete successfully
- [ ] Confirm improved parallelism in download tasks
- [ ] Check that backpressure is properly disabled for partition actor

🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Signed-off-by: Balaji Veeramani <[email protected]> Co-authored-by: Claude <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: abrar <[email protected]>
ray-project#56474) Signed-off-by: zac <[email protected]> Signed-off-by: Zac Policzer <[email protected]> Co-authored-by: Edward Oakes <[email protected]>
Currently, node events support only two states: ALIVE and DEAD. This PR introduces a new substate of ALIVE, called ALIVE_DRAINING. While this state may be triggered repeatedly, the consumer (dashboard) only needs to observe it once. To prevent overwhelming the event system, we add a flag to ensure the ALIVE_DRAINING event is emitted only once.
Test:
- CI

---
> [!NOTE]
> Adds DRAINING and IDLE_OR_ACTIVE node lifecycle states, emits DRAINING only once, and updates proto, event mapping, GCS manager, and tests accordingly.
>
> - **Proto**:
>   - Update `events_node_lifecycle_event.proto`: replace `ALIVE` with `IDLE_OR_ACTIVE` and add `DRAINING` state.
> - **Observability**:
>   - `RayNodeLifecycleEvent`: when `GcsNodeInfo` is ALIVE, emit `DRAINING` if `state_snapshot` is `DRAINING`, else emit `IDLE_OR_ACTIVE`.
> - **GCS Node Manager**:
>   - Track `draining_node_ids_` to ensure `DRAINING` export event is written once; clear on node removal.
>   - `UpdateAliveNode(...)`: set snapshot to `DRAINING` when draining and write a single export event for the transition.
> - **Tests**:
>   - Adjust expectations from `ALIVE` to `IDLE_OR_ACTIVE`.
>   - Add assertion that only one `DRAINING` lifecycle event is exported for repeated draining updates.
>   - Update dashboard aggregator test to expect `IDLE_OR_ACTIVE`.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit a5f1e37. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>

---------
Signed-off-by: Cuong Nguyen <[email protected]>
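The emit-once behavior described above can be sketched with a set of node ids. This is a Python toy, not the C++ GCS node manager: `NodeEventExporter` and its methods are hypothetical, and only the idea of tracking draining node ids and clearing them on removal comes from the description.

```python
from typing import List, Set, Tuple


class NodeEventExporter:
    """Toy stand-in for the export logic; records events in a list."""

    def __init__(self) -> None:
        self._draining_node_ids: Set[str] = set()
        self.exported: List[Tuple[str, str]] = []

    def update_alive_node(self, node_id: str, draining: bool) -> None:
        # Export DRAINING only on the first draining update for this node.
        if draining and node_id not in self._draining_node_ids:
            self._draining_node_ids.add(node_id)
            self.exported.append((node_id, "DRAINING"))

    def remove_node(self, node_id: str) -> None:
        # Clear the bookkeeping so the set does not grow without bound.
        self._draining_node_ids.discard(node_id)
        self.exported.append((node_id, "DEAD"))
```

Repeated draining updates for the same node hit the set check and produce no further events, which keeps the consumer from being flooded.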
## Why are these changes needed?
In ray-project#57035 we deprecated the `concurrency` parameter in favor of `compute` in `map`, `map_batches`, `flat_map` and `filter`, so the related docs should be updated to use `compute` as well, so that users don't rely on the deprecated parameter.
## Related issue number
Follow-up to ray-project#57035
Signed-off-by: You-Cheng Lin (Owen) <[email protected]>
…57730) Example flake: https://buildkite.com/ray-project/postmerge/builds/13666#0199de53-b97e-4dea-9c1d-37ef56433b7c/607-1103 The way the test was written was inherently flaky because the first GC could happen at any time, so the timeouts that attempt to measure the time between the interval were inaccurate. It somewhat pains me to not make the test fully deterministic, but in an attempt to deflake without spending too much time here, I've improved it to at least wait until the first GC interval before starting the clock. I've also split the driver & actor conditions because the timers/intervals can be out of sync between them. There's also some weirdness here that we have two configs to control the GC interval, one for C++ and one for Python, but I'm letting that sleeping dog lie... --------- Signed-off-by: Edward Oakes <[email protected]>
Add Azure CLI and dependencies into `base-extra` images --------- Signed-off-by: kevin <[email protected]>
…8088) those tests have been failing and jailed for quite some time related to: - ray-project#46687 - ray-project#49847 - ray-project#49846 Signed-off-by: Lonnie Liu <[email protected]>
removing format script and all references --------- Signed-off-by: elliot-barn <[email protected]>
…ine backend (ray-project#57194) Signed-off-by: DPatel_7 <[email protected]> Co-authored-by: DPatel_7 <[email protected]>
… submission/block generation metrics (ray-project#57246) ## Why are these changes needed? On executor shutdown, the metrics persist even after execution. The plan is to reset on streaming_executor.shutdown. This PR also includes 2 potential drive-by fixes for metric calculation --------- Signed-off-by: iamjustinhsu <[email protected]>
Add some output example of the command to help the end-user to verify the execution result. Signed-off-by: fscnick <[email protected]>
## Description This PR adds a `preserve_row` option to `map_batches`. When `preserve_row` is true, the limit operator can be pushed down through this `map_batches` call for optimization. Note: `map_group` is built on `map_batches`, but limit pushdown support for `map_group` is out of scope for this PR, so `preserve_row_count` is set to false for it. ## Related issues ## Additional information --------- Signed-off-by: You-Cheng Lin <[email protected]> Signed-off-by: You-Cheng Lin <[email protected]> Co-authored-by: You-Cheng Lin <[email protected]>
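Why a limit can only be pushed below a map that keeps one output row per input row is easy to check on plain lists. The sketch below does not use Ray Data; `map_then_limit` and `limit_then_map` are hypothetical helpers that treat the whole dataset as a single batch.

```python
from typing import Callable, List

Batch = List[int]


def map_then_limit(rows: Batch, fn: Callable[[Batch], Batch], n: int) -> Batch:
    """Original plan: transform everything, then keep the first n rows."""
    return fn(rows)[:n]


def limit_then_map(rows: Batch, fn: Callable[[Batch], Batch], n: int) -> Batch:
    """Plan after limit pushdown: keep the first n rows, then transform them."""
    return fn(rows[:n])


def double(batch: Batch) -> Batch:
    return [x * 2 for x in batch]  # one output row per input row


def keep_even(batch: Batch) -> Batch:
    return [x for x in batch if x % 2 == 0]  # may drop rows
```

For `double` both plans return the same rows while the pushed-down plan processes far less input; for `keep_even` the pushed-down plan can return too few rows, which is why the optimization has to be opted into per call.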
~~Before:~~ ~~https://github.com/user-attachments/assets/9db00f37-0c37-4e99-874a-a14481878e4a~~ ~~In before, the progress bar won't update until the first tasks finishes.~~ ~~After:~~ ~~https://github.com/user-attachments/assets/99877a3f-7b52-4293-aae5-7702edfaabec~~ ~~In After, the progress bar won't update until the first task generates output. If a task generates 10 blocks, we will update the progress bar while it's generating blocks, even if the task hasn't finished. Once the task finishes, we default back to the way it was before.~~ ~~This is better because the very 1st progress bar update will occur sooner, and won't feel abrupt to the user.~~
Refactoring the progress bar estimates using known metrics.
## Why are these changes needed?
Currently we use the number of finished tasks. This is OK, but since we use a streaming generator, 1 task = thousands of blocks. This is troublesome for the additional split factor (split blocks) in read parquet.
---------
Signed-off-by: iamjustinhsu <[email protected]>
…58046) This pr sets up the helper classes and utils to enable token based authentication for ray core rpc calls. --------- Signed-off-by: sampan <[email protected]> Co-authored-by: sampan <[email protected]>
I suspect that when we deploy the app config, we don't wait long enough before sending traffic, so requests could go to the wrong version --------- Signed-off-by: abrar <[email protected]>
Signed-off-by: Seiji Eicher <[email protected]>
Signed-off-by: ahao-anyscale <[email protected]>
…58092) Signed-off-by: Jiajun Yao <[email protected]>
…ay-project#57882)
# Summary
The crux of the issue is that in the past, train run status was synonymous with final worker group status, but now, when there are pending validations, the worker group is finished but the train run is not. This leads to confusing situations in which the Train Run is `FINISHED`, but because there are pending validations, the `controller` actor is alive and results are inaccessible.
This PR:
* Adds a new `SHUTTING_DOWN` `TrainControllerState` that happens after the worker group finishes but before the controller shuts everything down.
* Makes `ValidationManager` logging slightly cleaner.

Like `RESCHEDULING`, `SHUTTING_DOWN` is a hidden state that shows up in `StateManager` logs and Grafana but not in the state export. We only want to show terminal states in the state export after `fit()` has returned and results are accessible.
More concretely:
* Finished/errored: The worker group finishes (Train Run is `RUNNING` but internal state is `SHUTTING_DOWN`), validation finishes (both Train Run and internal state say `FINISHED` or `ERRORED`), then results are accessible.
* Aborted: Ideally, the worker group should be aborted and in-flight validation tasks canceled before the Train Run is `ABORTED`. However, this PR doesn't change the current behavior, in which the Train Run might be `ABORTED` before reference counting cleans up the validation tasks. I will cancel validation tasks before marking the train run `ABORTED` in a future PR.

I considered polling both the worker group and validations in `_step` itself, but decided to leave `_step` as a function that only cares about the worker group.
# Testing
Unit tests
---------
Signed-off-by: Timothy Seah <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
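The relationship between the internal controller state and the exported Train Run status can be sketched as a small mapping. This is an illustration, not the Ray Train code: `InternalState` and `exported_status` are hypothetical names, and `RESCHEDULING` is left out because the description does not say which exported status it maps to.

```python
from enum import Enum


class InternalState(Enum):
    RUNNING = "RUNNING"
    SHUTTING_DOWN = "SHUTTING_DOWN"  # hidden: worker group done, validations pending
    FINISHED = "FINISHED"
    ERRORED = "ERRORED"
    ABORTED = "ABORTED"


# Hidden states are reported under a visible status in the state export.
_HIDDEN_TO_EXPORTED = {InternalState.SHUTTING_DOWN: InternalState.RUNNING}


def exported_status(state: InternalState) -> str:
    """Return the Train Run status that the state export would show."""
    return _HIDDEN_TO_EXPORTED.get(state, state).value
```

With this mapping a run only appears as `FINISHED` or `ERRORED` once the internal state is terminal, which matches the goal of exposing terminal states only after results are accessible.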
…ject#57930) Add actor+job+node event to ray event export doc Test: - CI Signed-off-by: Cuong Nguyen <[email protected]>
Signed-off-by: Dhyey Shah <[email protected]> Signed-off-by: Qiaolin-Yu <[email protected]> Signed-off-by: Qiaolin Yu <[email protected]> Co-authored-by: Dhyey Shah <[email protected]> Co-authored-by: Stephanie Wang <[email protected]>
mistakenly disabled the wrong test, whose name differs from the one in the issue. associated issue: ray-project#46687 Signed-off-by: Lonnie Liu <[email protected]>
upgrading batch inference tests to py3.10 Successful release test run: https://buildkite.com/ray-project/release/builds/65258 all except for image_embedding_from_jsonl are running on python 3.10 --------- Signed-off-by: elliot-barn <[email protected]>
…ay-project#57636) Signed-off-by: Mengjin Yan <[email protected]> Co-authored-by: Jiajun Yao <[email protected]>
The pull request #661 has too many files changed.
The GitHub API will only let us fetch up to 300 changed files, and this pull request has 4926.
Summary of Changes
Hello @antfin-oss, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request represents a substantial overhaul of the project's build and continuous integration infrastructure. The primary goal is to enhance build determinism, efficiency, and maintainability across various components, including Python, C++, and Java. Key changes involve adopting a new dependency management tool (…
Code Review
This pull request is an automated daily merge from master to main, containing a large number of changes, primarily focused on a significant refactoring and improvement of the build and CI system. The changes are extensive and well-executed, leading to a more modular, maintainable, and robust system.
Key improvements include:
- Bazel Refactoring: The root `BUILD.bazel` file has been cleaned up, with targets moved to more appropriate sub-packages. The use of `rules_pkg` for packaging is a welcome modernization.
- CI/CD Overhaul: The Buildkite pipelines have been heavily refactored. Builds are now more modular (e.g., `core`, `dashboard`, `java` are built as separate artifacts). Dependency management is enhanced with the introduction of the `raydepsets` tool.
- Dependency Management: The project has migrated from `miniconda` to `miniforge`, and the `uv` package manager has been introduced for faster dependency resolution. Several package versions have been updated.
- Linting and Style: The `pre-commit` configuration has been significantly improved with the addition of tools like `semgrep` and `vale`, and better configuration for existing tools. The `CODEOWNERS` file and PR template have also been updated.
- Code Modernization: Several C++ components have been updated to use modern C++ features (e.g., `std::invoke_result_t`), and code style has been improved. Python code has been updated to use newer APIs where applicable.
Overall, these changes represent a substantial step forward for the project's infrastructure. The refactoring is logical and the improvements are clear. I did not find any issues that require attention.
This pull request has been automatically marked as stale because it has not had … You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed.
This Pull Request was created automatically to merge the latest changes from `master` into `main` branch.
Created: 2025-10-27
Merge direction: `master` → `main`
Triggered by: Scheduled
Please review and merge if everything looks good.