🔄 daily merge: master → main 2025-10-27 #661

antfin-oss · 2025-10-27T03:06:23Z

This Pull Request was created automatically to merge the latest changes from master into main branch.

📅 Created: 2025-10-27
🔀 Merge direction: master → main
🤖 Triggered by: Scheduled

Please review and merge if everything looks good.

…roject#57675) out of `util.py`. also adding its own `py_library` Signed-off-by: Lonnie Liu <[email protected]>

@codope

## Why are these changes needed?

I've encountered an issue where Ray sends SIGKILL to the child processes launched by a Ray actor (grandchild processes do not receive the signal). As a result, a child process cannot catch the signal to gracefully clean up its own children, so the actor's grandchild processes leak.

I'm glad to see ray-project#56476 by @codope, and I also built a similar solution myself. This PR adds the case that I ran into.

@codope why not enable this feature by default?

## Related issue number

## Checks

- [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting))
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

---------

Signed-off-by: Kai-Hsun Chen <[email protected]>
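The leak described above follows from SIGKILL being uncatchable: the killed child never gets a chance to forward anything to its own children. One common mitigation is to start the child as the leader of its own process group and signal the whole group. The sketch below illustrates that general technique only; it is not taken from this PR or from ray-project#56476, it is POSIX-only, and `spawn_in_new_group`, `kill_group`, and `SLEEPER` are hypothetical names.

```python
import os
import signal
import subprocess
import sys

# Hypothetical sample command: a Python child that just sleeps.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(60)"]


def spawn_in_new_group(argv):
    """Start a child as leader of a new session and process group (POSIX only)."""
    # start_new_session=True makes the child call setsid() before exec,
    # so its process group ID equals its PID.
    return subprocess.Popen(argv, start_new_session=True)


def kill_group(proc, sig=signal.SIGKILL):
    """Send `sig` to every process in the child's group, then reap the child."""
    try:
        os.killpg(proc.pid, sig)
    except ProcessLookupError:
        pass  # the whole group has already exited
    proc.wait()
```

Any grandchild that stays in the child's process group receives the same signal, so it cannot outlive the child. A grandchild that deliberately moves itself to another group or session is not covered by this approach.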

into `anyscale_job_runner`. it is only used in `anyscale_job_runner` now Signed-off-by: Lonnie Liu <[email protected]>

…ject#57681) it is only used in glue.py Signed-off-by: Lonnie Liu <[email protected]>

…project#57682) should all be using hermetic python with python 3.8 or above now Signed-off-by: Lonnie Liu <[email protected]>

make that `test_in_docker` does not depend on the entire `ray_release` library, but only depends on python files that are required for the test db to work. this removes the dependency of `cryptography` library from `ray_ci`, so that windows wheels can be built and windows tests can run again. Signed-off-by: Lonnie Liu <[email protected]>

…ystem reserved resources (ray-project#57653) Signed-off-by: irabbani <[email protected]> Signed-off-by: israbbani <[email protected]> Signed-off-by: Ibrahim Rabbani <[email protected]> Signed-off-by: Ibrahim Rabbani <[email protected]> Co-authored-by: Edward Oakes <[email protected]>

Cleaning out plasma and the plasma client and its neighbors. The plasma client had a pimpl implementation, even though we didn't really need anything that would come with pimpl out of plasma. So just killing the separate impl class and just having the plasma client and its interface.

One note about this is that it needs `shared_from_this` and the old plasma client would always contain a shared ptr to the impl, so had to refactor the raylet to use a shared ptr to the plasma client so we could keep using the `shared_from_this`.

Other cleanup:
- a lot of the ipc functions always returned status::ok so changed to void
- some extra reserving of vectors and moving
- unnecessary consts in pull manager that would prevent moves etc.

---------

Signed-off-by: dayshah <[email protected]>

@dayshah

ray-project#57626)

The test times out frequently in CI. Before this change, the test took `~40s` to run on my laptop. After the change, the test took `~15s` to run on my laptop.

There also seems to be hanging related to in-order execution semantics, so for now flipping to `allow_out_of_order_execution=True`. @dayshah will add the `False` variant when he fixes the underlying issue.

---------

Signed-off-by: Edward Oakes <[email protected]>

…oject#56853)

This PR refactors the `TaskExecutionEvent` proto in two ways:

- Rename the file to `events_task_lifecycle_event.proto`
- Refactor the task_state from a map to an array of TaskState and timestamp. Also rename the field to `state_transitions` for consistency.

This PR depends on the upstream to update their logic to consume this new schema.

Test:
- CI

---

> [!NOTE]
> Renames task execution event to task lifecycle event and changes its schema from a state map to an ordered state_transitions list, updating core, GCS, dashboard, builds, tests, and docs.
>
> - **Proto/API changes (breaking)**
>   - Rename `TaskExecutionEvent` → `TaskLifecycleEvent` and update `RayEvent.EventType` (`TASK_EXECUTION_EVENT` → `TASK_LIFECYCLE_EVENT`).
>   - Replace `task_state` map with `state_transitions` (list of `{state, timestamp}`) in `events_task_lifecycle_event.proto`.
>   - Update `events_base_event.proto` field from `task_execution_event` → `task_lifecycle_event` and imports/BUILD deps accordingly.
> - **Core worker**
>   - Update buffer/conversion logic in `src/ray/core_worker/task_event_buffer.{h,cc}` to populate/emit `TaskLifecycleEvent` with `state_transitions`.
> - **GCS**
>   - Update `GcsRayEventConverter` to consume `TASK_LIFECYCLE_EVENT` and convert `state_transitions` to `state_ts_ns`.
> - **Dashboard/Aggregator**
>   - Switch exposable type defaults/env to `TASK_LIFECYCLE_EVENT` in `python/.../aggregator_agent.py`.
> - **Tests**
>   - Adjust unit tests to new event/type and schema across core worker, GCS, and dashboard.
> - **Docs**
>   - Update event export guide references to new lifecycle event proto.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 61507e8. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>

Signed-off-by: Cuong Nguyen <[email protected]>
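To make the schema change concrete, here is a minimal Python sketch of converting a state-to-timestamp map into an ordered list of transitions and back. It only models the shapes named in the commit message (`task_state` map, `state_transitions` list, a per-state timestamp map like `state_ts_ns`); the class `StateTransition`, its field names, and both helper functions are hypothetical and do not mirror the generated protobuf code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StateTransition:
    """Hypothetical stand-in for one {state, timestamp} entry."""
    state: str
    timestamp_ns: int


def map_to_transitions(task_state: Dict[str, int]) -> List[StateTransition]:
    """Turn the old state -> timestamp map into a time-ordered list."""
    ordered = sorted(task_state.items(), key=lambda item: item[1])
    return [StateTransition(state, ts) for state, ts in ordered]


def transitions_to_map(transitions: List[StateTransition]) -> Dict[str, int]:
    """Collapse a list back into a map; a repeated state keeps its last timestamp."""
    return {t.state: t.timestamp_ns for t in transitions}
```

One property of the list form, shown by `transitions_to_map`, is that it can hold the same state more than once, whereas a map keyed by state keeps a single timestamp per state. Whether that is the motivation for the change is not stated in the commit message.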

- removing the async capability enum as it is not being used, for more details: ray-project#56707 (comment)

---------

Signed-off-by: harshit <[email protected]>

… default (ray-project#57623)

Previously we were using `DeprecationWarning` which is silenced by default. Now this is printed:

```
>>> ray.init(local_mode=True)
/Users/eoakes/code/ray/python/ray/_private/client_mode_hook.py:104: FutureWarning: `local_mode` is an experimental feature that is no longer maintained and will be removed in the near future. For debugging consider using the Ray distributed debugger.
  return func(*args, **kwargs)
```

---------

Signed-off-by: Edward Oakes <[email protected]>
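The difference between the two warning categories can be reproduced with the standard `warnings` module alone. The sketch below is not Ray code: `legacy_api` and `is_visible` are hypothetical helpers, and the filter setup only approximates CPython's defaults, which ignore `DeprecationWarning` raised outside `__main__` but leave `FutureWarning` enabled.

```python
import warnings


def legacy_api(category):
    """Hypothetical stand-in for a deprecated entry point."""
    warnings.warn("this feature will be removed", category, stacklevel=2)


def is_visible(category):
    """Return True if a warning of `category` survives a filter set that
    ignores DeprecationWarning, as the default filters do for library code."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.resetwarnings()  # start from an empty filter list
        warnings.simplefilter("ignore", DeprecationWarning)
        legacy_api(category)
    return any(issubclass(w.category, category) for w in caught)
```

With this setup, `is_visible(DeprecationWarning)` is false and `is_visible(FutureWarning)` is true, which matches the behavior the commit message relies on.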

- adding a new note about using filesystem as a broker in celery --------- Signed-off-by: harshit <[email protected]>

## Description

Improved the Ray pull request template to make it less overwhelming for contributors while giving maintainers better information for reviews and release notes. The new template has clearer sections and organized checklists that are much easier to fill out. This should encourage more contributions while making the review process smoother and release note generation more straightforward.

## Related issues

## Types of change

- [ ] Bug fix 🐛
- [ ] New feature ✨
- [x] Enhancement 🚀
- [ ] Code refactoring 🔧
- [ ] Documentation update 📖
- [ ] Chore 🧹
- [ ] Style 🎨

## Checklist

**Does this PR introduce breaking changes?**
- [ ] Yes ⚠️
- [x] No

**Testing:**
- [ ] Added/updated tests for my changes
- [x] Tested the changes manually
- [ ] This PR is not tested ❌ _(please explain why)_

**Code Quality:**
- [x] Signed off every commit (`git commit -s`)
- [x] Ran pre-commit hooks ([setup guide](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting))

**Documentation:**
- [ ] Updated documentation (if applicable) ([contribution guide](https://docs.ray.io/en/latest/ray-contribute/docs.html))
- [ ] Added new APIs to `doc/source/` (if applicable)

## Additional context

---------

Signed-off-by: Matthew Deng <[email protected]>
Signed-off-by: matthewdeng <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

…epseek support) (ray-project#56906) Signed-off-by: Jiang Wu <[email protected]> Signed-off-by: Jiang Wu <[email protected]> Co-authored-by: Nikhil G <[email protected]>

…ject#57702)

This PR refactors the operator metrics logging tests in `test_stats.py` to improve clarity, reliability, and maintainability.

- Replaced `test_op_metrics_logging` and `test_op_state_logging` with a single, more focused test: `test_executor_logs_metrics_on_operator_completion`
- Uses pytest's `caplog` fixture instead of mocking the logger (more idiomatic)
- Tests the core behavior (operator completion metrics logged exactly once) without depending on exact log message formatting
- Eliminates reliance on helper functions and complex string matching
- More descriptive test name following unit testing best practices
- Reduced test code complexity while maintaining coverage of critical logging behavior

Signed-off-by: Balaji Veeramani <[email protected]>

## Why are these changes needed?

as titled

## Related issue number

## Checks

- [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting))
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

Signed-off-by: iamjustinhsu <[email protected]>

## Why are these changes needed?

This PR speeds up the Data CI pipeline by increasing parallelism and improving test distribution:

1. **Increased parallelism for parallel tests**: Bumped from 2 to 8 workers for both `data9test` and `dataltest` jobs that handle tests tagged with `data` (but not `data_non_parallel`)
2. **Added parallelism for non-parallel tests**: Added 3-way parallelism to `data9test_non_parallel` and `dataltest_non_parallel` jobs with proper worker distribution flags (`--workers` and `--worker-id`)

These changes should significantly reduce CI runtime for Data tests by better utilizing available resources.

## Related issue number

N/A

## Checks

- [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [x] This PR is not tested :(

---------

Signed-off-by: Balaji Veeramani <[email protected]>
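The worker distribution flags suggest that each of the N jobs runs a disjoint slice of the test targets. How Ray's CI tooling actually assigns targets to workers is not described here, so the following is only a sketch of one plausible scheme (round-robin over a sorted list); the function `shard` and its parameters are hypothetical and merely echo the `--workers` and `--worker-id` flag names.

```python
from typing import List, Sequence


def shard(targets: Sequence[str], workers: int, worker_id: int) -> List[str]:
    """Return the slice of `targets` assigned to one worker.

    Assumption: targets are sorted first so every worker computes the same
    partition, then dealt out round-robin by index.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if not 0 <= worker_id < workers:
        raise ValueError("worker_id must be in [0, workers)")
    ordered = sorted(targets)
    return ordered[worker_id::workers]
```

With 3 workers, worker 0 takes indices 0, 3, 6, and so on, so every target runs exactly once across the jobs.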

temporarily soft failing on llm dependency compilation Signed-off-by: elliot-barn <[email protected]>

…c chaos… (ray-project#57288) Signed-off-by: joshlee <[email protected]>

Including config_name in depsets Remove build_arg_sets from config class --------- Signed-off-by: elliot-barn <[email protected]> Signed-off-by: Elliot Barnwell <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Lonnie Liu <[email protected]>

…f format.sh (ray-project#57703)

Update Ray documentation and pre-push hooks to standardize on pre-commit for linting and formatting.

## Summary

- Updated `ci/lint/pre-push` hook to use `pre-commit run` instead of `ci/lint/format.sh`
- Updated development documentation to reference `pre-commit` instead of `format.sh` for linting
- Removed language suggesting pre-commit is "opt-in" or "planned for the future" since it's now the standard approach
- Updated installation instructions to use `pre-commit install`

## Test plan

- Verified documentation changes are accurate
- Confirmed pre-commit configuration works correctly

---------

Signed-off-by: Balaji Veeramani <[email protected]>
Co-authored-by: angelinalg <[email protected]>

ray-project#57548)

Part 2 of ray-project#56149; a significant portion of the code is taken from the original PR.

This PR does not introduce any change in functionality. Autoscaling is still performed at the deployment level. This will help us make the transition towards application-level autoscaling.

The only changes in this PR are:
1. Moving the autoscaling control loop from the deployment state to the application state.
2. Adding an application autoscaling state class. In the new design, the autoscaling state manager manages a list of application autoscaling states, and each application autoscaling state manages a list of deployment autoscaling states.

Signed-off-by: abrar <[email protected]>

…n actor (ray-project#57688)

## Summary

This change disables Ray Core's streaming generator backpressure for the partition actor used in download operations. The partition actor is a lightweight, fast operation that batches URIs before they're sent to download tasks. When backpressure was enabled, Ray Core would throttle the partition actor's output, which starved the downstream download tasks of work and reduced parallelism.

## Changes

- Set `_generator_backpressure_num_objects` to -1 for the partition actor
- Use dedicated `ray_remote_args` for the partition actor instead of the user-provided args (which should only affect download tasks, not internal partitioning logic)

## Test plan

- [ ] Verify download operations complete successfully
- [ ] Confirm improved parallelism in download tasks
- [ ] Check that backpressure is properly disabled for partition actor

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Balaji Veeramani <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Signed-off-by: abrar <[email protected]>

ray-project#56474) Signed-off-by: zac <[email protected]> Signed-off-by: Zac Policzer <[email protected]> Co-authored-by: Edward Oakes <[email protected]>

Currently, node events support only two states: ALIVE and DEAD. This PR introduces a new substate of ALIVE, called ALIVE_DRAINING. While this state may be triggered repeatedly, the consumer (dashboard) only needs to observe it once. To prevent overwhelming the event system, we add a flag to ensure the ALIVE_DRAINING event is emitted only once.

Test:
- CI

---

> [!NOTE]
> Adds DRAINING and IDLE_OR_ACTIVE node lifecycle states, emits DRAINING only once, and updates proto, event mapping, GCS manager, and tests accordingly.
>
> - **Proto**:
>   - Update `events_node_lifecycle_event.proto`: replace `ALIVE` with `IDLE_OR_ACTIVE` and add `DRAINING` state.
> - **Observability**:
>   - `RayNodeLifecycleEvent`: when `GcsNodeInfo` is ALIVE, emit `DRAINING` if `state_snapshot` is `DRAINING`, else emit `IDLE_OR_ACTIVE`.
> - **GCS Node Manager**:
>   - Track `draining_node_ids_` to ensure `DRAINING` export event is written once; clear on node removal.
>   - `UpdateAliveNode(...)`: set snapshot to `DRAINING` when draining and write a single export event for the transition.
> - **Tests**:
>   - Adjust expectations from `ALIVE` to `IDLE_OR_ACTIVE`.
>   - Add assertion that only one `DRAINING` lifecycle event is exported for repeated draining updates.
>   - Update dashboard aggregator test to expect `IDLE_OR_ACTIVE`.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit a5f1e37. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>

---------

Signed-off-by: Cuong Nguyen <[email protected]>
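The emit-once behavior described above can be modeled with a set of node IDs that have already produced a draining event. This Python sketch is only an illustration of that idea; the real logic lives in the C++ GCS node manager, and the class `NodeEventExporter`, its methods, and the tuple-based event format are hypothetical. Only the set name echoes `draining_node_ids_` from the summary.

```python
class NodeEventExporter:
    """Hypothetical model of exporting DRAINING at most once per node."""

    def __init__(self):
        self.draining_node_ids = set()  # nodes whose DRAINING event was written
        self.exported = []              # (node_id, state) tuples, in emit order

    def update_alive_node(self, node_id, is_draining):
        """Handle a (possibly repeated) update for a node that is still alive."""
        if is_draining and node_id not in self.draining_node_ids:
            self.draining_node_ids.add(node_id)
            self.exported.append((node_id, "DRAINING"))

    def remove_node(self, node_id):
        """Forget the node on removal so the tracking set does not grow forever."""
        self.draining_node_ids.discard(node_id)
        self.exported.append((node_id, "DEAD"))
```

Repeated draining updates for the same node append a single `DRAINING` entry, and clearing the ID on removal keeps the set bounded by the number of live nodes.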

## Why are these changes needed?

In ray-project#57035 we deprecated the `concurrency` params in favor of `compute` in `map`, `map_batches`, `flat_map` and `filter`, so the related docs should be changed to use `compute` as well, so that users won't use the deprecated params.

## Related issue number

Follow-up for ray-project#57035

## Checks

- [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting))
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

Signed-off-by: You-Cheng Lin (Owen) <[email protected]>

…57730) Example flake: https://buildkite.com/ray-project/postmerge/builds/13666#0199de53-b97e-4dea-9c1d-37ef56433b7c/607-1103 The way the test was written was inherently flaky because the first GC could happen at any time, so the timeouts that attempt to measure the time between the interval were inaccurate. It somewhat pains me to not make the test fully deterministic, but in an attempt to deflake without spending too much time here, I've improved it to at least wait until the first GC interval before starting the clock. I've also split the driver & actor conditions because the timers/intervals can be out of sync between them. There's also some weirdness here that we have two configs to control the GC interval, one for C++ and one for Python, but I'm letting that sleeping dog lie... --------- Signed-off-by: Edward Oakes <[email protected]>

Add Azure CLI and dependencies into `base-extra` images --------- Signed-off-by: kevin <[email protected]>

…8088) those tests have been failing and jailed for quite some time related to: - ray-project#46687 - ray-project#49847 - ray-project#49846 Signed-off-by: Lonnie Liu <[email protected]>

removing format script and all references --------- Signed-off-by: elliot-barn <[email protected]>

…ine backend (ray-project#57194) Signed-off-by: DPatel_7 <[email protected]> Co-authored-by: DPatel_7 <[email protected]>

… submission/block generation metrics (ray-project#57246)

## Why are these changes needed?

On executor shutdown, the metrics persist even after execution. The plan is to reset on streaming_executor.shutdown. This PR also includes 2 potential drive-by fixes for metric calculation

## Related issue number

## Checks

- [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting))
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

---------

Signed-off-by: iamjustinhsu <[email protected]>

Add some output example of the command to help the end-user to verify the execution result. Signed-off-by: fscnick <[email protected]>

## Description

This PR adds a `preserve_row` option to `map_batches`. When `preserve_row` is true, the limit operator can be pushed down through this `map_batches` call for optimization.

Note: `map_group` is built on `map_batches`, but limit pushdown support for `map_group` is out of scope for this PR, so `preserve_row_count` is set to false for it.

## Related issues

## Additional information

---------

Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
Co-authored-by: You-Cheng Lin <[email protected]>
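Why a row-preserving guarantee matters for limit pushdown can be seen with plain Python lists. The sketch below does not use Ray Data; `limit`, `double`, `drop_odd`, and `pushdown_is_safe` are hypothetical helpers, and the check compares the two plans on one input rather than proving equivalence in general.

```python
def limit(rows, n):
    """Keep the first n rows."""
    return rows[:n]


def double(batch):
    """Row-preserving: one output row per input row, in the same order."""
    return [x * 2 for x in batch]


def drop_odd(batch):
    """Not row-preserving: the output may have fewer rows than the input."""
    return [x for x in batch if x % 2 == 0]


def pushdown_is_safe(fn, rows, n):
    """Compare limit-after-map with map-after-limit on this particular input."""
    return limit(fn(rows), n) == fn(limit(rows, n))
```

On `list(range(10))` with a limit of 3, `double` gives the same result either way, so applying the limit first saves work. `drop_odd` gives `[0, 2, 4]` when limited afterwards but only `[0, 2]` when limited first, which is why the optimizer needs the caller to declare that the function preserves rows.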

~~Before:~~
~~https://github.com/user-attachments/assets/9db00f37-0c37-4e99-874a-a14481878e4a~~
~~Before, the progress bar won't update until the first task finishes.~~

~~After:~~
~~https://github.com/user-attachments/assets/99877a3f-7b52-4293-aae5-7702edfaabec~~
~~After, the progress bar won't update until the first task generates output. If a task generates 10 blocks, we will update the progress bar while it's generating blocks, even if the task hasn't finished. Once the task finishes, we default back to the way it was before.~~

~~This is better because the very 1st progress bar update will occur sooner, and won't feel abrupt to the user.~~

Refactoring the progress bar estimates using known metrics.

## Why are these changes needed?

Currently we use the number of finished tasks. This is OK, but since we use a streaming generator, 1 task = thousands of blocks. This is troublesome for the additional split factor (split blocks) in read parquet

## Related issue number

## Checks

- [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting))
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

---------

Signed-off-by: iamjustinhsu <[email protected]>

…58046) This pr sets up the helper classes and utils to enable token based authentication for ray core rpc calls. --------- Signed-off-by: sampan <[email protected]> Co-authored-by: sampan <[email protected]>

I suspect that when we deploy the app config, we don't wait long enough before sending traffic, so requests could go to the wrong version.

---------

Signed-off-by: abrar <[email protected]>

Signed-off-by: Seiji Eicher <[email protected]>

Signed-off-by: ahao-anyscale <[email protected]>

…58092) Signed-off-by: Jiajun Yao <[email protected]>

…ay-project#57882)

# Summary

The crux of the issue is that in the past, train run status was synonymous with final worker group status, but now, when there are pending validations, the worker group is finished but the train run is not. This leads to confusing situations in which the Train Run is `FINISHED`, but because there are pending validations, the `controller` actor is alive and results are inaccessible.

This PR:
* Adds a new `SHUTTING_DOWN` `TrainControllerState` that happens after the worker group finishes but before the controller shuts everything down.
* Makes `ValidationManager` logging slightly cleaner.

Like `RESCHEDULING`, `SHUTTING_DOWN` is a hidden state that shows up in `StateManager` logs and Grafana but not in the state export. We only want to show terminal states in the state export after `fit()` has returned and results are accessible.

More concretely:
* Finished/errored: The worker group finishes (Train Run is `RUNNING` but internal state is `SHUTTING_DOWN`), validation finishes (both Train Run and internal state say `FINISHED` or `ERRORED`), then results are accessible.
* Aborted: Ideally, the worker group should be aborted and in-flight validation tasks canceled before the Train Run is `ABORTED`. However, this PR doesn't change the current behavior, in which the Train Run might be `ABORTED` before reference counting cleans up the validation tasks. I will cancel validation tasks before marking the train run `ABORTED` in a future PR.

I considered polling both the worker group and validations in `_step` itself, but decided to leave `_step` as a function that only cares about the worker group.

# Testing

Unit tests

---------

Signed-off-by: Timothy Seah <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
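The split between internal controller states and the exported run status can be sketched as a small mapping. This is a hypothetical model, not Ray Train's implementation: the enums `ControllerState` and `RunStatus` and the function `exported_status` are invented names, and exporting `RESCHEDULING` as `RUNNING` is an assumption (the description only says it is hidden from the state export).

```python
from enum import Enum


class ControllerState(Enum):
    """Hypothetical internal states, including the hidden ones."""
    RUNNING = "RUNNING"
    RESCHEDULING = "RESCHEDULING"
    SHUTTING_DOWN = "SHUTTING_DOWN"
    FINISHED = "FINISHED"
    ERRORED = "ERRORED"
    ABORTED = "ABORTED"


class RunStatus(Enum):
    """Hypothetical statuses visible in the state export."""
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    ERRORED = "ERRORED"
    ABORTED = "ABORTED"


# Hidden states are reported as RUNNING until a terminal state is reached.
# The RESCHEDULING entry is an assumption, as noted above.
_HIDDEN = {
    ControllerState.RESCHEDULING: RunStatus.RUNNING,
    ControllerState.SHUTTING_DOWN: RunStatus.RUNNING,
}


def exported_status(state: ControllerState) -> RunStatus:
    """Map an internal controller state to the exported run status."""
    return _HIDDEN.get(state) or RunStatus(state.value)
```

Under this model the export shows `RUNNING` while validations are still pending and only switches to `FINISHED` or `ERRORED` once the controller itself reaches that state.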

…ject#57930) Add actor+job+node event to ray event export doc Test: - CI Signed-off-by: Cuong Nguyen <[email protected]>

Signed-off-by: Dhyey Shah <[email protected]> Signed-off-by: Qiaolin-Yu <[email protected]> Signed-off-by: Qiaolin Yu <[email protected]> Co-authored-by: Dhyey Shah <[email protected]> Co-authored-by: Stephanie Wang <[email protected]>

Mistakenly disabled the wrong test, one with a different name from the issue.

Associated issue: ray-project#46687

Signed-off-by: Lonnie Liu <[email protected]>

upgrading batch inference tests to py3.10 Successful release test run: https://buildkite.com/ray-project/release/builds/65258 all except for image_embedding_from_jsonl are running on python 3.10 --------- Signed-off-by: elliot-barn <[email protected]>

…t#58160) Reverts ray-project#58036

…ay-project#57636) Signed-off-by: Mengjin Yan <[email protected]> Co-authored-by: Jiajun Yao <[email protected]>

sourcery-ai

The pull request #661 has too many files changed.

The GitHub API will only let us fetch up to 300 changed files, and this pull request has 4926.

gemini-code-assist · 2025-10-27T03:13:42Z

Summary of Changes

Hello @antfin-oss, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request represents a substantial overhaul of the project's build and continuous integration infrastructure. The primary goal is to enhance build determinism, efficiency, and maintainability across various components, including Python, C++, and Java. Key changes involve adopting a new dependency management tool (raydepsets with uv), reorganizing Buildkite pipelines for improved modularity, and updating core Bazel configurations and Docker image creation processes. These efforts are crucial for ensuring a robust and scalable development workflow.

Highlights

Bazel Configuration Updates: Several changes to .bazelrc were implemented, including enabling --incompatible_strict_action_env by default, adding a workspace status command for Linux, specifying UTF-8 encoding for Windows C++ compiler options, and suppressing warnings for third-party code. Some previous configurations related to strict action environments and specific Linux compiler flags were removed.
CI Pipeline Refactoring: The Buildkite CI YAML files underwent significant restructuring. Image definitions were moved to a new _images.rayci.yml file for better organization. New build steps were introduced for core, dashboard, and Java components, and dependencies for various test stages were updated to reflect these changes and new CUDA versions.
Dependency Management Modernization: The project adopted uv for Python dependency management and introduced a new raydepsets tool. This replaces older pip-compile methods and manual requirement handling, aiming to streamline the compilation and management of Python dependency sets for improved consistency and efficiency.
MacOS Build and Test Migration: MacOS builds and tests were migrated to macos-arm64 instances, with the removal of x86_64 MacOS wheel builds. Associated MacOS build scripts were consolidated to support this new architecture.
Linting and Code Ownership Enhancements: The .pre-commit-config.yaml was updated to integrate new linters like semgrep and vale. Additionally, the .github/CODEOWNERS file was revised to reflect broader team ownership and include new file paths, ensuring better code quality and accountability.
Python Protobuf Generation Refinement: The process for generating Python protobuf files was refactored. It now utilizes pkg_zip and genrule for more efficient packaging and improved dependency handling within the Bazel build system.
C++ API and Runtime Adjustments: Updates were made to C++ API headers and runtime implementations. This includes changes to RemoteFunctionHolder and how metrics are recorded, along with the removal of some ray_common dependencies, contributing to a cleaner and potentially more performant C++ codebase.

Ignored Files

Ignored by pattern: .gemini/** (1)
- .gemini/config.yaml
Ignored by pattern: .github/workflows/** (1)
- .github/workflows/stale_pull_request.yaml

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request is an automated daily merge from master to main, containing a large number of changes, primarily focused on a significant refactoring and improvement of the build and CI system. The changes are extensive and well-executed, leading to a more modular, maintainable, and robust system.

Key improvements include:

Bazel Refactoring: The root BUILD.bazel file has been cleaned up, with targets moved to more appropriate sub-packages. The use of rules_pkg for packaging is a welcome modernization.
CI/CD Overhaul: The Buildkite pipelines have been heavily refactored. Builds are now more modular (e.g., core, dashboard, java are built as separate artifacts). Dependency management is enhanced with the introduction of the raydepsets tool.
Dependency Management: The project has migrated from miniconda to miniforge, and the uv package manager has been introduced for faster dependency resolution. Several package versions have been updated.
Linting and Style: The pre-commit configuration has been significantly improved with the addition of tools like semgrep and vale, and better configuration for existing tools. The CODEOWNERS file and PR template have also been updated.
Code Modernization: Several C++ components have been updated to use modern C++ features (e.g., std::invoke_result_t), and code style has been improved. Python code has been updated to use newer APIs where applicable.

Overall, these changes represent a substantial step forward for the project's infrastructure. The refactoring is logical and the improvements are clear. I did not find any issues that require attention.

github-actions · 2025-11-11T01:43:52Z

This pull request has been automatically marked as stale because it has not had any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

aslonnie and others added 30 commits October 13, 2025 11:40

[release test] move azure related functions to cloud_util.py (ray-p…

5e8da1a

…roject#57675) out of `util.py`. also adding its own `py_library` Signed-off-by: Lonnie Liu <[email protected]>

[release test] move in join_cloud_storage_paths (ray-project#57679)

6b3ebc1

into `anyscale_job_runner`. it is only used in `anyscale_job_runner` now Signed-off-by: Lonnie Liu <[email protected]>

[release test] move upload_working_dir_to_gcs to glue.py (ray-pro…

481efca

…ject#57681) it is only used in glue.py Signed-off-by: Lonnie Liu <[email protected]>

[release test] remove the use of typing extension for TypedDict (ray-…

5a5ad71

…project#57682) should all be using hermetic python with python 3.8 or above now Signed-off-by: Lonnie Liu <[email protected]>
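
For context on the commit above: `TypedDict` has been available in the standard `typing` module since Python 3.8, which is presumably why the `typing_extensions` import is no longer needed. A minimal sketch, using a hypothetical `ClusterEnv` type that is not taken from the Ray release tooling:

```python
from typing import TypedDict  # standard library since Python 3.8


class ClusterEnv(TypedDict):
    # Hypothetical fields for illustration only; not from the Ray code base.
    name: str
    num_nodes: int


env: ClusterEnv = {"name": "demo", "num_nodes": 2}

# At runtime a TypedDict value is a plain dict; the declared field types
# are checked by static type checkers, not enforced at runtime.
print(type(env).__name__, env["num_nodes"])
```

Code that must still run on interpreters older than 3.8 would need to keep the `typing_extensions` fallback; the commit message indicates that is no longer the case here.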

remove async capability enum (ray-project#57666)

f5d9b07

- removing the async capability enum as it is not being used; for more details, see ray-project#56707 (comment) --------- Signed-off-by: harshit <[email protected]>

add note for celery workers (ray-project#57686)

bb385a4

- adding a new note about using filesystem as a broker in celery --------- Signed-off-by: harshit <[email protected]>

[Data][LLM] Fixing runai_streamer for vLLM 0.10.2 integration (and De…

a18653a

…epseek support) (ray-project#56906) Signed-off-by: Jiang Wu <[email protected]> Signed-off-by: Jiang Wu <[email protected]> Co-authored-by: Nikhil G <[email protected]>

[deps] soft failing llm deps (ray-project#57708)

e82137d

temporarily soft failing on llm dependency compilation Signed-off-by: elliot-barn <[email protected]>

[core] Guarantee min amount of failures for request/response using rp…

d4e4956

…c chaos… (ray-project#57288) Signed-off-by: joshlee <[email protected]>

add ray io docs for replica ranks (ray-project#57649)

4e3039e

Signed-off-by: abrar <[email protected]>

Remove node observability information from hot path of core components (

830a456

ray-project#56474) Signed-off-by: zac <[email protected]> Signed-off-by: Zac Policzer <[email protected]> Co-authored-by: Edward Oakes <[email protected]>

update serve autoscaling docs (ray-project#57652)

4269a18

khluu and others added 20 commits October 24, 2025 09:52

[release] Add Azure CLI to base-extra image (ray-project#58012)

af33918

Add Azure CLI and dependencies into `base-extra` images --------- Signed-off-by: kevin <[email protected]>

[air] disable air example tests that have been failing (ray-project#5…

7255df0

…8088) those tests have been failing and jailed for quite some time related to: - ray-project#46687 - ray-project#49847 - ray-project#49846 Signed-off-by: Lonnie Liu <[email protected]>

[lint][ci] removing format script (ray-project#57799)

353bdcf

removing format script and all references --------- Signed-off-by: elliot-barn <[email protected]>

[serve][llm][transcription] Add support for Transcription in vLLM eng…

ca1f7d9

…ine backend (ray-project#57194) Signed-off-by: DPatel_7 <[email protected]> Co-authored-by: DPatel_7 <[email protected]>

[Doc][KubeRay] add output example of the command (ray-project#58078)

14d2689

Add example output for the command to help end users verify the execution result. Signed-off-by: fscnick <[email protected]>

[Core] Add authentication token logic and related tests (ray-project#…

b197fa8

…58046) This PR sets up the helper classes and utils to enable token-based authentication for Ray Core RPC calls. --------- Signed-off-by: sampan <[email protected]> Co-authored-by: sampan <[email protected]>

add deployment status check in test (ray-project#58087)

226a414

I suspect that when we deploy the app config, we don't wait long enough before sending traffic, so requests could go to the wrong version --------- Signed-off-by: abrar <[email protected]>

[llm][data] Change example pip install to ray[llm] (ray-project#58096)

05c0aff

Signed-off-by: Seiji Eicher <[email protected]>

[doc][serve][llm] Model loading Docs (ray-project#57922)

a1cf87c

Signed-off-by: ahao-anyscale <[email protected]>

[Core] Fix RAY_CHECK(inserted) inside reference counter (ray-project#…

73deda6

…58092) Signed-off-by: Jiajun Yao <[email protected]>

[core][doc] add actor+job+node event to ray event export doc (ray-pro…

779586f

…ject#57930) Add actor+job+node event to ray event export doc Test: - CI Signed-off-by: Cuong Nguyen <[email protected]>

[air] enable a test that was mistakenly disabled (ray-project#58089)

6efcd02

mistakenly disabled the wrong test, whose name differs from the one in the issue. associated issue: ray-project#46687 Signed-off-by: Lonnie Liu <[email protected]>

Revert "[ci] enabling python 3.10 for subset of ci tests" (ray-projec…

4c82e82

…t#58160) Reverts ray-project#58036

[Core][TaskEvents] Add Integration Tests for Task Event Generation (r…

9cbe131

…ay-project#57636) Signed-off-by: Mengjin Yan <[email protected]> Co-authored-by: Jiajun Yao <[email protected]>

antfin-oss requested review from SongGuyang and kfstorm as code owners October 27, 2025 03:06

antfin-oss added auto-generated daily-merge labels Oct 27, 2025

antfin-oss assigned ffbin Oct 27, 2025

sourcery-ai bot reviewed Oct 27, 2025

gemini-code-assist bot reviewed Oct 27, 2025

github-actions bot added the stale label Nov 11, 2025

71 participants