🔄 daily merge: master → main 2025-11-03 #668

antfin-oss · 2025-11-03T03:01:36Z

This Pull Request was created automatically to merge the latest changes from master into main branch.

📅 Created: 2025-11-03
🔀 Merge direction: master → main
🤖 Triggered by: Scheduled

Please review and merge if everything looks good.

…Autoscaling config (ray-project#57260) In KubeRay v1.5, new structured `Resources` and `Labels` fields will be added to both the `HeadGroupSpec` and `WorkerGroupSpec` so that users can define these values explicitly without relying on `rayStartParams`. This support is implemented in ray-project/kuberay#4106. For this change to work with the Ray autoscaler, this PR adds support for checking the `resources` and `labels` fields (which take precedence over the values in `rayStartParams` and the k8s container spec) and generating the resulting autoscaling config. This change is compatible with older versions of KubeRay that do not have these new fields, since we fall back to the previous logic when the top-level fields aren't specified. Example: For a group spec specified like this:
```
workerGroupSpecs:
- groupName: worker-group-1
  replicas: 1
  resources:
    GPU: 8
    CPU: 16
  labels:
    ray.io/zone: us-west2-a
    ray.io/region: us-west2
```
The resulting autoscaling config for worker-group-1 would be:
```
"worker-group-1": {
    "labels": {"ray.io/zone": "us-west2-a", "ray.io/region": "us-west2"},
    ...
    "resources": {
        "CPU": 16,
        "GPU": 8,
    },
},
```
## Related issue number Contributes to ray-project#51564 --------- Signed-off-by: Ryan O'Leary <[email protected]>
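
The precedence rule described above can be sketched roughly as follows. This is a simplified illustration, not the autoscaler's actual code: `resolve_group_config` is a hypothetical helper, the fallback only reads `num-cpus`, `num-gpus`, and a `key=value,key=value` style `labels` string from `rayStartParams` (an assumed format), and the container-spec fallback is omitted.

```python
from typing import Any, Dict


def resolve_group_config(group_spec: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: prefer top-level fields, else fall back to rayStartParams."""
    start_params = group_spec.get("rayStartParams", {})

    # Top-level `resources` takes precedence when it is specified.
    if group_spec.get("resources"):
        resources = {k: int(v) for k, v in group_spec["resources"].items()}
    else:
        # Simplified fallback (assumption): only num-cpus / num-gpus are considered.
        resources = {}
        if "num-cpus" in start_params:
            resources["CPU"] = int(start_params["num-cpus"])
        if "num-gpus" in start_params:
            resources["GPU"] = int(start_params["num-gpus"])

    # Top-level `labels` takes precedence when it is specified.
    if group_spec.get("labels"):
        labels = dict(group_spec["labels"])
    else:
        # Assumed legacy format: "key=value,key=value".
        raw = start_params.get("labels", "")
        labels = dict(pair.split("=", 1) for pair in raw.split(",") if pair)

    return {"resources": resources, "labels": labels}


new_style = {
    "groupName": "worker-group-1",
    "resources": {"GPU": 8, "CPU": 16},
    "labels": {"ray.io/zone": "us-west2-a", "ray.io/region": "us-west2"},
    "rayStartParams": {"num-cpus": "4"},  # ignored because `resources` is set
}
legacy = {
    "groupName": "worker-group-2",
    "rayStartParams": {"num-cpus": "4", "labels": "ray.io/zone=us-west2-a"},
}

new_config = resolve_group_config(new_style)
legacy_config = resolve_group_config(legacy)
```

Group specs from older KubeRay versions take the `else` branches, which is the backward-compatible path the description refers to.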

…ing cluster teardown (ray-project#57610) After running `ray down`, several resources (a managed service identity, network security group, etc.) are left in the subscription, and re-running `ray up` (without specifying `--no-config-cache`) will sometimes run into errors because of this. ## Related issue number Fixes: ray-project#55392 Signed-off-by: Mark Rossett <[email protected]>

The old expression evaluator did not correctly handle `is_in` which failed tests in `test_expression_evaluator` ## Related issues Fixes ray-project#57820 --------- Signed-off-by: Goutam <[email protected]>

Release tests, unit tests, doctests, and examples are all migrated to V2, so this PR turns V2 on by default. To run with Train V1 (deprecated), set `RAY_TRAIN_V2_ENABLED=0`. --------- Signed-off-by: Justin Yu <[email protected]>
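
A minimal sketch of opting back into Train V1 from Python, using the environment variable named above. It assumes the flag is read when Ray Train is imported, so it is set first; exporting the variable in the shell before launching the script should work equally well.

```python
import os

# Assumption: the flag must be set before Ray Train is imported to take effect.
os.environ["RAY_TRAIN_V2_ENABLED"] = "0"

# import ray.train  # left commented out so the sketch runs without Ray installed
```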

…ct#57133) This PR adds a workspace template that walks users through how to integrate PyTorch Profiler with Ray Train. The purpose of this template is to walk user through how to generate trace/memory profile with Pytorch Profiler in the TorchTrainer. For a high level overview, this template covers: 1. A hands-on example of training an image classification model 2. A simple torch profiler integration script 3. Some more advanced use cases including `record_function` to customize the profiling experience. 4. a successful release test run: https://buildkite.com/ray-project/release/builds/63492#0199e3e4-fa8c-4f4f-a92c-f6d47a415c53 Testing: testing in Anyscale workspace --------- Signed-off-by: Lehui Liu <[email protected]>

Signed-off-by: abrar <[email protected]>

…uides and improved navigation (ray-project#57787) Signed-off-by: Kourosh Hakhamaneshi <[email protected]> Signed-off-by: kourosh hakhamaneshi <[email protected]> Co-authored-by: angelinalg <[email protected]>

…ss/postprocess (ray-project#57826) Signed-off-by: Nikhil Ghosh <[email protected]>

## Description This removes an orphaned code file that was previously used by the Preprocessor User Guide. ## Related issues Corresponding User Guide was removed in ray-project#44006. Closes ray-project#57867. ## Additional details This test started failing because of the new `XGBoostTrainer` API enabled by default with Ray Train V2. Rather than update the snippet, removing this code instead. Signed-off-by: Matthew Deng <[email protected]>

adding eslint and prettier script to precommit before getting rid of format.sh 1 step closer to replacing scripts/format.sh with pre-commit (pre-commit is currently missing eslint) tested locally: <img width="898" height="929" alt="image" src="https://github.com/user-attachments/assets/58c77fb7-bdde-47ae-ac2b-b864334b3f30" /> --------- Signed-off-by: elliot-barn <[email protected]>

First test running on AKS cloud! --------- Signed-off-by: kevin <[email protected]> Signed-off-by: Kevin H. Luu <[email protected]> Co-authored-by: Lonnie Liu <[email protected]>

## Description Updating so that the module shows as `ray.train` rather than `ray.train.v2.api.exceptions` ## Testing https://anyscale-ray--57865.com.readthedocs.build/en/57865/train/api/doc/ray.train.v2.api.data_parallel_trainer.DataParallelTrainer.fit.html#ray.train.v2.api.data_parallel_trainer.DataParallelTrainer.fit <img width="960" height="302" alt="image" src="https://github.com/user-attachments/assets/02206542-54fe-4674-b2b4-1868fa7e8580" /> Signed-off-by: Matthew Deng <[email protected]>

- Add 2 hello world tests with regular base image & custom image running on GCE --------- Signed-off-by: kevin <[email protected]> Signed-off-by: Kevin H. Luu <[email protected]>

## Description Bump from small to medium due to timeouts happening specifically in py3.12 tests. --------- Signed-off-by: Matthew Deng <[email protected]>

## Why are these changes needed? The `num_module_steps_trained_(lifetime)_throughput` metrics are biased because of the way we record throughput times in a loop over module batches. This PR fixes this bias. ## Related issue number  ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) - [x] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: simonsays1980 <[email protected]> Co-authored-by: Kamil Kaczmarek <[email protected]>

…orker` (ray-project#57859) ## Description The type annotation for `actor_location_tracker` is currently `ActorLocationTracker`, but it should be `ray.actor.ActorHandle[ActorLocationTracker]`. This PR fixes that issue. Signed-off-by: Balaji Veeramani <[email protected]>

ray-project#57834) Signed-off-by: Jiajun Yao <[email protected]>

…r'. (ray-project#57673)   ## Why are these changes needed? The type hint for `learner_connector` in `AlgorithmConfig.training` was outdated and still used the `RLModule` as a parameter. This PR adjusts the type hint to the form the callable is actually expected to have. ## Related issue number  ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) - [x] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Signed-off-by: simonsays1980 <[email protected]>

`result_of_t` is deprecated Signed-off-by: Lonnie Liu <[email protected]>

…ectural Design (ray-project#57889) Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

- Disables Java tests; Ray Java is not supported on Apple silicon yet. - Skips C++ tests that are not passing yet. We already stopped releasing macOS wheels for Intel silicon, and the tests that are disabled or skipped were never passing on Apple silicon, so nothing has regressed. Signed-off-by: Lonnie Liu <[email protected]>

…ay-project#57876) ## Description ## Related issues Closes ray-project#57847 ## Additional information Signed-off-by: daiping8 <[email protected]>

…ystem cgroup (ray-project#57864) For more details about the resource isolation project see ray-project#54703. When starting the head node, move the dashboard api server's subprocesses into the system cgroup. I updated the integration test and added a helpful error message because the test will break in the future when a new dashboard module is added. I ran the integration tests 25 times locally. > (ray2) ubuntu@devbox:~/code/ray2$ python -m pytest -s python/ray/tests/resource_isolation/test_resource_isolation_integration.py --count 25 -x ... collecting ... python/ray/tests/resource_isolation/test_resource_isolation_integration.py ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 25% ██▌ 2025-10-17 23:13:51,897 INFO worker.py:1833 -- Connecting to existing Ray cluster at address: 172.31.12.251:6379... 2025-10-17 23:13:51,905 INFO worker.py:2004 -- Connected to Ray cluster. View the dashboard at http://127.0.0.1:8265 python/ray/tests/resource_isolation/test_resource_isolation_integration.py ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 26% ██▋ 2025-10-17 23:13:57,592 INFO worker.py:1833 -- Connecting to existing Ray cluster at address: 172.31.12.251:6379... 2025-10-17 23:13:57,598 INFO worker.py:2004 -- Connected to Ray cluster. View the dashboard at http://127.0.0.1:8265 python/ray/tests/resource_isolation/test_resource_isolation_integration.py ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 98% █████████▊2025-10-17 23:19:45,417 INFO worker.py:2004 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 python/ray/tests/resource_isolation/test_resource_isolation_integration.py ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 99% █████████▉2025-10-17 23:19:50,194 INFO worker.py:2004 -- Started a local Ray instance. 
View the dashboard at http://127.0.0.1:8265 python/ray/tests/resource_isolation/test_resource_isolation_integration.py ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 100% ██████████ Results (366.41s): 100 passed --------- Signed-off-by: irabbani <[email protected]>

…roject#57037) During the execution of `tail_job_logs()` after job submission, if the Ray head connection breaks, `tail_job_logs()` does not raise any error. An error should be raised. This PR queries the RayJob status when receiving a message and raises an error if the connection is closed while the RayJob is not in a terminal state. ## Related issue number Closes: ray-project#57002 --------- Signed-off-by: machichima <[email protected]>

…ect#57897)

…ay-project#57802) ## Description 1. This PR added the `jax.distributed.shutdown()` for JaxBackend in order to free up any leaked resources on TPU RayTrainWorkers. 2. if `jax.distributed` is not on, it is a noop: https://docs.jax.dev/en/latest/_autosummary/jax.distributed.shutdown.html 3. Tested on Anyscale workspace. <img width="1264" height="62" alt="image" src="https://github.com/user-attachments/assets/f28102ff-f6d1-4da0-b41a-6cc785603e72" />

…ay Serve LLM (ray-project#57830) Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

we are not releasing `x86_64` wheels anymore Signed-off-by: Lonnie Liu <[email protected]>

…57817) Signed-off-by: dayshah <[email protected]>

…igurable (ray-project#57705) Recently, when we ran performance tests with task event generation turned on, we saw a performance regression when the workloads ran on machines with very little CPU. Further investigation showed that the overhead mainly comes from the field name format conversion when converting the proto message to a JSON payload in the aggregator agent. This PR adds an env var to control the name conversion behavior and updates the corresponding tests. Note that we eventually plan to remove this config and turn off the field name conversion by default once all current event usage has been migrated. --------- Signed-off-by: Mengjin Yan <[email protected]>

## Description Adds PredicatePushdown Rule to Ray Data. Pushes predicates through Read (for supporting datasources), Union ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Goutam <[email protected]>

Signed-off-by: abrar <[email protected]>

…ct#58281) Supports token based authentication in dashboard head sdk, all clients which build on top of the submission_client will now support token auth out of the box. so this covers all cli commands like job submit, state api, serve related cli commands etc. --------- Signed-off-by: sampan <[email protected]> Signed-off-by: Sampan S Nayak <[email protected]> Signed-off-by: Edward Oakes <[email protected]> Co-authored-by: sampan <[email protected]> Co-authored-by: Edward Oakes <[email protected]>

This helps prevent an edge case when using file-based log exporters such as Vector that use fingerprinting ([ref](https://vector.dev/docs/reference/configuration/sources/file/#fingerprint)) to identify unique files. Example edge case that this fixes: two jobs are submitted to a cluster and begin executing at the same time, and both contain an invalid entrypoint that references a nonexistent file.
Before the fix:
- both jobs have the identical "Runtime env is setting up" log with identical timestamps
- both jobs have identical entrypoint failure logs
As a result, the log files for these jobs are identical, so Vector will only export one.
After the fix:
- both jobs have the identical "Runtime env is setting up" log with identical timestamps
- each job has a **unique** entrypoint log containing its job_id
- both jobs have identical entrypoint failure logs
Vector can differentiate between these two files, so both will be exported. --------- Signed-off-by: Chris Fellowes <[email protected]> Signed-off-by: chrisfellowes <[email protected]> Signed-off-by: Edward Oakes <[email protected]> Co-authored-by: Edward Oakes <[email protected]>
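
To illustrate why identical log contents collide, here is a small sketch that uses a SHA-256 hash of the whole file content as a stand-in for a fingerprint. This is not Vector's actual fingerprinting algorithm, and the log lines and job IDs are made up for the example.

```python
import hashlib


def fingerprint(log_text: str) -> str:
    """Stand-in fingerprint: a hash of the file content (not Vector's algorithm)."""
    return hashlib.sha256(log_text.encode("utf-8")).hexdigest()


# Hypothetical log contents before the fix: both jobs write the same bytes.
before_job_a = "12:00:00 Runtime env is setting up\n12:00:01 entrypoint failed: no such file\n"
before_job_b = "12:00:00 Runtime env is setting up\n12:00:01 entrypoint failed: no such file\n"

# Hypothetical log contents after the fix: each file carries a line with its job_id.
after_job_a = (
    "12:00:00 Runtime env is setting up\n"
    "12:00:01 Running entrypoint for job job_a\n"
    "12:00:01 entrypoint failed: no such file\n"
)
after_job_b = (
    "12:00:00 Runtime env is setting up\n"
    "12:00:01 Running entrypoint for job job_b\n"
    "12:00:01 entrypoint failed: no such file\n"
)

collides_before = fingerprint(before_job_a) == fingerprint(before_job_b)
collides_after = fingerprint(after_job_a) == fingerprint(after_job_b)
```

With content-based identification, the two "before" files are indistinguishable, while the job-specific line makes the "after" files distinct.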

…8341) On older machines with slower CPUs that are under some load, we've seen the dashboard fail to start within the expected 200 iterations (`sleep(0.1)`, for a total of 20 seconds). This PR increases the timeout to 60 seconds. --------- Signed-off-by: morotti <[email protected]> Signed-off-by: Edward Oakes <[email protected]> Co-authored-by: Edward Oakes <[email protected]>
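
For illustration, a generic polling loop with a wall-clock deadline might look like the sketch below. `wait_until` is a hypothetical helper, not the dashboard's startup code; the original counts iterations (200 × 0.1 s), whereas this sketch uses a deadline so that slow iterations do not stretch the total wait.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 0.1,
) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` seconds elapse."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval_s)
    return False


# Usage: a condition that is already true returns immediately,
# and one that never becomes true gives up after the timeout.
started = wait_until(lambda: True)
timed_out = not wait_until(lambda: False, timeout_s=0.05, interval_s=0.01)
```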

…s-process support (ray-project#58332) Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

…oject#58344) so that we can perform surgeries in the last minute if required. Signed-off-by: Lonnie Liu <[email protected]>

add `--use-pep517` flag; otherwise some part of the wheel building logic does not work on windows python 3.12 also removes the unnecessary "uninstall+reinstall" dance. the script only builds the wheel, it does not (and should not) install the wheel. Signed-off-by: Lonnie Liu <[email protected]>

observability release tests on py3.10 Successful release test run: https://buildkite.com/ray-project/release/builds/65851 failing tests are set to manual (disabled): aws_cluster_launcher_release_image k8s_serve_ha_test enabling agent_stress_test.gce which is now passing --------- Signed-off-by: elliot-barn <[email protected]>

## Description Replace `map_batches` and numpy invocations with `with_column` and arrow kernels Release test: https://buildkite.com/ray-project/release/builds/66243#019a37da-4d9d-4f19-9180-e3f3dc3f8043 ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Goutam <[email protected]>

…collate_fn (ray-project#58327) Signed-off-by: Gang Zhao <[email protected]> Co-authored-by: Gang Zhao <[email protected]>

## Description This fixes the symmetric-run cli workflow. Right now if you use `ray symmetric-run` on 2.51 like ``` ray symmetric-run --address 127.0.0.1:6379 -- python my_script.py ``` it will throw since the `symmetric-run` arg is not caught. This was only caught once it became part of the CLI. ## Related issues ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Richard Liaw <[email protected]>

…t#58247) Updating hello world release & cluster release tests to run on py3.10 Passing release tests: https://buildkite.com/ray-project/release/builds/65844 --------- Signed-off-by: elliot-barn <[email protected]>

Fix typos Signed-off-by: Jiajun Yao <[email protected]>

The current examples show label bundles written as `[{"ray.io/accelerator-type": "H100"}* 2]`, i.e. a dict multiplied by an integer. This is wrong: it is the list that has to be multiplied. This PR fixes this. Signed-off-by: Daraan <[email protected]>
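
The difference can be checked in plain Python, without Ray:

```python
bundle = {"ray.io/accelerator-type": "H100"}

# Wrong: multiplying the dict itself is not supported and raises TypeError.
try:
    wrong = [bundle * 2]
    raised = False
except TypeError:
    raised = True

# Right: multiply the list to get two bundles.
bundles = [bundle] * 2
```

Note that `[bundle] * 2` holds two references to the same dict. If the bundles need to be modified independently, `[dict(bundle) for _ in range(2)]` creates separate copies.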

## Description In this function, `Result::from_path` is implemented in ray train v2, which reconstructs a `Result` object from the checkpoints. This implementation leverages `CheckpointManager` and refers to https://github.com/ray-project/ray/blob/master/python/ray/train/v2/_internal/execution/controller/controller.py#L512-L540 --------- Signed-off-by: xgui <[email protected]> Signed-off-by: Justin Yu <[email protected]> Co-authored-by: Justin Yu <[email protected]>

Add "WORKDIR /home/ray" in build-docker.sh. If "WORKDIR" is not set, it defaults to /root, causing permission issues with conda init. ``` 31.00 # >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<< 31.00 31.00 Traceback (most recent call last): 31.00 File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/exception_handler.py", line 18, in __call__ 31.00 return func(*args, **kwargs) 31.00 ^^^^^^^^^^^^^^^^^^^^^ 31.00 File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/cli/main.py", line 44, in main_subshell 31.00 context.__init__(argparse_args=pre_args) 31.00 File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/base/context.py", line 517, in __init__ 31.00 self._set_search_path( 31.00 File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/common/configuration.py", line 1430, in _set_search_path 31.00 self._search_path = IndexedSet(self._expand_search_path(search_path, **kwargs)) 31.00 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 31.00 File "/home/ray/anaconda3/lib/python3.12/site-packages/boltons/setutils.py", line 118, in __init__ 31.00 self.update(other) 31.00 File "/home/ray/anaconda3/lib/python3.12/site-packages/boltons/setutils.py", line 351, in update 31.00 for o in other: 31.00 ^^^^^ 31.00 File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/common/configuration.py", line 1403, in _expand_search_path 31.00 if path.is_file() and ( 31.00 ^^^^^^^^^^^^^^ 31.00 File "/home/ray/anaconda3/lib/python3.12/pathlib.py", line 892, in is_file 31.00 return S_ISREG(self.stat().st_mode) 31.00 ^^^^^^^^^^^ 31.00 File "/home/ray/anaconda3/lib/python3.12/pathlib.py", line 840, in stat 31.00 return os.stat(self, follow_symlinks=follow_symlinks) 31.00 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 31.00 PermissionError: [Errno 13] Permission denied: '$XDG_CONFIG_HOME/conda/.condarc' 31.00 31.00 `$ /home/ray/anaconda3/bin/conda init` 31.00 31.00 environment variables: 31.00 CIO_TEST=<not set> 31.00 
CONDA_ROOT=/home/ray/anaconda3 31.00 CURL_CA_BUNDLE=<not set> 31.00 HTTPS_PROXY=<set> 31.00 HTTP_PROXY=<set> 31.00 LD_LIBRARY_PATH=:/usr/local/nvidia/lib64 31.00 LD_PRELOAD=<not set> 31.00 NO_PROXY=<set> 31.00 PATH=/home/ray/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/ 31.00 bin:/sbin:/bin:/usr/local/nvidia/bin 31.00 PYTHON_VERSION=3.9 31.00 REQUESTS_CA_BUNDLE=<not set> 31.00 SSL_CERT_FILE=<not set> 31.00 http_proxy=<set> 31.00 https_proxy=<set> 31.00 no_proxy=<set> ``` Signed-off-by: my-vegetable-has-exploded <[email protected]> Co-authored-by: Edward Oakes <[email protected]>

…ct#58320) Signed-off-by: win5923 <[email protected]>

…oject#58329) Created by release automation bot. Update with commit a69004e Signed-off-by: kevin <[email protected]>

… and GRPO. (ray-project#57961) ## Description Example for first blog in the RDT series using NIXL for GPU-GPU tensor transfers. --------- Signed-off-by: Ricardo Decal <[email protected]> Signed-off-by: Stephanie Wang <[email protected]> Co-authored-by: Ricardo Decal <[email protected]> Co-authored-by: Stephanie Wang <[email protected]> Co-authored-by: Qiaolin Yu <[email protected]>

sourcery-ai

The pull request #668 has too many files changed.

The GitHub API will only let us fetch up to 300 changed files, and this pull request has 5194.

gemini-code-assist · 2025-11-03T03:53:15Z

Summary of Changes

Hello @antfin-oss, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request represents an automated daily merge from the master branch into main. It integrates a substantial collection of updates across the Ray project, primarily enhancing its CI/CD infrastructure, build system configurations, and documentation. The changes reflect a continuous effort to improve build reliability, streamline dependency management, and refine the development workflow.

Highlights

CI/CD Infrastructure Overhaul: Extensive refactoring of Buildkite pipelines, including the introduction of new YAML files for image definitions and dedicated build steps for core Ray components, dashboard, and Java.
Dependency Management Evolution: Transition from Miniconda to Miniforge for Python environments and the adoption of a new raydepsets tool for more robust and centralized Python dependency management.
Bazel Build System Refinements: Significant updates to Bazel rules for C++ and Python packaging, third-party dependency handling, and enforcing stricter build environments for improved determinism.
Docker Image Strategy Enhancements: Implementation of new Docker image tagging conventions, including RAYCI_BUILD_ID for better traceability, and a new mechanism for generating multi-architecture image indexes.
Documentation and Linting Improvements: Integration of new linting tools (semgrep, vale, cython-lint, eslint) into pre-commit hooks, along with a refined pydoclint setup and an updated documentation build process leveraging pre-built Ray components.
C++ API and Runtime Modernization: Updates to the C++ API for metrics, runtime environment, and network utilities, alongside internal refactorings for process management and UUID generation.

Ignored Files

Ignored by pattern: .gemini/** (1)
- .gemini/config.yaml
Ignored by pattern: .github/workflows/** (1)
- .github/workflows/stale_pull_request.yaml

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request is an automated daily merge that includes a massive set of changes, primarily focused on refactoring the CI/CD pipelines and Bazel build system. The changes are extensive and touch many configuration files, build scripts, and some parts of the core C++ and Python code.

Key changes include:

CI/CD Refactoring: The Buildkite pipelines have been significantly modularized. Image building steps are separated, and dependency management is centralized with a new raydepsets system. This should improve maintainability and clarity of the CI configuration.
Bazel Build System Overhaul: The root BUILD.bazel file has been cleaned up, with targets moved to more appropriate subdirectories. The packaging logic has been modernized using rules_pkg instead of custom genrules. The workspace name has been updated to io_ray.
Code Quality and Linting: Pre-commit hooks have been updated and expanded, with tools like semgrep, vale, and eslint being added. The old format.sh script has been removed in favor of pre-commit.
Dependency Management: Switched from Miniconda to Miniforge. Several dependencies have been updated or patched.
Platform Support: macOS builds for x86_64 seem to have been removed, standardizing on arm64 for macOS. Java tests on macOS also appear to be disabled.
Code Modernization: Some C++ code has been updated to use modern C++ features (e.g., std::invoke_result_t).

The overall direction of these changes is very positive, leading to a more maintainable, reproducible, and robust build and test system.

I've found a couple of issues in the test suite where assertions seem to be incorrect after the refactoring of how Docker image tags are generated. These tests will likely fail and need to be updated. My review comments point these out with suggestions for fixes.

gemini-code-assist · 2025-11-03T03:56:51Z

ci/ray_ci/test_linux_tester_container.py

    with mock.patch("subprocess.check_call", side_effect=_mock_subprocess):
        LinuxTesterContainer("team", build_type="debug")
-        docker_image = f"{_DOCKER_ECR_REPO}:{_RAYCI_BUILD_ID}-team"
+        docker_image = f"{_DOCKER_ECR_REPO}:team"


The docker_image variable is constructed without the RAYCI_BUILD_ID prefix, but the _get_docker_image method, which is called by LinuxTesterContainer's install_ray method, will prepend it. This will cause the assertion on the next line to fail. The RAYCI_BUILD_ID is set to "a1b2c3d4" in the test setup.

Suggested change

-        docker_image = f"{_DOCKER_ECR_REPO}:team"
+        docker_image = f"{_DOCKER_ECR_REPO}:a1b2c3d4-team"
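
The tag format this review comment assumes can be sketched as follows. `docker_image_tag` is a hypothetical helper, not the actual `_get_docker_image` implementation, and both the prefixing behavior and the `"unknown"` default are assumptions taken from the comment and the old test expectation, not confirmed by the source.

```python
import os
from typing import Mapping, Optional


def docker_image_tag(
    repo: str, name: str, env: Optional[Mapping[str, str]] = None
) -> str:
    """Hypothetical helper: build '<repo>:<build_id>-<name>' from RAYCI_BUILD_ID."""
    env = os.environ if env is None else env
    # Assumption: fall back to "unknown" when no build ID is set.
    build_id = env.get("RAYCI_BUILD_ID", "unknown")
    return f"{repo}:{build_id}-{name}"


tag = docker_image_tag("example-repo", "team", {"RAYCI_BUILD_ID": "a1b2c3d4"})
```

If the real method no longer prepends the build ID, the updated assertion in the diff would be the correct one and the suggestion above would not apply.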

gemini-code-assist · 2025-11-03T03:56:52Z

ci/ray_ci/test_windows_container.py

-        image = (
-            "029272617770.dkr.ecr.us-west-2.amazonaws.com/rayproject/citemp:unknown-hi"
-        )
+        image = "029272617770.dkr.ecr.us-west-2.amazonaws.com/rayproject/citemp:hi"


The image variable is constructed without the RAYCI_BUILD_ID prefix, but the _get_docker_image method, which is called by WindowsContainer's install_ray method, will prepend it. This will cause the assertion on the next line to fail. The RAYCI_BUILD_ID is set to "a1b2c3d4" in the test setup.

        image = (
            "029272617770.dkr.ecr.us-west-2.amazonaws.com/rayproject/citemp:a1b2c3d4-hi"
        )

github-actions · 2025-11-18T01:43:17Z

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

ryanaoleary and others added 30 commits October 17, 2025 15:59

[Data] Fix test_expression_evaluator (ray-project#57848)

194ddf8

The old expression evaluator did not correctly handle `is_in` which failed tests in `test_expression_evaluator` ## Related issues Fixes ray-project#57820 --------- Signed-off-by: Goutam <[email protected]>

[train] Turn on Train v2 by default (ray-project#57857)

338ae52

Release tests, unit tests, doctests, and examples are all migrated to V2, so this PR turns V2 on by default. To run with Train V1 (deprecated), set `RAY_TRAIN_V2_ENABLED=0`. --------- Signed-off-by: Justin Yu <[email protected]>

document performance flags for serve (ray-project#57845)

4d5485c

Signed-off-by: abrar <[email protected]>

[docs][serve][llm] Reorganize Ray Serve LLM documentation with user g…

7806bf2

…uides and improved navigation (ray-project#57787) Signed-off-by: Kourosh Hakhamaneshi <[email protected]> Signed-off-by: kourosh hakhamaneshi <[email protected]> Co-authored-by: angelinalg <[email protected]>

[data][llm] Add per-stage map kwargs for build_llm_processor preproce…

22c755d

…ss/postprocess (ray-project#57826) Signed-off-by: Nikhil Ghosh <[email protected]>

[release] Hello world test for Azure (ray-project#57597)

85a7acb

First test running on AKS cloud! --------- Signed-off-by: kevin <[email protected]> Signed-off-by: Kevin H. Luu <[email protected]> Co-authored-by: Lonnie Liu <[email protected]>

[release] Hello world release test on GCE (ray-project#57695)

943b9ae

- Add 2 hello world tests with regular base image & custom image running on GCE --------- Signed-off-by: kevin <[email protected]> Signed-off-by: Kevin H. Luu <[email protected]>

[train] bump test_torch_trainer timeout (ray-project#57873)

2fc7193

## Description Bump from small to medium due to timeouts happening specifically in py3.12 tests. --------- Signed-off-by: Matthew Deng <[email protected]>

[Core] Reschedule leases in local lease manager when draining the node (

993139e

ray-project#57834) Signed-off-by: Jiajun Yao <[email protected]>

[core] use invoke_result_t in cpp worker example (ray-project#57885)

697c7bc

`result_of_t` is deprecated Signed-off-by: Lonnie Liu <[email protected]>

[serve][llm][refactor] Align Ray Serve LLM Code Structure with Archit…

de50b23

…ectural Design (ray-project#57889) Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

[Doc][Serve] Import AutoscalingContext in autoscaling policy example (r…

b988ce4

…ay-project#57876) ## Description ## Related issues Closes ray-project#57847 ## Additional information Signed-off-by: daiping8 <[email protected]>

removed adding the TaskPoolStrategy as it's not needed here (ray-proj…

b4f7a70

…ect#57897)

[docs][serve][llm] Add comprehensive architecture documentation for R…

3287523

…ay Serve LLM (ray-project#57830) Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

[release auto] remove x86_64 wheel verification (ray-project#57913)

6d51184

we are not releasing `x86_64` wheels anymore Signed-off-by: Lonnie Liu <[email protected]>

[core] Kill raylet file and just keep node manager file (ray-project#…

532ac12

…57817) Signed-off-by: dayshah <[email protected]>

goutamvenkat-anyscale and others added 20 commits October 31, 2025 10:45

add telemetry for custom autoscaling usage (ray-project#58336)

da9ad9c

Signed-off-by: abrar <[email protected]>

[serve][llm] Add generic component registry for LLM serving with cros…

2f55e56

…s-process support (ray-project#58332) Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

[release auto] support uploading wheels from arbitrary branch (ray-pr…

6d1e343

…oject#58344) so that we can perform surgeries in the last minute if required. Signed-off-by: Lonnie Liu <[email protected]>

[Template] Update image-search-and-classification to pass device for …

b6e6210

…collate_fn (ray-project#58327) Signed-off-by: Gang Zhao <[email protected]> Co-authored-by: Gang Zhao <[email protected]>

Fix typos (ray-project#58349)

4b64508

Fix typos Signed-off-by: Jiajun Yao <[email protected]>

[Docs][KubeRay] Add Volcano RayJob gang scheduling example (ray-proje…

91ac4c7

…ct#58320) Signed-off-by: win5923 <[email protected]>

[docker] Update latest Docker dependencies for 2.51.0 release (ray-pr…

c90aacc

…oject#58329) Created by release automation bot. Update with commit a69004e Signed-off-by: kevin <[email protected]>

antfin-oss requested review from SongGuyang and kfstorm as code owners November 3, 2025 03:01

antfin-oss added auto-generated daily-merge labels Nov 3, 2025

antfin-oss assigned ffbin Nov 3, 2025

sourcery-ai bot reviewed Nov 3, 2025

View reviewed changes

gemini-code-assist bot reviewed Nov 3, 2025

View reviewed changes

github-actions bot added the stale label Nov 18, 2025

81 participants
