@antfin-oss

This Pull Request was created automatically to merge the latest changes from master into main branch.

📅 Created: 2025-11-03
🔀 Merge direction: master → main
🤖 Triggered by: Scheduled

Please review and merge if everything looks good.

ryanaoleary and others added 30 commits October 17, 2025 15:59
…Autoscaling config (ray-project#57260)

In KubeRay v1.5, new structured `Resources` and `Labels` fields will be
added to both the `HeadGroupSpec` and `WorkerGroupSpec` to enable users
to explicitly define these values without relying on `rayStartParams`.
This support is implemented in
ray-project/kuberay#4106. In order for this
change to work with the Ray autoscaler, this PR adds support for
checking the `resources` and `labels` fields (which take precedence over
the values in `rayStartParams` and the k8s container spec) and generating
the resulting autoscaling config. This change is compatible with older
versions of KubeRay that do not have these new fields, since we fall
back to the previous logic when the top-level fields aren't specified.

Example:

For a group spec specified like this:
```
workerGroupSpecs:
- groupName: worker-group-1
  replicas: 1
  resources:
    GPU: 8
    CPU: 16
  labels:
    ray.io/zone: us-west2-a
    ray.io/region: us-west2
```

The resulting autoscaling config for worker-group-1 would be:
```
"worker-group-1": {
  "labels": {"ray.io/zone": "us-west2-a", "ray.io/region": "us-west2"},
  ...
  "resources": {
    "CPU": 16,
    "GPU": 8
  }
},
```
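The precedence rule above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the autoscaler's code: the helper names `resolve_group_resources` and `resolve_group_labels` are hypothetical, the merge is assumed to be per key (rather than replacing the whole dict), only `num-cpus`/`num-gpus` are read from `rayStartParams`, and the `rayStartParams` entry in the example spec was added purely to show the override.

```python
from typing import Any, Dict


def resolve_group_resources(
    group_spec: Dict[str, Any], container_resources: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical sketch: merge resources from lowest to highest precedence."""
    # Lowest precedence: values derived from the k8s container spec.
    resources = dict(container_resources)

    # Next: rayStartParams, whose values KubeRay stores as strings.
    # Only num-cpus/num-gpus are handled in this sketch.
    params = group_spec.get("rayStartParams", {})
    if "num-cpus" in params:
        resources["CPU"] = int(params["num-cpus"])
    if "num-gpus" in params:
        resources["GPU"] = int(params["num-gpus"])

    # Highest precedence: the top-level `resources` field (KubeRay v1.5+).
    # Older KubeRay versions omit it, so the fallback values above survive.
    resources.update(group_spec.get("resources", {}))
    return resources


def resolve_group_labels(group_spec: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical sketch: copy the top-level `labels` field if present."""
    return dict(group_spec.get("labels", {}))


worker_group = {
    "groupName": "worker-group-1",
    "replicas": 1,
    "resources": {"GPU": 8, "CPU": 16},
    "labels": {"ray.io/zone": "us-west2-a", "ray.io/region": "us-west2"},
    "rayStartParams": {"num-cpus": "8"},  # added only to show the override
}
print(resolve_group_resources(worker_group, {"CPU": 4}))  # {'CPU': 16, 'GPU': 8}
```

Dropping the top-level `resources` key from the spec makes the sketch fall back to `rayStartParams` and the container values, which mirrors the backward-compatibility behavior described in the commit message.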

## Related issue number

Contributes to ray-project#51564

---------

Signed-off-by: Ryan O'Leary <[email protected]>
…ing cluster teardown (ray-project#57610)

After running `ray down`, several resources (a managed service identity,
network security group, etc.) are left in the subscription, and re-running
`ray up` (without specifying `--no-config-cache`) sometimes runs into
errors because of this.

## Related issue number

Fixes: ray-project#55392

Signed-off-by: Mark Rossett <[email protected]>
The old expression evaluator did not correctly handle `is_in`, which
caused failures in `test_expression_evaluator`.

## Related issues

Fixes ray-project#57820

---------

Signed-off-by: Goutam <[email protected]>
Release tests, unit tests, doctests, and examples are all migrated to
V2, so this PR turns V2 on by default. To run with Train V1
(deprecated), set `RAY_TRAIN_V2_ENABLED=0`.
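A minimal sketch of the opt-out semantics described above. It assumes the flag is read from the environment and that any value other than `0` leaves V2 enabled; the helper name `train_v2_enabled` is hypothetical and is not Ray's API.

```python
import os
from typing import Mapping


def train_v2_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: Train V2 is on unless RAY_TRAIN_V2_ENABLED is "0"."""
    # Assumption: an unset variable means the new default (V2 enabled).
    return environ.get("RAY_TRAIN_V2_ENABLED", "1") != "0"
```

In practice you would set the variable in the shell before launching the driver script, for example `RAY_TRAIN_V2_ENABLED=0 python train.py`.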

---------

Signed-off-by: Justin Yu <[email protected]>
…ct#57133)

This PR adds a workspace template that walks users through how to
integrate PyTorch Profiler with Ray Train. The purpose of this template
is to show users how to generate trace and memory profiles with
PyTorch Profiler in the `TorchTrainer`.

For a high-level overview, this template covers:

1. A hands-on example of training an image classification model
2. A simple PyTorch Profiler integration script
3. More advanced use cases, including `record_function` to customize
the profiling experience

A successful release test run:
https://buildkite.com/ray-project/release/builds/63492#0199e3e4-fa8c-4f4f-a92c-f6d47a415c53

Testing:
tested in an Anyscale workspace

---------

Signed-off-by: Lehui Liu <[email protected]>
…uides and improved navigation (ray-project#57787)

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: kourosh hakhamaneshi <[email protected]>
Co-authored-by: angelinalg <[email protected]>
## Description
This removes an orphaned code file that was previously used by the
Preprocessor User Guide.

## Related issues
Corresponding User Guide was removed in ray-project#44006.
Closes ray-project#57867.

## Additional details
This test started failing because of the new `XGBoostTrainer` API,
which is enabled by default with Ray Train V2. Rather than update the
snippet, this PR removes the code.

Signed-off-by: Matthew Deng <[email protected]>
Adding eslint and prettier scripts to pre-commit before getting rid of
`format.sh`.

One step closer to replacing `scripts/format.sh` with pre-commit (pre-commit
is currently missing eslint).

Tested locally.

---------

Signed-off-by: elliot-barn <[email protected]>
First test running on AKS cloud!

---------

Signed-off-by: kevin <[email protected]>
Signed-off-by: Kevin H. Luu <[email protected]>
Co-authored-by: Lonnie Liu <[email protected]>
- Add 2 hello world tests with regular base image & custom image running
on GCE

---------

Signed-off-by: kevin <[email protected]>
Signed-off-by: Kevin H. Luu <[email protected]>
## Description

Bump the test size from small to medium due to timeouts occurring
specifically in py3.12 tests.

---------

Signed-off-by: Matthew Deng <[email protected]>
## Why are these changes needed?

The `num_module_steps_trained_(lifetime)_throughput` metrics are biased
because of the way we record throughput times in a loop over module
batches. This PR fixes this bias.

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting))
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [x] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: simonsays1980 <[email protected]>
Co-authored-by: Kamil Kaczmarek <[email protected]>
…orker` (ray-project#57859)

## Description

The type annotation for `actor_location_tracker` is currently
`ActorLocationTracker`, but it should be
`ray.actor.ActorHandle[ActorLocationTracker]`. This PR fixes that issue.
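To illustrate the distinction the fix draws, here is a minimal sketch of a worker that receives an actor handle rather than an actor instance. `ExampleWorker` and the placeholder `ActorLocationTracker` class are hypothetical stand-ins; the string annotation keeps the snippet importable without Ray installed.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type checkers; not needed at runtime.
    import ray.actor


class ActorLocationTracker:
    """Placeholder for the actor class (in Ray it would be decorated with @ray.remote)."""


class ExampleWorker:
    """Hypothetical worker that receives a handle to the tracker actor."""

    def __init__(
        self, actor_location_tracker: "ray.actor.ActorHandle[ActorLocationTracker]"
    ) -> None:
        # The argument is a handle to a remote actor, not an ActorLocationTracker
        # instance, so its methods are invoked as handle.method.remote(...).
        self._actor_location_tracker = actor_location_tracker
```

Annotating with the handle type lets a type checker flag direct method calls that should go through `.remote(...)`.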

Signed-off-by: Balaji Veeramani <[email protected]>
…r'. (ray-project#57673)

## Why are these changes needed?

The type hints for `learner_connector` in `AlgorithmConfig.training` were
outdated and still used `RLModule` as the parameter. This PR adjusts the
type hints to the actual expected form of the callable.

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting))
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: simonsays1980 <[email protected]>
`result_of_t` is deprecated

Signed-off-by: Lonnie Liu <[email protected]>
- disables Java tests; Ray Java is not supported on Apple silicon yet.
- skips C++ tests that are not passing yet

We already stopped releasing macOS wheels for Intel. The tests that are
disabled or skipped were never passing on Apple silicon, so nothing is
regressed.

Signed-off-by: Lonnie Liu <[email protected]>
…ay-project#57876)

## Related issues
Closes ray-project#57847

Signed-off-by: daiping8 <[email protected]>
…ystem cgroup (ray-project#57864)

For more details about the resource isolation project see
ray-project#54703.

When starting the head node, move the dashboard API server's
subprocesses into the system cgroup. I updated the integration test and
added a helpful error message, because the test will break in the future
when a new dashboard module is added.

I ran the integration tests 25 times locally.

```
(ray2) ubuntu@devbox:~/code/ray2$ python -m pytest -s python/ray/tests/resource_isolation/test_resource_isolation_integration.py --count 25 -x
...
Results (366.41s):
     100 passed
```

---------

Signed-off-by: irabbani <[email protected]>
…roject#57037)

If the connection to the Ray head breaks while `tail_job_logs()` is
running after job submission, `tail_job_logs()` does not raise any
error, even though it should.

This PR queries the RayJob status when receiving a message, and raises
an error if the connection closed while the RayJob was not in a terminal
state.
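The check described above can be sketched as an async wrapper around a log stream. Everything here is hypothetical (`tail_logs_checked`, the fake stream, and the status callback are illustrative names, not Ray's `JobSubmissionClient` internals); the terminal states are written as plain strings matching the names Ray's `JobStatus` treats as terminal.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

# Terminal states of Ray's JobStatus, written as plain strings for this sketch.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


async def tail_logs_checked(
    stream: AsyncIterator[str], get_status: Callable[[], Awaitable[str]]
) -> AsyncIterator[str]:
    """Hypothetical wrapper: yield log chunks, then check why the stream ended."""
    async for chunk in stream:
        yield chunk
    # The stream closed. If the job hasn't reached a terminal state,
    # the connection to the head node was lost, so surface an error.
    status = await get_status()
    if status not in TERMINAL_STATES:
        raise RuntimeError(f"Log stream closed while job was still {status}.")


async def main() -> None:
    async def fake_stream():
        for line in ("starting\n", "epoch 1\n"):
            yield line

    async def status_running() -> str:
        return "RUNNING"

    try:
        async for chunk in tail_logs_checked(fake_stream(), status_running):
            print(chunk, end="")
    except RuntimeError as exc:
        print(f"error: {exc}")


asyncio.run(main())
```

Checking the status only after the stream ends keeps the happy path free of extra status queries.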

## Related issue number

Closes: ray-project#57002

---------

Signed-off-by: machichima <[email protected]>
…ay-project#57802)

## Description

1. This PR adds a `jax.distributed.shutdown()` call to `JaxBackend` to
free up any leaked resources on TPU RayTrainWorkers.
2. If `jax.distributed` is not initialized, the call is a no-op:
https://docs.jax.dev/en/latest/_autosummary/jax.distributed.shutdown.html
3. Tested on an Anyscale workspace.
we are not releasing `x86_64` wheels anymore

Signed-off-by: Lonnie Liu <[email protected]>
…igurable (ray-project#57705)

Recently, when we ran performance tests with task event generation
turned on, we saw some performance regression when the workloads ran on
very small CPU machines. Further investigation showed that the overhead
mainly comes from the field-name format conversion performed when
converting the proto message to a JSON payload in the aggregator agent.

This PR adds an env var for the config to control the name conversion
behavior and updates the corresponding tests.

Also note that we eventually plan to remove this config and turn off
the field name conversion by default after all current event usage is
migrated.
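To show what the conversion costs, here is a hypothetical pure-Python sketch of renaming every snake_case proto field to lowerCamelCase in a JSON payload, with a switch to skip it. The function names and the `convert` flag are illustrative assumptions; the actual env var name and the aggregator agent's code path aren't given above. For comparison, protobuf's Python `json_format.MessageToJson` exposes a similar switch, `preserving_proto_field_name`.

```python
from typing import Any


def to_lower_camel(name: str) -> str:
    """Convert a snake_case field name to lowerCamelCase (task_id -> taskId)."""
    first, *rest = name.split("_")
    return first + "".join(part[:1].upper() + part[1:] for part in rest)


def convert_field_names(payload: Any, convert: bool) -> Any:
    """Hypothetical sketch: optionally rename every dict key in a JSON payload."""
    if not convert:
        # Skipping the conversion avoids walking and rebuilding the payload.
        return payload
    if isinstance(payload, dict):
        return {to_lower_camel(k): convert_field_names(v, True) for k, v in payload.items()}
    if isinstance(payload, list):
        return [convert_field_names(v, True) for v in payload]
    return payload
```

Every event pays for a full recursive walk when conversion is on, which is consistent with the overhead being most visible on small CPU machines.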

---------

Signed-off-by: Mengjin Yan <[email protected]>
goutamvenkat-anyscale and others added 20 commits October 31, 2025 10:45
## Description
Adds a PredicatePushdown rule to Ray Data.

Pushes predicates through `Read` (for datasources that support it) and
`Union`.
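The rewrite can be sketched on a toy logical plan. The operator classes and `push_down_filter` below are hypothetical stand-ins, not Ray Data's logical operators: a filter over a union is pushed into every branch, and a filter over a read is absorbed only when the datasource supports predicate pushdown.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Read:
    """Hypothetical read operator."""

    source: str
    supports_predicate_pushdown: bool
    predicates: Tuple[str, ...] = ()


@dataclass(frozen=True)
class UnionOp:
    """Hypothetical union of several input plans."""

    inputs: Tuple[object, ...]


@dataclass(frozen=True)
class Filter:
    """Hypothetical filter operator wrapping a single input plan."""

    predicate: str
    input_op: object


def push_down_filter(op: object) -> object:
    """Move a Filter below a Union and into Reads whose datasource supports it."""
    if not isinstance(op, Filter):
        return op
    child = op.input_op
    if isinstance(child, UnionOp):
        # filter(a ∪ b) == filter(a) ∪ filter(b), so push into every branch.
        return UnionOp(
            tuple(push_down_filter(Filter(op.predicate, c)) for c in child.inputs)
        )
    if isinstance(child, Read) and child.supports_predicate_pushdown:
        # The datasource applies the predicate while reading.
        return Read(child.source, True, child.predicates + (op.predicate,))
    # The datasource can't evaluate the predicate: keep the Filter above it.
    return op


plan = Filter("x > 1", UnionOp((Read("a.parquet", True), Read("b.csv", False))))
print(push_down_filter(plan))
```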

---------

Signed-off-by: Goutam <[email protected]>
…ct#58281)

Supports token-based authentication in the dashboard head SDK. All clients
built on top of the submission client now support token auth out of the
box, so this covers CLI commands such as job submit, the state API, and
Serve-related CLI commands.

---------

Signed-off-by: sampan <[email protected]>
Signed-off-by: Sampan S Nayak <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Co-authored-by: sampan <[email protected]>
Co-authored-by: Edward Oakes <[email protected]>
This helps prevent an edge case when using file-based log exporters like
Vector, which use fingerprinting
([ref](https://vector.dev/docs/reference/configuration/sources/file/#fingerprint))
to identify unique files.

Example edge case that this fixes: two jobs are submitted to a cluster
and begin executing at the same time, and both contain an invalid
entrypoint that references a nonexistent file.

Before the fix:
- both jobs have the identical "Runtime env is setting up" log with
identical timestamps
- both jobs have identical entrypoint failure logs

As a result, the log files for these jobs are identical, so Vector will
only export one.

After the fix:
- both jobs have the identical "Runtime env is setting up" log with
identical timestamps
- each job has a **unique** entrypoint log containing its job_id
- both jobs have identical entrypoint failure logs

Vector can differentiate between these two files, so both will be
exported.
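A rough model of why identical files collapse under fingerprinting. This sketch hashes the first few lines of a file with SHA-256; Vector's actual checksum strategy differs in details (which bytes or lines it reads and which hash it uses), so the `lines` window and the hash are assumptions. The log lines, including the "Running entrypoint for job" text and the job IDs, are invented for illustration.

```python
import hashlib


def checksum_fingerprint(contents: str, lines: int = 3) -> str:
    """Hash the first `lines` lines of a file, roughly like a checksum fingerprint."""
    head = "\n".join(contents.splitlines()[:lines])
    return hashlib.sha256(head.encode("utf-8")).hexdigest()


setup = "2025-10-31 10:00:00 Runtime env is setting up."
failure = "python: can't open file 'missing.py': [Errno 2] No such file or directory"

# Before the fix: both job logs are byte-for-byte identical.
before_a = before_b = f"{setup}\n{failure}"

# After the fix: each log carries a line with its own (illustrative) job ID.
after_a = f"{setup}\nRunning entrypoint for job raysubmit_aaa\n{failure}"
after_b = f"{setup}\nRunning entrypoint for job raysubmit_bbb\n{failure}"
```

Comparing `checksum_fingerprint(after_a, lines=1)` with the same call on `after_b` shows that the unique line only helps if the fingerprint window reaches it.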

---------

Signed-off-by: Chris Fellowes <[email protected]>
Signed-off-by: chrisfellowes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Co-authored-by: Edward Oakes <[email protected]>
…8341)

When running on older machines with less powerful CPUs, and under some
load, we've seen the dashboard fail to start within the expected 200
iterations (200 × `sleep(0.1)`, i.e., 20 seconds). This PR increases the
timeout to 60 seconds.
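The timeout arithmetic, plus a deadline-based polling loop, as a hypothetical sketch. The commit message doesn't say whether the change raises the iteration count or switches to a deadline; `wait_for` and the constants are illustrative names.

```python
import time
from typing import Callable

POLL_INTERVAL_S = 0.1
OLD_TIMEOUT_S = 200 * POLL_INTERVAL_S  # 200 iterations x 0.1 s = 20 s
NEW_TIMEOUT_S = 60.0  # about 600 iterations at the same interval


def wait_for(
    predicate: Callable[[], bool],
    timeout_s: float = NEW_TIMEOUT_S,
    interval_s: float = POLL_INTERVAL_S,
) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` elapses."""
    # A monotonic deadline bounds wall-clock time even when each check is
    # slow, unlike a fixed iteration count.
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

With an iteration count, a slow predicate on a loaded machine stretches the real wait beyond 20 s. A deadline keeps the bound fixed while still allowing the extra headroom.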

---------

Signed-off-by: morotti <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Co-authored-by: Edward Oakes <[email protected]>
…oject#58344)

so that we can perform last-minute surgery if required.

Signed-off-by: Lonnie Liu <[email protected]>
Add the `--use-pep517` flag; otherwise, part of the wheel-building logic
does not work on Windows with Python 3.12.

Also removes the unnecessary "uninstall+reinstall" dance. The script
only builds the wheel; it does not (and should not) install the wheel.

Signed-off-by: Lonnie Liu <[email protected]>
Observability release tests on py3.10.

Successful release test run:
https://buildkite.com/ray-project/release/builds/65851

Failing tests are set to manual (disabled):
- `aws_cluster_launcher_release_image`
- `k8s_serve_ha_test`

Enabling `agent_stress_test.gce`, which is now passing.

---------

Signed-off-by: elliot-barn <[email protected]>
## Description
Replace `map_batches` and numpy invocations with `with_column` and arrow
kernels

Release test:
https://buildkite.com/ray-project/release/builds/66243#019a37da-4d9d-4f19-9180-e3f3dc3f8043

---------

Signed-off-by: Goutam <[email protected]>
## Description

This fixes the `symmetric-run` CLI workflow.

Right now, if you use `ray symmetric-run` on 2.51 like this:
```
ray symmetric-run --address 127.0.0.1:6379 -- python my_script.py
```

it throws an error because the `symmetric-run` argument is not handled.
This was only caught once the command became part of the CLI.

---------

Signed-off-by: Richard Liaw <[email protected]>
…t#58247)

Updating hello world release & cluster release tests to run on py3.10

Passing release tests:
https://buildkite.com/ray-project/release/builds/65844

---------

Signed-off-by: elliot-barn <[email protected]>
Fix typos

Signed-off-by: Jiajun Yao <[email protected]>
The current examples show label bundles written as
`[{"ray.io/accelerator-type": "H100"}* 2]`, i.e., a dict multiplied by an
integer. This is wrong: it is the list that has to be multiplied, as in
`[{"ray.io/accelerator-type": "H100"}] * 2`.
This PR fixes this.
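A quick check of the Python semantics behind this fix. The variable names are illustrative; only standard list and dict behavior is involved.

```python
bundle = {"ray.io/accelerator-type": "H100"}

# Wrong: multiplying a dict by an int raises TypeError.
try:
    _ = [bundle * 2]
except TypeError as exc:
    wrong_error = exc

# Right: multiply the list to get two label bundles.
label_bundles = [bundle] * 2

# Both entries are the same dict object. Build independent copies
# if the bundles may be mutated later.
independent_bundles = [dict(bundle) for _ in range(2)]
```

List multiplication repeats references, which is harmless for read-only label bundles but worth knowing if code later edits one bundle in place.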

Signed-off-by: Daraan <[email protected]>
## Description

This PR implements `Result.from_path` in Ray Train V2, which
reconstructs a `Result` object from the checkpoints. The implementation
leverages `CheckpointManager` and is based on
https://github.com/ray-project/ray/blob/master/python/ray/train/v2/_internal/execution/controller/controller.py#L512-L540

---------

Signed-off-by: xgui <[email protected]>
Signed-off-by: Justin Yu <[email protected]>
Co-authored-by: Justin Yu <[email protected]>
Add `WORKDIR /home/ray` in `build-docker.sh`.

If `WORKDIR` is not set, it defaults to `/root`, causing permission
issues with `conda init`.

```
31.00 # >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
31.00 
31.00     Traceback (most recent call last):
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/exception_handler.py", line 18, in __call__
31.00         return func(*args, **kwargs)
31.00                ^^^^^^^^^^^^^^^^^^^^^
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/cli/main.py", line 44, in main_subshell
31.00         context.__init__(argparse_args=pre_args)
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/base/context.py", line 517, in __init__
31.00         self._set_search_path(
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/common/configuration.py", line 1430, in _set_search_path
31.00         self._search_path = IndexedSet(self._expand_search_path(search_path, **kwargs))
31.00                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/boltons/setutils.py", line 118, in __init__
31.00         self.update(other)
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/boltons/setutils.py", line 351, in update
31.00         for o in other:
31.00                  ^^^^^
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/common/configuration.py", line 1403, in _expand_search_path
31.00         if path.is_file() and (
31.00            ^^^^^^^^^^^^^^
31.00       File "/home/ray/anaconda3/lib/python3.12/pathlib.py", line 892, in is_file
31.00         return S_ISREG(self.stat().st_mode)
31.00                        ^^^^^^^^^^^
31.00       File "/home/ray/anaconda3/lib/python3.12/pathlib.py", line 840, in stat
31.00         return os.stat(self, follow_symlinks=follow_symlinks)
31.00                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
31.00     PermissionError: [Errno 13] Permission denied: '$XDG_CONFIG_HOME/conda/.condarc'
31.00 
31.00 `$ /home/ray/anaconda3/bin/conda init`
31.00 
31.00   environment variables:
31.00                  CIO_TEST=<not set>
31.00                CONDA_ROOT=/home/ray/anaconda3
31.00            CURL_CA_BUNDLE=<not set>
31.00               HTTPS_PROXY=<set>
31.00                HTTP_PROXY=<set>
31.00           LD_LIBRARY_PATH=:/usr/local/nvidia/lib64
31.00                LD_PRELOAD=<not set>
31.00                  NO_PROXY=<set>
31.00                      PATH=/home/ray/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/
31.00                           bin:/sbin:/bin:/usr/local/nvidia/bin
31.00            PYTHON_VERSION=3.9
31.00        REQUESTS_CA_BUNDLE=<not set>
31.00             SSL_CERT_FILE=<not set>
31.00                http_proxy=<set>
31.00               https_proxy=<set>
31.00                  no_proxy=<set>
```

Signed-off-by: my-vegetable-has-exploded <[email protected]>
Co-authored-by: Edward Oakes <[email protected]>
…oject#58329)

Created by release automation bot.

Update with commit a69004e

Signed-off-by: kevin <[email protected]>
… and GRPO. (ray-project#57961)

## Description
Example for the first blog post in the RDT series, using NIXL for
GPU-to-GPU tensor transfers.

---------

Signed-off-by: Ricardo Decal <[email protected]>
Signed-off-by: Stephanie Wang <[email protected]>
Co-authored-by: Ricardo Decal <[email protected]>
Co-authored-by: Stephanie Wang <[email protected]>
Co-authored-by: Qiaolin Yu <[email protected]>

@sourcery-ai sourcery-ai bot left a comment

The pull request #668 has too many files changed.

The GitHub API will only let us fetch up to 300 changed files, and this pull request has 5194.

@gemini-code-assist

Summary of Changes

Hello @antfin-oss, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request represents an automated daily merge from the master branch into main. It integrates a substantial collection of updates across the Ray project, primarily enhancing its CI/CD infrastructure, build system configurations, and documentation. The changes reflect a continuous effort to improve build reliability, streamline dependency management, and refine the development workflow.

Highlights

  • CI/CD Infrastructure Overhaul: Extensive refactoring of Buildkite pipelines, including the introduction of new YAML files for image definitions and dedicated build steps for core Ray components, dashboard, and Java.
  • Dependency Management Evolution: Transition from Miniconda to Miniforge for Python environments and the adoption of a new raydepsets tool for more robust and centralized Python dependency management.
  • Bazel Build System Refinements: Significant updates to Bazel rules for C++ and Python packaging, third-party dependency handling, and enforcing stricter build environments for improved determinism.
  • Docker Image Strategy Enhancements: Implementation of new Docker image tagging conventions, including RAYCI_BUILD_ID for better traceability, and a new mechanism for generating multi-architecture image indexes.
  • Documentation and Linting Improvements: Integration of new linting tools (semgrep, vale, cython-lint, eslint) into pre-commit hooks, along with a refined pydoclint setup and an updated documentation build process leveraging pre-built Ray components.
  • C++ API and Runtime Modernization: Updates to the C++ API for metrics, runtime environment, and network utilities, alongside internal refactorings for process management and UUID generation.
Ignored Files
  • Ignored by pattern: .gemini/** (1)
    • .gemini/config.yaml
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/stale_pull_request.yaml

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is an automated daily merge that includes a massive set of changes, primarily focused on refactoring the CI/CD pipelines and Bazel build system. The changes are extensive and touch many configuration files, build scripts, and some parts of the core C++ and Python code.

Key changes include:

  • CI/CD Refactoring: The Buildkite pipelines have been significantly modularized. Image building steps are separated, and dependency management is centralized with a new raydepsets system. This should improve maintainability and clarity of the CI configuration.
  • Bazel Build System Overhaul: The root BUILD.bazel file has been cleaned up, with targets moved to more appropriate subdirectories. The packaging logic has been modernized using rules_pkg instead of custom genrules. The workspace name has been updated to io_ray.
  • Code Quality and Linting: Pre-commit hooks have been updated and expanded, with tools like semgrep, vale, and eslint being added. The old format.sh script has been removed in favor of pre-commit.
  • Dependency Management: Switched from Miniconda to Miniforge. Several dependencies have been updated or patched.
  • Platform Support: macOS builds for x86_64 seem to have been removed, standardizing on arm64 for macOS. Java tests on macOS also appear to be disabled.
  • Code Modernization: Some C++ code has been updated to use modern C++ features (e.g., std::invoke_result_t).

The overall direction of these changes is very positive, leading to a more maintainable, reproducible, and robust build and test system.

I've found a couple of issues in the test suite where assertions seem to be incorrect after the refactoring of how Docker image tags are generated. These tests will likely fail and need to be updated. My review comments point these out with suggestions for fixes.

```diff
 with mock.patch("subprocess.check_call", side_effect=_mock_subprocess):
 LinuxTesterContainer("team", build_type="debug")
-docker_image = f"{_DOCKER_ECR_REPO}:{_RAYCI_BUILD_ID}-team"
+docker_image = f"{_DOCKER_ECR_REPO}:team"
```

high

The docker_image variable is constructed without the RAYCI_BUILD_ID prefix, but the _get_docker_image method, which is called by LinuxTesterContainer's install_ray method, will prepend it. This will cause the assertion on the next line to fail. The RAYCI_BUILD_ID is set to "a1b2c3d4" in the test setup.

Suggested change
```diff
-docker_image = f"{_DOCKER_ECR_REPO}:team"
+docker_image = f"{_DOCKER_ECR_REPO}:a1b2c3d4-team"
```

```diff
-image = (
-"029272617770.dkr.ecr.us-west-2.amazonaws.com/rayproject/citemp:unknown-hi"
-)
+image = "029272617770.dkr.ecr.us-west-2.amazonaws.com/rayproject/citemp:hi"
```

high

The image variable is constructed without the RAYCI_BUILD_ID prefix, but the _get_docker_image method, which is called by WindowsContainer's install_ray method, will prepend it. This will cause the assertion on the next line to fail. The RAYCI_BUILD_ID is set to "a1b2c3d4" in the test setup.

```python
        image = (
            "029272617770.dkr.ecr.us-west-2.amazonaws.com/rayproject/citemp:a1b2c3d4-hi"
        )
```
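A hypothetical sketch of the tag scheme these review comments describe, where the build ID is prepended to the image tag. `get_docker_image` is an illustrative stand-in for the real `_get_docker_image` method, and the `"unknown"` fallback when `RAYCI_BUILD_ID` is unset is an assumption inferred from the older `unknown-hi` expectation.

```python
import os


def get_docker_image(repo: str, tag: str) -> str:
    """Hypothetical sketch: prepend the CI build ID to the image tag.

    Assumption: RAYCI_BUILD_ID comes from the environment and falls back to
    "unknown" when unset.
    """
    build_id = os.environ.get("RAYCI_BUILD_ID", "unknown")
    return f"{repo}:{build_id}-{tag}"
```

With `RAYCI_BUILD_ID=a1b2c3d4`, as in the test setup the review mentions, the tag `hi` becomes `a1b2c3d4-hi`, which is the value the suggested assertions expect.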

@github-actions

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale label Nov 18, 2025