Labels: bug, good-first-issue
Description
Search before asking
- I searched the issues and found no similar issues.
KubeRay Component
ray-operator
What happened + What you expected to happen
Applying the following YAML will cause an overflow in Status.MaxWorkerReplicas:
workerGroupSpecs:
  - replicas: 1
    minReplicas: 3
    numOfHosts: 4
By default, maxReplicas is set to 2147483647.
kuberay/ray-operator/apis/ray/v1/raycluster_types.go
Lines 112 to 114 in 3471f99
// MaxReplicas denotes the maximum number of desired Pods for this worker group, and the default value is maxInt32.
// +kubebuilder:default:=2147483647
MaxReplicas *int32 `json:"maxReplicas"`
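With numOfHosts: 4, the default value multiplied by the host count no longer fits in an int32: 2147483647 × 4 = 8589934588, whose low 32 bits are interpreted as -4, which is exactly what shows up in the status below. A minimal standalone Go sketch (not part of KubeRay) showing the wraparound:

package main

import "fmt"

func main() {
	maxReplicas := int32(2147483647) // default MaxReplicas (maxInt32)
	numOfHosts := int32(4)
	// 2147483647 * 4 = 8589934588 does not fit in 32 bits;
	// the product wraps around and is interpreted as -4.
	fmt.Println(maxReplicas * numOfHosts) // prints -4
}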
However, in CalculateMaxReplicas there is no check to prevent overflow when multiplying by numOfHosts, which can result in an incorrect value for Status.MaxWorkerReplicas.
kuberay/ray-operator/controllers/ray/utils/util.go
Lines 394 to 405 in 3471f99
// CalculateMaxReplicas calculates max worker replicas at the cluster level
func CalculateMaxReplicas(cluster *rayv1.RayCluster) int32 {
    count := int32(0)
    for _, nodeGroup := range cluster.Spec.WorkerGroupSpecs {
        if nodeGroup.Suspend != nil && *nodeGroup.Suspend {
            continue
        }
        count += (*nodeGroup.MaxReplicas * nodeGroup.NumOfHosts)
    }
    return count
}
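A possible direction for a fix (a hypothetical sketch, not existing KubeRay code) is to accumulate in int64 and clamp the result to math.MaxInt32 before it is stored in the status:

import (
    "math"

    rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
)

// calculateMaxReplicasSafe is a hypothetical overflow-safe variant of
// CalculateMaxReplicas: per-group products and the running total are computed
// in int64, and the result is clamped to math.MaxInt32.
func calculateMaxReplicasSafe(cluster *rayv1.RayCluster) int32 {
    count := int64(0)
    for _, nodeGroup := range cluster.Spec.WorkerGroupSpecs {
        if nodeGroup.Suspend != nil && *nodeGroup.Suspend {
            continue
        }
        count += int64(*nodeGroup.MaxReplicas) * int64(nodeGroup.NumOfHosts)
        if count >= math.MaxInt32 {
            return math.MaxInt32
        }
    }
    return int32(count)
}

Clamping to math.MaxInt32 would preserve the "effectively unlimited" meaning of the default instead of reporting a negative capacity; if other helpers in util.go multiply replica counts by NumOfHosts in the same way, they would presumably need the same guard.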
Reproduction script
Applying the following YAML:
# For examples with more realistic resource configuration, see
# ray-cluster.complete.large.yaml and
# ray-cluster.autoscaler.large.yaml.
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-kuberay
spec:
  rayVersion: '2.46.0' # should match the Ray version in the image of the containers
  # Ray head pod template
  headGroupSpec:
    # rayStartParams is optional with RayCluster CRD from KubeRay 1.4.0 or later but required in earlier versions.
    rayStartParams: {}
    template:
      spec:
        schedulerName: default-scheduler
        containers:
          - name: ray-head
            image: rayproject/ray:2.46.0
            resources:
              limits:
                cpu: 1
                memory: 2G
              requests:
                cpu: 1
                memory: 2G
            ports:
              - containerPort: 6379
                name: gcs-server
              - containerPort: 8265 # Ray dashboard
                name: dashboard
              - containerPort: 10001
                name: client
  workerGroupSpecs:
    # the pod replicas in this group typed worker
    - replicas: 1
      minReplicas: 3
      numOfHosts: 4
      # logical group name, for this called small-group, also can be functional
      groupName: workergroup
      # rayStartParams is optional with RayCluster CRD from KubeRay 1.4.0 or later but required in earlier versions.
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc'
              image: rayproject/ray:2.46.0
              resources:
                limits:
                  cpu: 1
                  memory: 1G
                requests:
                  cpu: 1
                  memory: 1G
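Then apply the manifest and inspect the resulting status; the file name below is just an example:

$ kubectl apply -f raycluster-kuberay.yaml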
$ kubectl get raycluster raycluster-kuberay -o yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"ray.io/v1","kind":"RayCluster","metadata":{"annotations":{},"name":"raycluster-kuberay","namespace":"default"},"spec":{"headGroupSpec":{"rayStartParams":{},"template":{"spec":{"containers":[{"image":"rayproject/ray:2.46.0","name":"ray-head","ports":[{"containerPort":6379,"name":"gcs-server"},{"containerPort":8265,"name":"dashboard"},{"containerPort":10001,"name":"client"}],"resources":{"limits":{"cpu":1,"memory":"2G"},"requests":{"cpu":1,"memory":"2G"}}}],"schedulerName":"default-scheduler"}}},"rayVersion":"2.46.0","workerGroupSpecs":[{"groupName":"workergroup","minReplicas":3,"numOfHosts":4,"rayStartParams":{},"replicas":1,"template":{"spec":{"containers":[{"image":"rayproject/ray:2.46.0","name":"ray-worker","resources":{"limits":{"cpu":1,"memory":"1G"},"requests":{"cpu":1,"memory":"1G"}}}]}}}]}}
  creationTimestamp: "2025-10-28T16:05:15Z"
  generation: 1
  name: raycluster-kuberay
  namespace: default
  resourceVersion: "13168"
  uid: c83e5bd6-c5ee-4a22-b615-341fee75725a
....
status:
  availableWorkerReplicas: 12
  conditions:
  - lastTransitionTime: "2025-10-28T16:05:36Z"
    message: ""
    reason: HeadPodRunningAndReady
    status: "True"
    type: HeadPodReady
  - lastTransitionTime: "2025-10-28T16:06:43Z"
    message: All Ray Pods are ready for the first time
    reason: AllPodRunningAndReadyFirstTime
    status: "True"
    type: RayClusterProvisioned
  - lastTransitionTime: "2025-10-28T16:05:36Z"
    message: ""
    reason: RayClusterSuspended
    status: "False"
    type: RayClusterSuspended
  - lastTransitionTime: "2025-10-28T16:05:36Z"
    message: ""
    reason: RayClusterSuspending
    status: "False"
    type: RayClusterSuspending
  desiredCPU: "5"
  desiredGPU: "0"
  desiredMemory: 6G
  desiredTPU: "0"
  desiredWorkerReplicas: 12
  endpoints:
    client: "10001"
    dashboard: "8265"
    gcs-server: "6379"
    metrics: "8080"
  head:
    podIP: 10.244.0.34
    podName: raycluster-kuberay-head-g5wzm
    serviceIP: 10.244.0.34
    serviceName: raycluster-kuberay-head-svc
  lastUpdateTime: "2025-10-28T16:06:43Z"
  maxWorkerReplicas: -4   # <---- overflowed
  minWorkerReplicas: 12
  observedGeneration: 1
  readyWorkerReplicas: 12
  state: ready
  stateTransitionTimes:
    ready: "2025-10-28T16:06:43Z"
Anything else
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!