Redis HA Needs publishNotReadyAddresses #25060

@maheshrijal

Description

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

In the HA deployment bundle (manifests/ha/cluster-install?ref=v3.1.8), the headless Service argocd-redis-ha does not set publishNotReadyAddresses: true. During a cold restart of the Redis HA StatefulSet, kube-dns removes the service endpoints while pods are not ready, so fix-split-brain.sh cannot contact Sentinel. Every redis pod keeps the stale slaveof entry and fails its startup probe (role=slave; repl=connect), leading to a permanent CrashLoopBackOff and broken Argo CD logins (rpc error: code = Unauthenticated desc = no session information).

To Reproduce

  1. Deploy the stock HA bundle (kustomize build github.com/argoproj/argo-cd/manifests/ha/cluster-install?ref=v3.1.8 | kubectl apply -f -).
  2. Force a restart: kubectl delete pod argocd-redis-ha-server-{0,1,2} -n argocd.
  3. Watch the pods restart:
    • kubectl logs argocd-redis-ha-server-1 -c split-brain-fix shows Could not connect to Redis at argocd-redis-ha:26379: Name does not resolve.
    • kubectl get pods -n argocd shows all redis pods in CrashLoopBackOff.
  4. Attempt UI/CLI login; Argo CD server logs report “no session information”.
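To confirm that the stock Service lacks the field, a standard jsonpath query works (an empty result means `publishNotReadyAddresses` is unset, which defaults to `false`):

```shell
# Prints "true" if the field is set; prints nothing on the stock HA bundle.
kubectl get svc argocd-redis-ha -n argocd \
  -o jsonpath='{.spec.publishNotReadyAddresses}'
```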

Expected behavior

Sentinel should be reachable via the service name during bootstrap so a new master can be elected and replicas join without crashing.
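Concretely, the headless Service should publish DNS records for not-ready pods so Sentinel is resolvable during bootstrap. A minimal sketch of the relevant part of the Service definition (field names per the Kubernetes Service API; the selector label and port list here are illustrative, the real HA bundle manifest carries more):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: argocd-redis-ha
  namespace: argocd
spec:
  clusterIP: None                  # headless Service backing the Redis HA StatefulSet
  publishNotReadyAddresses: true   # keep DNS records while pods are not yet ready
  selector:
    app.kubernetes.io/name: argocd-redis-ha   # illustrative selector
  ports:
    - name: tcp-sentinel
      port: 26379
      targetPort: 26379
```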

Screenshots

❯ k get pods
NAME                                                READY   STATUS             RESTARTS         AGE
argocd-application-controller-0                     1/1     Running            0                4h59m
argocd-application-controller-1                     1/1     Running            0                4h57m
argocd-applicationset-controller-5c6799bb45-pkn85   1/1     Running            0                4h58m
argocd-applicationset-controller-5c6799bb45-wdmgq   1/1     Running            0                35m
argocd-dex-server-57c896dbc6-jwl8x                  1/1     Running            0                4h59m
argocd-notifications-controller-84b4d4b674-jphfz    1/1     Running            0                4h59m
argocd-redis-ha-haproxy-58476dd6d7-69pdw            1/1     Running            0                36m
argocd-redis-ha-haproxy-58476dd6d7-cdbrs            1/1     Running            0                36m
argocd-redis-ha-haproxy-58476dd6d7-x69ht            1/1     Running            0                35m
argocd-redis-ha-server-0                            2/3     CrashLoopBackOff   98 (4m48s ago)   5h
argocd-redis-ha-server-1                            2/3     CrashLoopBackOff   100 (71s ago)    4h58m
argocd-redis-ha-server-2                            2/3     CrashLoopBackOff   101 (48s ago)    4h57m
argocd-repo-server-65669fd7cf-5gq4r                 1/1     Running            0                15m
argocd-repo-server-65669fd7cf-l22cl                 1/1     Running            0                15m
argocd-server-c67bd8d45-9xxzq                       1/1     Running            0                35m
argocd-server-c67bd8d45-f4jmx                       1/1     Running            0                4h57m
argocd-server-c67bd8d45-qcjsj                       1/1     Running            0                4h59m

Version

argocd: v3.1.9+8665140
  BuildDate: 2025-10-17T23:03:44Z
  GitCommit: 8665140f96f6b238a20e578dba7f9aef91ddac51
  GitTreeState: clean
  GoVersion: go1.25.3
  Compiler: gc
  Platform: darwin/arm64
argocd-server: v3.1.8+becb020

Logs

$ kubectl logs argocd-redis-ha-server-1 -c split-brain-fix -n argocd
Could not connect to Redis at argocd-redis-ha:26379: Try again
Could not connect to Redis at argocd-redis-ha:26379: Try again
Could not connect to Redis at argocd-redis-ha:26379: Try again
  Fri Oct 24 05:21:53 UTC 2025 Did not find redis master ()
Identifying redis master (get-master-addr-by-name)..
  using sentinel (argocd-redis-ha), sentinel group name (argocd)
Could not connect to Redis at argocd-redis-ha:26379: Try again
Could not connect to Redis at argocd-redis-ha:26379: Try again
Could not connect to Redis at argocd-redis-ha:26379: Try again
  Fri Oct 24 05:24:08 UTC 2025 Did not find redis master ()
...
Attempting to force failover (sentinel failover)..
  on sentinel (argocd-redis-ha:26379), sentinel grp (argocd)
  Fri Oct 24 05:29:53 UTC 2025 Failover returned with 'NOGOODSLAVE'
$ kubectl logs argocd-redis-ha-server-1 -c redis -n argocd --previous
1:S 24 Oct 2025 04:50:55.688 * Connecting to MASTER 172.20.218.81:6379
1:S 24 Oct 2025 04:50:55.688 * MASTER <-> REPLICA sync started
1:S 24 Oct 2025 04:50:55.689 * Master replied to PING, replication can continue...
1:S 24 Oct 2025 04:50:55.690 * Trying a partial resynchronization (request 4306f2b885fb2bed9c6ea95e1e3069a81caf0b4d:429491749).
1:S 24 Oct 2025 04:50:55.690 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 24 Oct 2025 04:50:56.691 * Connecting to MASTER 172.20.218.81:6379
1:S 24 Oct 2025 04:50:56.692 * MASTER <-> REPLICA sync started
1:S 24 Oct 2025 04:50:56.692 * Master replied to PING, replication can continue...
1:S 24 Oct 2025 04:50:56.693 * Trying a partial resynchronization (request 4306f2b885fb2bed9c6ea95e1e3069a81caf0b4d:429491749).
1:S 24 Oct 2025 04:50:56.693 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
$ kubectl logs argocd-server-c67bd8d45-f4jmx -n argocd | rg "Unauthenticated"
{"grpc.code":"Unauthenticated","grpc.component":"server","grpc.error":"rpc error: code = Unauthenticated desc = invalid session: failed to verify the token","grpc.method":"List","grpc.method_type":"unary","grpc.service":"cluster.ClusterService","grpc.start_time":"2025-10-24T04:47:57Z","grpc.time_ms":"0.646","level":"info","msg":"finished call","peer.address":"[::1]:54768","protocol":"grpc","time":"2025-10-24T04:47:57Z"}
{"grpc.code":"Unauthenticated","grpc.component":"server","grpc.error":"rpc error: code = Unauthenticated desc = no session information","grpc.method":"List","grpc.method_type":"unary","grpc.service":"application.ApplicationService","grpc.start_time":"2025-10-24T04:49:22Z","grpc.time_ms":"0.187","level":"info","msg":"finished call","peer.address":"[::1]:54768","protocol":"grpc","time":"2025-10-24T04:49:22Z"}

Workaround

Patch the argocd-redis-ha Service:

spec:
  publishNotReadyAddresses: true

then restart the StatefulSet. After applying the patch, the cluster recovers.
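For reference, the same workaround can be applied imperatively with standard kubectl commands (adjust the namespace if Argo CD is not installed in argocd):

```shell
# Publish not-ready addresses on the headless Service so Sentinel
# stays resolvable while the Redis pods bootstrap.
kubectl patch svc argocd-redis-ha -n argocd --type merge \
  -p '{"spec":{"publishNotReadyAddresses":true}}'

# Restart the Redis HA StatefulSet so the pods come back up
# with working service DNS.
kubectl rollout restart statefulset argocd-redis-ha-server -n argocd
```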
