Contributor

@tsivaprasad tsivaprasad commented Dec 3, 2025

Summary

This PR implements automatic etcd mode reconfiguration and adds comprehensive documentation for changing a Control Plane host's etcd mode between server and client after initialization.

Changes

Automatic Reconfiguration

  • Mode Detection: Automatically detects when the PGEDGE_ETCD_MODE environment variable changes
  • Server → Client: Safely demotes an etcd server to client mode, removing it from cluster membership
  • Client → Server: Promotes a client to server mode, joining the etcd cluster with proper credentials
  • Automatic Rejoin: An HTTP-based fallback mechanism discovers cluster members and automatically rejoins the cluster if the existing credentials are invalid
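
The mode detection described above can be sketched as a small decision function. This is a minimal illustration, not the PR's implementation: the `Transition` type and `decideTransition` function are hypothetical names, and the only behavior taken from the PR is that an old and a new mode are compared at startup (the logs report `old_mode`, `new_mode`, `modes_equal`, and `old_mode_empty`). Treating an empty old mode as "nothing to reconfigure" is an assumption.

```go
package main

import "fmt"

// Transition names the reconfiguration a host needs at startup.
// Hypothetical type, used only for this sketch.
type Transition string

const (
	NoChange        Transition = "none"
	PromoteToServer Transition = "client->server"
	DemoteToClient  Transition = "server->client"
)

// decideTransition compares the mode recorded by the previous run (oldMode)
// with the mode requested now, for example via PGEDGE_ETCD_MODE (newMode).
func decideTransition(oldMode, newMode string) (Transition, error) {
	if newMode != "server" && newMode != "client" {
		return NoChange, fmt.Errorf("unsupported etcd mode %q", newMode)
	}
	switch {
	case oldMode == "" || oldMode == newMode:
		// Assumption: with no recorded mode, or an unchanged mode,
		// there is nothing to reconfigure.
		return NoChange, nil
	case oldMode == "client" && newMode == "server":
		return PromoteToServer, nil
	case oldMode == "server" && newMode == "client":
		return DemoteToClient, nil
	default:
		return NoChange, fmt.Errorf("unsupported previous etcd mode %q", oldMode)
	}
}

func main() {
	tr, err := decideTransition("client", "server")
	fmt.Println(tr, err)
}
```

Keeping the decision separate from the side effects (removing a member, wiping the data directory, requesting credentials) would make each transition testable on its own.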

Testing

Manually validated:

  • Promoting a client host to server mode
1. Created a cluster with 6 nodes (3 etcd servers, 3 etcd clients) and verified that all nodes come up healthy. Verified that the servers are listed correctly in the etcd cluster.
+------------------+---------+--------+---------------------------+---------------------------+------------+
|        ID        | STATUS  |  NAME  |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+--------+---------------------------+---------------------------+------------+
| b71f75320dc06a6c | started | host-1 | https://192.168.64.2:2380 | https://192.168.64.2:2379 |      false |
| bd0f3ed267db8129 | started | host-3 | https://192.168.64.2:2580 | https://192.168.64.2:2579 |      false |
| e4dc9da36c7e7665 | started | host-2 | https://192.168.64.2:2480 | https://192.168.64.2:2479 |      false |
+------------------+---------+--------+---------------------------+---------------------------+------------+
2. Modified the compose file to run host-4 as an etcd server on different ports
3. Applied the change using the command WORKSPACE_DIR=/Users/sivat/projects/control-plane/control-plane docker compose -f ./docker/control-plane-dev/docker-compose.yaml up -d host-4
4. Verified that host-4 comes up healthy and joins the etcd cluster
+------------------+---------+--------+---------------------------+---------------------------+------------+
|        ID        | STATUS  |  NAME  |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+--------+---------------------------+---------------------------+------------+
| b71f75320dc06a6c | started | host-1 | https://192.168.64.2:2380 | https://192.168.64.2:2379 |      false |
| bd0f3ed267db8129 | started | host-3 | https://192.168.64.2:2580 | https://192.168.64.2:2579 |      false |
| c89ea0e1435b2a84 | started | host-4 | https://192.168.64.2:2680 | https://192.168.64.2:2679 |      false |
| e4dc9da36c7e7665 | started | host-2 | https://192.168.64.2:2480 | https://192.168.64.2:2479 |      false |
+------------------+---------+--------+---------------------------+---------------------------+------------+

Respective Logs:

host-4-1  | 5:27PM INF got shutdown signal
host-4-1  | 5:27PM INF attempting to gracefully shut down
host-4-1  | 5:27PM INF shutting down scheduler service component=scheduler_service
host-4-1  | 5:27PM INF closing scheduled job watch component=scheduler_service
host-4-1  | 5:27PM INF shutting down instance monitors
host-4-1 exited with code 0
host-4-1  | 5:27PM INF checking etcd mode for reconfiguration modes_equal=false new_mode=server old_mode=client old_mode_empty=false
host-4-1  | 5:27PM INF detected etcd_mode change, performing reconfiguration host_id=host-4 new_mode=server old_mode=client
host-4-1  | 5:27PM INF starting client->server reconfiguration host_id=host-4
host-4-1  | 5:27PM INF starting remote etcd client with existing credentials
host-4-1  | 5:27PM INF remote etcd client started, querying cluster information
host-4-1  | 5:27PM INF constructed HTTP endpoint from host data host_id=host-1 http_endpoint=http://192.168.64.2:3000
host-4-1  | 5:27PM INF constructed HTTP endpoint from host data host_id=host-2 http_endpoint=http://192.168.64.2:3001
host-4-1  | 5:27PM INF constructed HTTP endpoint from host data host_id=host-3 http_endpoint=http://192.168.64.2:3002
host-4-1  | 5:27PM INF constructed HTTP endpoint from host data host_id=host-5 http_endpoint=http://192.168.64.2:3004
host-4-1  | 5:27PM INF constructed HTTP endpoint from host data host_id=host-6 http_endpoint=http://192.168.64.2:3005
host-4-1  | 5:27PM INF requesting server credentials via AddHost
host-4-1  | 5:27PM WRN failed to create server credentials - attempting automatic rejoin via HTTP error="failed to create cert for etcd host user: failed to fetch principal: failed to get \"/principals/host:host-4:etcd-user\": etcdserver: authentication failed, invalid user ID or password"
host-4-1  | 5:27PM INF attempting automatic rejoin via HTTP endpoints endpoint_count=5 host_id=host-4
host-4-1  | 5:27PM INF trying to get credentials from cluster member endpoint=http://192.168.64.2:3000
host-1-1  | 5:27PM INF http request component=api_server duration=27084 duration_ms=0 http.client_ip=192.168.64.2 http.method=GET http.response.content_length=157 http.status_code=200 http.url=/v1/cluster/join-token http.useragent=Go-http-client/1.1
host-1-1  | 5:27PM INF http request component=api_server duration=60300375 duration_ms=60 http.client_ip=192.168.64.2 http.method=POST http.response.content_length=3685 http.status_code=200 http.url=/v1/internal/cluster/join-options http.useragent=Go-http-client/1.1
host-4-1  | 5:27PM INF received credentials via HTTP - joining cluster as embedded etcd leader=host-1
host-4-1  | 5:27PM INF etcd started as learner
host-4-1  | 5:27PM INF attempting to promote from learner to voting cluster member
host-4-1  | 5:27PM INF promotion successful
host-4-1  | 5:27PM INF waiting for cluster to be healthy
host-4-1  | 5:27PM INF successfully joined cluster as embedded etcd via automatic rejoin
host-4-1  | 5:27PM INF automatic rejoin successful - joined cluster as embedded etcd server
host-4-1  | 5:27PM INF starting scheduler service component=scheduler_service
host-4-1  | 5:27PM INF starting http server component=api_server host_port=0.0.0.0:3003
  • Demoting a server host to client mode
1. Rolled back the changes by bringing down host-4 and modifying the compose file to set host-4 as a client
2. Applied the change using the command WORKSPACE_DIR=/Users/sivat/projects/control-plane/control-plane docker compose -f ./docker/control-plane-dev/docker-compose.yaml up -d host-4
3. Verified that host-4 comes up healthy in client mode and no longer appears in the etcd member list
+------------------+---------+--------+---------------------------+---------------------------+------------+
|        ID        | STATUS  |  NAME  |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+--------+---------------------------+---------------------------+------------+
| b71f75320dc06a6c | started | host-1 | https://192.168.64.2:2380 | https://192.168.64.2:2379 |      false |
| bd0f3ed267db8129 | started | host-3 | https://192.168.64.2:2580 | https://192.168.64.2:2579 |      false |
| e4dc9da36c7e7665 | started | host-2 | https://192.168.64.2:2480 | https://192.168.64.2:2479 |      false |
+------------------+---------+--------+---------------------------+---------------------------+------------+

Respective Logs:

host-4-1  | 5:28PM INF got shutdown signal
host-4-1  | 5:28PM INF attempting to gracefully shut down
host-4-1  | 5:28PM INF shutting down scheduler service component=scheduler_service
host-4-1  | 5:28PM INF closing scheduled job watch component=scheduler_service
host-4-1  | 5:28PM INF shutting down instance monitors
host-4-1 exited with code 0
host-4-1  | 5:28PM INF checking etcd mode for reconfiguration modes_equal=false new_mode=client old_mode=server old_mode_empty=false
host-4-1  | 5:28PM INF detected etcd_mode change, performing reconfiguration host_id=host-4 new_mode=client old_mode=server
host-4-1  | 5:28PM INF starting server->client reconfiguration
host-4-1  | 5:28PM INF getting cluster member list to find remote endpoints
host-4-1  | 5:28PM INF removing this host from etcd cluster
host-4-1  | 5:28PM INF removing etcd data directory for server->client transition etcd_data_dir=/Users/sivat/projects/control-plane/control-plane/docker/control-plane-dev/data/host-4/etcd
host-4-1  | 5:28PM INF completed etcd server->client transition; using remaining cluster members as remote endpoints endpoints=["https://192.168.64.2:2379","https://192.168.64.2:2579","https://192.168.64.2:2479"]
host-4-1  | 5:28PM INF starting scheduler service component=scheduler_service
host-4-1  | 5:28PM INF starting http server component=api_server host_port=0.0.0.0:3003

Checklist

  • Documentation updated: docs/using/etcd-reconfiguration.md

PLAT-315

@tsivaprasad tsivaprasad requested a review from Copilot December 3, 2025 17:47
Copilot AI left a comment

Pull request overview

This PR adds comprehensive documentation for reconfiguring etcd mode on Control Plane hosts, enabling operators to safely change hosts between server and client modes after initial deployment. The documentation includes detailed procedures, troubleshooting guidance, and best practices based on manual testing in a development environment.

Key Changes

  • Added complete step-by-step procedures for promoting clients to servers and demoting servers to clients
  • Documented environment variable configuration requirements and container restart workflows
  • Fixed a bug in rbac.go to ensure EtcdMode is persisted when writing host credentials

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
docs/using/etcd-reconfiguration.md | New documentation file covering etcd mode reconfiguration procedures, prerequisites, troubleshooting, and best practices
server/internal/etcd/rbac.go | Added missing EtcdMode assignment to ensure mode is preserved in generated config

@tsivaprasad tsivaprasad marked this pull request as draft December 4, 2025 14:44
@tsivaprasad tsivaprasad force-pushed the PLAT-315-ability-to-reconfigure-a-hosts-etcd-configuration branch 3 times, most recently from 2ce1296 to 6b6efb0 Compare January 1, 2026 17:23
@tsivaprasad tsivaprasad force-pushed the PLAT-315-ability-to-reconfigure-a-hosts-etcd-configuration branch from 6b6efb0 to 4024167 Compare January 1, 2026 17:25
@tsivaprasad tsivaprasad marked this pull request as ready for review January 1, 2026 17:54
2 participants