Commit f4bd640

First draft of DNS Publishing RFC
Signed-off-by: Phil Brookes <[email protected]>
1 parent 3c07ad8 commit f4bd640

2 files changed: +70 -0 lines changed

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
## Why
Currently there is no way to instruct the DNS Operator to publish or unpublish a DNS Record when certain conditions are met. This means that gracefully removing a DNS Record requires manual intervention and an understanding of the internal workings of the DNS Operator.

### Some Example Use Cases
- DNS failover (rapidly switching to an alternative DNS configuration on a secondary site)
- Workload migration (removing a workload from one cluster in favour of a new cluster)
- Extra clusters during periods of high load

## What
Add an optional publishStrategy to the DNS Policy CRD, allowing an administrator to define rules that determine whether the DNS Operator publishes the records to the zone or unpublishes them from it. The operator will also set a condition in the status to reflect the outcome.

## How

### Diagram
![Diagram of DNS Publish strategy](./0013-dns-publishing-strategy-assets/diagram.png)

https://miro.com/app/board/uXjVL32kOMY=/

### Kuadrant operator changes
The DNS Policy and DNS Record CRDs will have a new field added to their spec:

```yaml
publishStrategy:
  rule: <implemented using Common Expression Language>
```

This is read by the kuadrant-operator and propagated into any relevant DNS Records.

When the DNS Operator acts on these instructions, it will set a condition in the DNS Record. This condition will be propagated back into the relevant DNS Policy.

### DNS Operator Changes
The DNS Operator will read the publishStrategy from the DNS Record on reconcile and, based on its value, interrogate the zone to see whether the publish rule is met. If it is, the operator will publish the records; if not, it will ensure the records are unpublished. It will then update the condition in the DNS Record status to reflect the decision.
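As a rough illustration of the condition described above, a DNS Record whose publish rule is not met might report something like the following. The condition type, reason and message are hypothetical names chosen for this sketch and are not defined by this RFC, and the timestamp is a placeholder. Only the general shape (type, status, reason, message, lastTransitionTime) follows the usual Kubernetes condition convention.

```yaml
# Hypothetical status on a DNS Record after the operator has unpublished it.
# "Published" and "PublishRuleNotMet" are illustrative names, not a proposed API.
status:
  conditions:
    - type: Published
      status: "False"
      reason: PublishRuleNotMet
      message: "publish rule evaluated to false, records removed from the zone"
      lastTransitionTime: "2024-01-01T00:00:00Z"  # placeholder timestamp
```

Once propagated to the DNS Policy, a transition time of this kind would let an administrator see how long ago the records were unpublished.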

### Available CEL Fields

To begin with, a few basic fields will be made available. This set could be expanded in the future.
- records: The set of related records in the zone
- unowned_record_count: The number of related records in the zone with an owner ID that is not the local owner ID
- unhealthy_record_count: The number of related records flagged as unhealthy
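To make the two counts concrete, here is a hypothetical snapshot of the values a rule might see for a zone holding three related records: one owned by the local cluster and two owned by another cluster, one of which is unhealthy. The structure of each entry in records (the owner and healthy keys) is an assumption made purely for illustration; this RFC does not define it.

```yaml
# Illustrative evaluation context as seen from the local cluster (owner ID "local").
# The per-record keys below are hypothetical.
records:
  - owner: local
    healthy: true
  - owner: other
    healthy: true
  - owner: other
    healthy: false
unowned_record_count: 2    # records whose owner ID is not the local owner ID
unhealthy_record_count: 1  # records flagged as unhealthy
```

With these values, a rule such as "unhealthy_record_count >= unowned_record_count" would evaluate to false (1 >= 2), so the local cluster would not publish under that rule.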

## Use cases expanded
### DNS Failover
To enact DNS failover with this config, the rule for publishing could be set to "when all other records are marked as unhealthy".

#### Example
Cluster 1 has no publishing rule (i.e. it always publishes).

Cluster 2's publishing rule is: "unhealthy_record_count >= unowned_record_count || unowned_record_count == 0"

- Cluster 1 is currently published and healthy, and cluster 2 has no published records.
- An event occurs that causes the workload to begin malfunctioning on cluster 1.
- All the records for cluster 1 are marked as unhealthy in the registry (but not removed, as they are the only records available).
- Cluster 2 reconciles and sees that all the records currently in the zone are unhealthy. As this satisfies its publishing rule, it publishes its records.
- Cluster 1 reconciles, sees that there are records other than its own, and unpublishes its own records because they are unhealthy.
- Eventually cluster 1 is healthy again and publishes its records.
- Cluster 2 sees that its rule is no longer satisfied and unpublishes its own records.
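Expressed with the proposed publishStrategy field, the two clusters in this example might be configured roughly as follows. This is a sketch showing only the relevant spec fragment of each cluster's DNS Policy; all other fields are omitted.

```yaml
# Cluster 1 DNS Policy (other spec fields omitted).
spec:
  # publishStrategy is not set, so there is no publishing rule.
---
# Cluster 2 DNS Policy (other spec fields omitted).
# Standby: publish when all other records are marked as unhealthy,
# or when no other owner has records in the zone.
spec:
  publishStrategy:
    rule: "unhealthy_record_count >= unowned_record_count || unowned_record_count == 0"
```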

### Workload migration
Cluster 1 has a workload that needs to be migrated to cluster 2.
- The publishing rule on cluster 1 is set to: "unowned_record_count == 0"
- The workload is created on cluster 2.
- Records are created by cluster 2.
- Cluster 1 sees that other records exist and unpublishes its records from the zone.
- The admin sees from the status on the DNS Policy in cluster 1 that all records were removed from the zone, and that this happened more than the TTL ago.
- The admin can safely remove the workload from cluster 1.
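For reference, the rule on cluster 1 in this scenario might be written as the fragment below, using CEL's == equality operator. Only the relevant part of the DNS Policy spec is shown, and the rest of the resource is left out.

```yaml
# Cluster 1 DNS Policy (other spec fields omitted).
# Publish only while no other owner has related records in the zone.
spec:
  publishStrategy:
    rule: "unowned_record_count == 0"
```

Waiting for at least the TTL after the status changes gives resolvers time to stop serving cached answers that point at cluster 1.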

### Extra clusters during high load
This would require adding metrics to the CEL rules that are not currently planned, but the following shows how such a rule might look.

- Cluster 1 has the workload and always publishes.
- Cluster 2 has the workload and has a publishing rule: "requests_per_minute >= n"
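A sketch of what cluster 2's configuration could look like in this scenario is shown below. Note the assumptions: requests_per_minute is not among the fields currently planned, and 1000 is an arbitrary stand-in for the threshold n.

```yaml
# Cluster 2 DNS Policy (other spec fields omitted).
# requests_per_minute is a hypothetical field that is not currently planned,
# and 1000 is an arbitrary placeholder for the threshold n.
spec:
  publishStrategy:
    rule: "requests_per_minute >= 1000"
```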
