feat: Multi StackConfigPolicy per cluster #8793
Conversation
Pull Request Overview
This PR implements support for multiple StackConfigPolicies (SCPs) per cluster by introducing a weight-based priority system. Policies with lower weights take precedence over those with higher weights, and conflicts are detected when policies with the same weight attempt to configure the same resources.
- Adds a `weight` field to StackConfigPolicy specs with default value 0
- Implements conflict detection for policies with same weights targeting overlapping resources
- Merges configurations from multiple policies based on weight priorities
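As a rough illustration of the weight semantics described in this overview (a minimal standalone sketch with made-up types and helper names, not the PR's actual code), the selection logic boils down to sorting matching policies by ascending weight and flagging equal-weight policies that set the same keys:

```go
package main

import (
	"fmt"
	"sort"
)

// policy is a stand-in for StackConfigPolicy, reduced to the fields relevant here.
type policy struct {
	Name   string
	Weight int32
	Keys   map[string]struct{} // configuration keys the policy sets
}

// orderAndCheck sorts policies by ascending weight (lower weight = higher priority)
// and returns an error if two policies with the same weight set the same key.
func orderAndCheck(policies []policy) ([]policy, error) {
	sort.SliceStable(policies, func(i, j int) bool { return policies[i].Weight < policies[j].Weight })
	for i := 0; i < len(policies); i++ {
		// equal weights are adjacent after sorting
		for j := i + 1; j < len(policies) && policies[j].Weight == policies[i].Weight; j++ {
			for k := range policies[i].Keys {
				if _, ok := policies[j].Keys[k]; ok {
					return nil, fmt.Errorf("conflict: %s and %s both set %q at weight %d",
						policies[i].Name, policies[j].Name, k, policies[i].Weight)
				}
			}
		}
	}
	return policies, nil
}

func main() {
	ordered, err := orderAndCheck([]policy{
		{Name: "b", Weight: 10, Keys: map[string]struct{}{"indices.recovery.max_bytes_per_sec": {}}},
		{Name: "a", Weight: 0, Keys: map[string]struct{}{"indices.recovery.max_bytes_per_sec": {}}},
	})
	// "a" (weight 0) comes first and takes precedence; no conflict because the weights differ
	fmt.Println(ordered, err)
}
```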
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| pkg/apis/stackconfigpolicy/v1alpha1/stackconfigpolicy_types.go | Adds weight field to StackConfigPolicySpec |
| pkg/controller/stackconfigpolicy/weight_conflict_test.go | Comprehensive test suite for weight conflict detection logic |
| pkg/controller/stackconfigpolicy/controller.go | Core multi-policy support with conflict detection and policy merging |
| pkg/controller/stackconfigpolicy/elasticsearch_config_settings.go | Multi-policy versions of Elasticsearch configuration functions |
| pkg/controller/stackconfigpolicy/kibana_config_settings.go | Multi-policy versions of Kibana configuration functions |
| pkg/controller/elasticsearch/filesettings/secret.go | Multi-policy secret management with policy reference tracking |
| pkg/controller/elasticsearch/filesettings/file_settings.go | Policy merging logic for Elasticsearch file settings |
| pkg/controller/stackconfigpolicy/controller_test.go | Updates test expectations for multi-policy behavior |
| config/crds/v1/*.yaml | CRD updates to include weight field |
| docs/reference/api-reference/main.md | Documentation for new weight field |
```go
// For map-type values (like snapshot repositories), individual entries are merged rather than replaced
func (s *Settings) mergeConfig(target, source *commonv1.Config) {
	if source == nil || source.Data == nil {
		return
```
Copilot (AI) commented on Aug 22, 2025:
The `mergeConfig` function modifies the target parameter but then creates a new Config object that is not assigned back to the original parameter. The line `target = &commonv1.Config{Data: make(map[string]interface{})}` only modifies the local variable, not the original pointer. This will cause the merge to fail silently.
Suggested change:

```go
func (s *Settings) mergeConfig(target, source *commonv1.Config) *commonv1.Config {
	if source == nil || source.Data == nil {
		return target
```
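For context on why the suggested signature change matters: reassigning a pointer parameter inside a Go function only changes the function's local copy of the pointer, so the caller never sees a newly allocated config. A minimal standalone illustration (not the PR's code):

```go
package main

import "fmt"

type config struct{ Data map[string]interface{} }

// brokenInit reassigns the local pointer; the caller's pointer is unchanged.
func brokenInit(target *config) {
	target = &config{Data: map[string]interface{}{}} // only the local variable changes
	target.Data["k"] = "v"
}

// fixedInit returns the (possibly newly allocated) target so the caller can keep it.
func fixedInit(target *config) *config {
	if target == nil || target.Data == nil {
		target = &config{Data: map[string]interface{}{}}
	}
	target.Data["k"] = "v"
	return target
}

func main() {
	var c *config
	brokenInit(c)
	fmt.Println(c == nil) // true: the write was silently lost
	c = fixedInit(c)
	fmt.Println(c.Data) // map[k:v]
}
```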
I think this is a valid copilot observation. Also I would expect unit tests for this kind of functionality. We do have more advanced merging through the CanonicalConfig abstraction in the code base. But there might be some gotchas with the more complex configuration objects that stack config policies support that we might not be thinking of right now.
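On the unit-test point, a table-driven test along these lines could pin down the intended merge semantics. The helper below is a self-contained stand-in that mirrors the one-level behavior described in the comments above, not the PR's actual `mergeConfig`:

```go
package merge

import (
	"reflect"
	"testing"
)

// mergeOneLevel mirrors the described behavior: source wins for scalars,
// map values are merged entry by entry (one level deep).
func mergeOneLevel(target, source map[string]interface{}) map[string]interface{} {
	if target == nil {
		target = map[string]interface{}{}
	}
	for k, v := range source {
		if tm, ok := target[k].(map[string]interface{}); ok {
			if sm, ok := v.(map[string]interface{}); ok {
				for sk, sv := range sm {
					tm[sk] = sv
				}
				continue
			}
		}
		target[k] = v
	}
	return target
}

func TestMergeOneLevel(t *testing.T) {
	tests := []struct {
		name                 string
		target, source, want map[string]interface{}
	}{
		{
			name:   "source scalar wins",
			target: map[string]interface{}{"a": "1"},
			source: map[string]interface{}{"a": "2"},
			want:   map[string]interface{}{"a": "2"},
		},
		{
			name:   "map entries merged, not replaced",
			target: map[string]interface{}{"repos": map[string]interface{}{"r1": "x"}},
			source: map[string]interface{}{"repos": map[string]interface{}{"r2": "y"}},
			want:   map[string]interface{}{"repos": map[string]interface{}{"r1": "x", "r2": "y"}},
		},
		{
			name:   "nil target is initialized from source",
			target: nil,
			source: map[string]interface{}{"a": "1"},
			want:   map[string]interface{}{"a": "1"},
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := mergeOneLevel(tt.target, tt.source); !reflect.DeepEqual(got, tt.want) {
				t.Errorf("got %v, want %v", got, tt.want)
			}
		})
	}
}
```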
thbkrkr left a comment:
Looks great, thank you! I like how this extends the original implementation to multiple SCPs without adding complexity. I didn’t do any testing on my side, but code-wise this looks good. 👍
`buildkite test this -f p=gke,t=TestStackConfigPolicy*`
pebrc left a comment:
I am generally positive about the approach but I feel like this needs more work, and test coverage. I did a quick smoke test and it does not work as advertised.
```go
},
ClusterSettings: &commonv1.Config{
	Data: map[string]interface{}{
		"persistent": map[string]interface{}{
```
This leads to an illegal setting: "persistent"/"transient" are not supported via stack config policies.
```go
// mergeConfig merges source config into target config, with source taking precedence
// For map-type values (like snapshot repositories), individual entries are merged rather than replaced
func (s *Settings) mergeConfig(target, source *commonv1.Config) {
```
Why is this a receiver method of Settings?
```go
for key, value := range source.Data {
	if targetValue, exists := target.Data[key]; exists {
		if targetMap, targetIsMap := targetValue.(map[string]interface{}); targetIsMap {
			if sourceMap, sourceIsMap := value.(map[string]interface{}); sourceIsMap {
				for subKey, subValue := range sourceMap {
					targetMap[subKey] = subValue
				}
				continue
			}
		}
	}
```
This seems too simplistic given that, for example, cluster settings can be in both nested and flat syntax. Also, it only merges one level deep?
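To illustrate the one-level limitation, a recursive variant would descend into nested maps at any depth (a generic sketch over `map[string]interface{}`, not a drop-in replacement; handling flat keys such as "cluster.routing.allocation.enable" would additionally require expanding them into nested form, which is what the CanonicalConfig abstraction takes care of):

```go
// deepMerge recursively merges src into dst, with src taking precedence for scalars.
func deepMerge(dst, src map[string]interface{}) map[string]interface{} {
	if dst == nil {
		dst = map[string]interface{}{}
	}
	for k, v := range src {
		if sm, ok := v.(map[string]interface{}); ok {
			if dm, ok := dst[k].(map[string]interface{}); ok {
				dst[k] = deepMerge(dm, sm) // descend into nested maps at any depth
				continue
			}
		}
		dst[k] = v // scalars (or type mismatches) are overwritten by the source
	}
	return dst
}
```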
`buildkite test this -f p=gke,t=TestStackConfigPolicy*`
pebrc left a comment:
I started running tests. It is a fairly complex implementation, so I am not finished with my testing, but I wanted to leave feedback already.
I have two main concerns:
- secret garbage collection is not working/not implemented and we are leaking secrets. This can be addressed.
- the reconciliation logic has potential concurrency issues: in controller-runtime the workqueue guarantees that the same reconciled resource is not being reconciled concurrently by multiple goroutines. This implementation turns the logic upside down: we reconcile one policy based on an event (create/update/delete) but then, in the same iteration, we actually reconcile all policies that affect the same Elasticsearch/Kibana. That means that multiple goroutines can now potentially reconcile the configuration for the same Elasticsearch/Kibana clusters at the same time. I have not tested this in detail and maybe K8s optimistic concurrency control helps us a bit here. But I don't have a good remedy for this problem at the moment.
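For reference, the usual mitigation when several reconcilers may write the same object is to wrap the read-modify-write in a conflict retry, e.g. with client-go's retry helper. This is only an illustrative sketch of that pattern; it reduces wasted requeues but does not remove the underlying design concern:

```go
import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// updateSettingsSecret re-reads the secret on every attempt, so a conflicting write by
// another goroutine is retried against the latest resourceVersion instead of failing the reconcile.
func updateSettingsSecret(ctx context.Context, c client.Client, key types.NamespacedName, mutate func(*corev1.Secret)) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		var secret corev1.Secret
		if err := c.Get(ctx, key, &secret); err != nil {
			return err
		}
		mutate(&secret)
		return c.Update(ctx, &secret)
	})
}
```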
```go
}

// policyIsEmpty checks if a policy has no meaningful configuration
func (r *ReconcileStackConfigPolicy) policyIsEmpty(policy *policyv1alpha1.StackConfigPolicy) bool {
```
Could be a receiver method on the policy itself:
```go
// HasNoSettings checks if the policy has no meaningful settings.
// SecureSettings are not considered in this check as they are composable with other policies.
func (s StackConfigPolicySpec) HasNoSettings() bool {
	return s.Elasticsearch.ClusterSettings.IsEmpty() &&
		s.Elasticsearch.SnapshotRepositories.IsEmpty() &&
		s.Elasticsearch.SnapshotLifecyclePolicies.IsEmpty() &&
		s.Elasticsearch.SecurityRoleMappings.IsEmpty() &&
		s.Elasticsearch.IndexLifecyclePolicies.IsEmpty() &&
		s.Elasticsearch.IngestPipelines.IsEmpty() &&
		s.Elasticsearch.IndexTemplates.ComponentTemplates.IsEmpty() &&
		s.Elasticsearch.IndexTemplates.ComposableIndexTemplates.IsEmpty() &&
		s.Elasticsearch.Config.IsEmpty() &&
		len(s.Elasticsearch.SecretMounts) == 0 &&
		s.Kibana.Config.IsEmpty()
}
```
```go
// Check if there are any common keys
for key := range config1.Data {
	if _, exists := config2.Data[key]; exists {
```
I wonder if this first-level comparison catches all cases, especially if there is a mix of nested and flat syntax.
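One way to make the overlap check robust to mixed flat/nested syntax is to flatten both configs to dotted key paths before comparing. A self-contained sketch (hypothetical helpers; a complete check would also have to treat a key that is a prefix of another as overlapping):

```go
// flattenKeys converts nested config maps into dotted key paths so that
// {"cluster": {"routing": {"allocation": {"enable": "none"}}}} and
// {"cluster.routing.allocation.enable": "none"} produce the same key set.
func flattenKeys(prefix string, data map[string]interface{}, out map[string]struct{}) {
	for k, v := range data {
		key := k
		if prefix != "" {
			key = prefix + "." + k
		}
		if m, ok := v.(map[string]interface{}); ok {
			flattenKeys(key, m, out)
			continue
		}
		out[key] = struct{}{}
	}
}

// configsOverlap reports whether two configs define at least one common flattened key.
func configsOverlap(a, b map[string]interface{}) bool {
	ak, bk := map[string]struct{}{}, map[string]struct{}{}
	flattenKeys("", a, ak)
	flattenKeys("", b, bk)
	for k := range ak {
		if _, ok := bk[k]; ok {
			return true
		}
	}
	return false
}
```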
```go
}

// configsConflict checks if two Config objects have overlapping keys
func (r *ReconcileStackConfigPolicy) configsConflict(config1, config2 *commonv1.Config) bool {
```
This method does not need to be a receiver method of ReconcileStackConfigPolicy.
```go
}

// policiesCouldOverlap checks if two policies could potentially target the same resources
func (r *ReconcileStackConfigPolicy) policiesCouldOverlap(policy1, policy2 *policyv1alpha1.StackConfigPolicy, policy1Selector labels.Selector) bool {
```
Bit awkward to have the policy selector passed in here. I think I would probably make these receiver methods of StackConfigPolicy instead.
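For instance, the selector construction itself could become a receiver method so callers no longer have to thread it through (a small sketch reusing the selector-building code shown later in this thread; the method name is made up):

```go
// ResourceLabelSelector returns the policy's resource selector as a labels.Selector.
func (p *StackConfigPolicy) ResourceLabelSelector() (labels.Selector, error) {
	return metav1.LabelSelectorAsSelector(&metav1.LabelSelector{
		MatchLabels:      p.Spec.ResourceSelector.MatchLabels,
		MatchExpressions: p.Spec.ResourceSelector.MatchExpressions,
	})
}
```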
```go
}

// NewSettingsSecretFromPolicies returns a new SettingsSecret for a given Elasticsearch from multiple StackConfigPolicies.
func NewSettingsSecretFromPolicies(version int64, es types.NamespacedName, currentSecret *corev1.Secret, policies []policyv1alpha1.StackConfigPolicy, meta metadata.Metadata) (corev1.Secret, int64, error) {
```
This seems to be only called from NewSettingsSecretWithVersionFromPolicies. I totally get why we want a second function where we can pass the version for testing, but I don't see any tests for this, and it does not have to be exported.
```go
}

// check for weight conflicts with other policies
if err := r.checkWeightConflicts(ctx, &policy); err != nil {
```
Should this be part of the admission webhook validation checks already to give immediate feedback to users?
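If this check moved into the validating webhook, it would presumably run against the existing policies at admission time. A hedged sketch of what that could look like (function and helper names are illustrative; it assumes the overlap/conflict helpers above are refactored into free functions, and ECK's actual webhook wiring may differ):

```go
// validateNoWeightConflict rejects a candidate policy if another existing policy with the
// same weight could target the same resources and sets overlapping configuration keys.
func validateNoWeightConflict(candidate policyv1alpha1.StackConfigPolicy, existing []policyv1alpha1.StackConfigPolicy) error {
	for i := range existing {
		other := existing[i]
		if other.Namespace == candidate.Namespace && other.Name == candidate.Name {
			continue // skip the object being updated
		}
		if other.Spec.Weight != candidate.Spec.Weight {
			continue // different weights never conflict
		}
		if policiesCouldOverlap(&candidate, &other) &&
			configsConflict(candidate.Spec.Elasticsearch.Config, other.Spec.Elasticsearch.Config) {
			return fmt.Errorf("policy %s/%s conflicts with %s/%s at weight %d",
				candidate.Namespace, candidate.Name, other.Namespace, other.Name, candidate.Spec.Weight)
		}
	}
	return nil
}
```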
```go
func newElasticsearchConfigSecret(policy policyv1alpha1.StackConfigPolicy, es esv1.Elasticsearch) (corev1.Secret, error) {

// reconcileSecretMounts creates the secrets in SecretMounts to the respective Elasticsearch namespace where they should be mounted to.
func reconcileSecretMounts(ctx context.Context, c k8s.Client, es esv1.Elasticsearch, policy *policyv1alpha1.StackConfigPolicy, meta metadata.Metadata) error {
```
Unused code?
```go
// reconcileSecretMountsFromPolicies creates secrets from all policies' SecretMounts
func (r *ReconcileStackConfigPolicy) reconcileSecretMountsFromPolicies(ctx context.Context, es esv1.Elasticsearch, policies []policyv1alpha1.StackConfigPolicy, meta metadata.Metadata) error {
	// Collect all unique secret mounts from all policies
	secretMountMap := make(map[string]policyv1alpha1.SecretMount) // key is secretName to avoid duplicates
```
Weights are ignored if I am not mistaken?
```go
	}
}

for _, secretMount := range secretMountMap {
```
We just iterated through all policies to find the mounts; why do we now have to search through them again to find the policy? Why not keep the source policy around in the map of mount points to begin with?
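Addressing both of the comments above (respecting weights and avoiding the second lookup), the map could keep the winning source policy alongside each mount, resolving duplicates by lowest weight as they are collected (sketch; `mountWithSource` and `collectSecretMounts` are made-up names):

```go
// mountWithSource keeps a SecretMount together with the policy that contributed it,
// so the later reconcile step does not need to search the policies again.
type mountWithSource struct {
	mount  policyv1alpha1.SecretMount
	policy policyv1alpha1.StackConfigPolicy
}

func collectSecretMounts(policies []policyv1alpha1.StackConfigPolicy) map[string]mountWithSource {
	out := make(map[string]mountWithSource) // keyed by secret name
	for _, p := range policies {
		for _, m := range p.Spec.Elasticsearch.SecretMounts {
			existing, ok := out[m.SecretName]
			// lower weight wins when two policies mount a secret with the same name
			if !ok || p.Spec.Weight < existing.policy.Spec.Weight {
				out[m.SecretName] = mountWithSource{mount: m, policy: p}
			}
		}
	}
	return out
}
```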
```go
	continue
case 1:
	// Single policy - use the original approach for backward compatibility
	expectedSecret, expectedVersion, err = filesettings.NewSettingsSecretWithVersion(esNsn, &actualSettingsSecret, &allPolicies[0], meta)
```
I am not sure I see the argument for keeping the old approach around. Can you explain why we need to be backwards compatible here?
My expectation would be that we migrate existing secrets to the new scheme, so we might need some code that updates/removes soft owner references and applies the new tracking we need for garbage collection instead.
@pebrc I share the same concurrency concerns as you. I will try to explain my point of view and suggest an alternative solution.

**Why the concurrency issues exist**

This PR's behavior: consider we have StackConfigPolicies A, B, C, and D all targeting the same Elasticsearch cluster. When StackConfigPolicy A reconciles, it lists ALL four policies, merges them all, and writes the shared file-settings secret. Simultaneously, when policies B, C, and D each reconcile (triggered by their own watches or events), they each independently list all four policies, merge them, and all try to update the same file-settings secret.

Result: multiple goroutines (one per StackConfigPolicy) may end up writing to the same secret, relying only on Kubernetes' optimistic concurrency control to prevent corruption. This causes unnecessary reconciliation loops (conflict → requeue → retry), wasted work (each of the 4 policies doing identical merge operations), and potential configuration drift.

**My proposal**

1. StackConfigPolicy controllers should write intent: each policy creates its own secret in the namespace of each Elasticsearch cluster it targets (matched by resource selector). For example, if policy A targets ES clusters "prod-es" and "dev-es", it creates one such policy secret in each of those two namespaces.
2. The Elasticsearch controller should aggregate and apply: the ES controller watches for policy secrets via label selectors in its own namespace. When reconciling "prod-es", it lists all policy secrets in the prod-es namespace (finds secrets from policies A, B, C, and D), sorts them by weight, merges the configurations with proper merge semantics, and writes the file-settings secret itself (a rough sketch of this aggregation step is included below).

I believe such an approach eliminates the race conditions described above.
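To make the aggregation step of this proposal concrete, the ES-side reconcile could look roughly like the following. This is a sketch only: the label key, annotation key, and `settings.json` data key are invented for illustration, and the final per-key overwrite stands in for proper merge semantics:

```go
import (
	"context"
	"encoding/json"
	"sort"
	"strconv"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// reconcilePolicySecrets lists the per-policy "intent" secrets in the Elasticsearch
// namespace, orders them by the weight recorded on each secret, and merges them into
// a single configuration for the cluster.
func reconcilePolicySecrets(ctx context.Context, c client.Client, es types.NamespacedName) (map[string]interface{}, error) {
	var secrets corev1.SecretList
	if err := c.List(ctx, &secrets,
		client.InNamespace(es.Namespace),
		client.MatchingLabels{"example.k8s.elastic.co/config-policy-target": es.Name}, // hypothetical label
	); err != nil {
		return nil, err
	}
	items := secrets.Items
	sort.SliceStable(items, func(i, j int) bool {
		// the policy weight is assumed to be recorded as an annotation on each intent secret
		wi, _ := strconv.Atoi(items[i].Annotations["example.k8s.elastic.co/weight"])
		wj, _ := strconv.Atoi(items[j].Annotations["example.k8s.elastic.co/weight"])
		return wi < wj
	})
	merged := map[string]interface{}{}
	// apply highest weight first so that lower weights overwrite and therefore win
	for i := len(items) - 1; i >= 0; i-- {
		var cfg map[string]interface{}
		if err := json.Unmarshal(items[i].Data["settings.json"], &cfg); err != nil {
			return nil, err
		}
		for k, v := range cfg {
			merged[k] = v // a real implementation would use proper (deep) merge semantics here
		}
	}
	return merged, nil
}
```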
**Trade-offs**

This is admittedly a more radical architectural change.
I'm still figuring out the necessary code changes as I'm not fully familiar with the codebase yet, and making the ES controller be the writer of file-settings can be tricky. That said, I believe these trade-offs are worth it to eliminate the race conditions and provide proper multi-policy support. As always, I'm more than happy to hear your thoughts on this.
I didn't have a look at the PR 🙃, just a drive-by comment:

I think this is pretty common with controllers; as soon as we have a lot of watches (for example here) this is very likely going to trigger "no-op" reconciliations. Fortunately, since most operations interact with the informer caches, which are converging toward the actual cluster state, running these reconciliation loops should have minimal impact.
Can you elaborate on that?
In my experience, the size and memory usage on both the API server and client are more important than the absolute number of Secrets. I think we already have a fair number of Secrets in ECK-managed namespaces. I feel like it would be useful to evaluate whether the added complexity and these new Secrets are really necessary.
Hey @barkbay 👋, thanks for the feedback.

I see your point and I may be overstating this concern - Kubernetes' eventual consistency model means this kind of transient inconsistency is expected and will converge. However, the fact that we are listing all StackConfigPolicies in the reconcile loop of one, then merging them and then applying them is prone to race conditions, since a StackConfigPolicy can change after the list that happens in a parallel reconcile loop. The bit that I'm not a super fan of is that although k8s optimistic concurrency control is there to prevent such race scenarios, we deliberately choose to rely on it. In the alternative proposal, everything flows in a single direction (from StackConfigPolicy to Elasticsearch) without requiring optimistic concurrency control to guarantee correctness (please don't get me wrong, it's still a good thing that it's there though 😆).
I meant this more in the sense that since the re-queue time in such scenarios isn't something we control, there could be an uncontrolled time period where the configuration isn't what it's supposed to be, and another edit/creation/deletion of a StackConfigPolicy can easily make it longer.
Makes sense. I am still exploring the codebase, and thus my previous comment was more in the context of flow than actual code changes. But if you have any ideas on what such an evaluation should look like, please do let me know 🙂 Now, assuming that k8s optimistic concurrency control is good enough to address the concurrency concerns, we still have to work out the multi-StackConfigPolicy soft-ownership of the merged secrets. When all StackConfigPolicies are removed (or have changed resource selectors), the respective secrets in the namespace of the Elasticsearch cluster should be removed, otherwise we leak. This is again a racy scenario in my head that the alternative approach is not prone to. So @barkbay, any ideas on how this can be handled without disappearing and re-appearing secrets?
Sorry, it's been a long time since I looked at this code, still trying to (re)process it. For example:

**Create a dedicated controller to process ES clusters in parallel**

In a new file:

```go
var _ reconcile.Reconciler = &ReconcileStackConfigPolicyForElasticsearch{}

// ReconcileStackConfigPolicyForElasticsearch reconciles StackConfigPolicies objects that apply to Elasticsearch resources.
type ReconcileStackConfigPolicyForElasticsearch struct {
	*ReconcileStackConfigPolicy
}

func (r *ReconcileStackConfigPolicyForElasticsearch) Reconcile(ctx context.Context, request reconcile.Request) (reconcile.Result, error) {
	defer tracing.Span(&ctx)()
	log := ulog.FromContext(ctx)
	log.V(1).Info("Reconcile StackConfigPolicy for Elasticsearch", "es_namespace", request.Namespace, "es_name", request.Name)
	...
	// Find all policies that target this Elasticsearch cluster
	allPolicies, err := r.findPoliciesForElasticsearch(ctx, es)
	if err != nil {
		return reconcile.Result{}, err
	}
	...
}
```

**In the existing controller, detect which ES clusters must be reconciled (a bit as it is done today)**

```go
func Add(mgr manager.Manager, params operator.Parameters) error {
	esCtrlChan := make(chan event.GenericEvent)
	r := newReconciler(mgr, params, esCtrlChan, kbCtrlChan)
	c, err := common.NewController(mgr, controllerName, r, params)
	if err != nil {
		return err
	}
	if err := controllerruntime.NewControllerManagedBy(mgr).
		Named("stackconfigpolicy-es-controller").
		WatchesRawSource(
			source.Channel(esCtrlChan, &handler.EnqueueRequestForObject{}),
		).
		Complete(&ReconcileStackConfigPolicyForElasticsearch{ReconcileStackConfigPolicy: r}); err != nil {
		return err
	}
	return addWatches(mgr, c, r)
}

...

func (r *ReconcileStackConfigPolicy) reconcileElasticsearchResources(ctx context.Context, policy policyv1alpha1.StackConfigPolicy, status policyv1alpha1.StackConfigPolicyStatus) (*reconciler.Results, policyv1alpha1.StackConfigPolicyStatus) {
	defer tracing.Span(&ctx)()
	log := ulog.FromContext(ctx)
	log.V(1).Info("Reconcile Elasticsearch resources")
	results := reconciler.NewResult(ctx)
	// prepare the selector to find Elastic resources to configure
	selector, err := metav1.LabelSelectorAsSelector(&metav1.LabelSelector{
		MatchLabels:      policy.Spec.ResourceSelector.MatchLabels,
		MatchExpressions: policy.Spec.ResourceSelector.MatchExpressions,
	})
	if err != nil {
		return results.WithError(err), status
	}
	listOpts := client.ListOptions{LabelSelector: selector}
	// restrict the search to the policy namespace if it is different from the operator namespace
	if policy.Namespace != r.params.OperatorNamespace {
		listOpts.Namespace = policy.Namespace
	}
	// find the list of Elasticsearch to configure
	var esList esv1.ElasticsearchList
	if err := r.Client.List(ctx, &esList, &listOpts); err != nil {
		return results.WithError(err), status
	}
	for _, es := range esList.Items {
		r.esCtrlChan <- event.GenericEvent{
			Object: &esv1.Elasticsearch{
				ObjectMeta: metav1.ObjectMeta{
					Name:      es.Name,
					Namespace: es.Namespace,
				},
			},
		}
	}
```

I'm not a
```go
// Weight determines the priority of this policy when multiple policies target the same resource.
// Lower weight values take precedence. Defaults to 0.
// +kubebuilder:default=0
Weight int32 `json:"weight,omitempty"`
```
Cosmetic nit, I found it useful to have the weight in a column, feel free to ignore if you think otherwise:
```diff
diff --git a/pkg/apis/stackconfigpolicy/v1alpha1/stackconfigpolicy_types.go b/pkg/apis/stackconfigpolicy/v1alpha1/stackconfigpolicy_types.go
index e7d17e200..ec0de8d60 100644
--- a/pkg/apis/stackconfigpolicy/v1alpha1/stackconfigpolicy_types.go
+++ b/pkg/apis/stackconfigpolicy/v1alpha1/stackconfigpolicy_types.go
@@ -34,6 +34,7 @@ func init() {
 // +kubebuilder:printcolumn:name="Ready",type="string",JSONPath=".status.readyCount",description="Resources configured"
 // +kubebuilder:printcolumn:name="Phase",type="string",JSONPath=".status.phase"
 // +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
+// +kubebuilder:printcolumn:name="Weight",type="integer",JSONPath=".spec.weight"
 // +kubebuilder:subresource:status
 // +kubebuilder:storageversion
 type StackConfigPolicy struct {
```

Result:

```
k get stackconfigpolicies.stackconfigpolicy.k8s.elastic.co
NAME               READY   PHASE   AGE   WEIGHT
cluster-policy-1   1/1     Ready   20h   10
cluster-policy-2   1/1     Ready   20h   10
kibana-policy-1    1/1     Ready   20h   1
kibana-policy-2    1/1     Ready   20h   0
```
Following our team meeting, we've decided to proceed with the following implementation plan:
We chose this approach primarily to maintain the same number of secrets as the current implementation. Additionally, this design keeps all changes scoped to the stackconfigpolicy package without requiring modifications to the Elasticsearch or Kibana controllers.
Hey @tehbooom 👋 Thank you so much for opening this PR and taking the time to contribute! Your work here was really helpful and provided a great foundation for addressing this issue. I've created a new PR (#8917) that builds on your work and includes some additional changes we needed to make. I'd love to invite you to review it if you're interested! I'm going to close this PR since it's been superseded by #8917, but I really appreciate your contribution to the project. Thanks again for taking the initiative!
This PR allows a cluster to have multiple StackConfigPolicies (SCPs). Conflicts over the same resources are avoided based on policy weights, with the policy with the lowest weight having the highest priority. If two policies have the same weight and configure the same resources, reconciliation fails and the phase is changed to `CONFLICT`.

Here is a simple table showing what should pass and what should fail.

The following was tested and passed interchangeably.

The following will fail.

This was also tested with ECK 3.1.0 and then upgraded; previous SCPs will have a default weight of 0.