Skip to content

Conversation

@fishereskew
Copy link

Added logic to annotate namespace to prevent another default controller from being created. Annotation logic was implemented to prevent a scenario similar to that in: kedacore/keda-olm-operator#255

verticalpodautoscaler_controller.go:

  • Changed logic to mimic that of the code in the keda-olm-operator to prevent duplicate annotation code and so that it uses WaitForCacheSync() instead of relying on 60 attempts with 1-second intervals.

e2e.sh:

  • Added function to test the controller using the annotation to prevent another default controller from being created.

autoscaling_v1_verticalpodautoscalercontroller.yaml:

  • Added required namespace for the controller to be recognized.
  • Let me know if this was left without the namespace for a reason I'm not aware of

role.yaml:

  • Added namespaces for controller code to be able to access the operator namespace.

README.md & Makefile:

  • A cluster with the operator running is required for the test-e2e command to work properly so I changed the comments to try to reflect that and be a bit more clear.

@openshift-ci openshift-ci bot requested review from joelsmith and wangchen615 July 1, 2025 16:26
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 1, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fishereskew
Once this PR has been reviewed and has the lgtm label, please assign wangchen615 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fishereskew fishereskew changed the title feat/test/fix: namespace annotation for default controller, e2e tests… AUTOSCALE-283: Annotation logic and e2e test Jul 1, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 1, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 1, 2025

@fishereskew: This pull request references AUTOSCALE-283 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

Added logic to annotate namespace to prevent another default controller from being created. Annotation logic was implemented to prevent a scenario similar to that in: kedacore/keda-olm-operator#255

verticalpodautoscaler_controller.go:

  • Changed logic to mimic that of the code in the keda-olm-operator to prevent duplicate annotation code and so that it uses WaitForCacheSync() instead of relying on 60 attempts with 1-second intervals.

e2e.sh:

  • Added function to test the controller using the annotation to prevent another default controller from being created.

autoscaling_v1_verticalpodautoscalercontroller.yaml:

  • Added required namespace for the controller to be recognized.
  • Let me know if this was left without the namespace for a reason I'm not aware of

role.yaml:

  • Added namespaces for controller code to be able to access the operator namespace.

README.md & Makefile:

  • A cluster with the operator running is required for the test-e2e command to work properly so I changed the comments to try to reflect that and be a bit more clear.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 8, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 9, 2025
@fishereskew
Copy link
Author

/retest

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 24, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 25, 2025
@joelsmith
Copy link
Contributor

/retest

@fishereskew
Copy link
Author

@joelsmith I'm switching out the script function for a ginkgo+gomega test suite. Once I finish that then it should be ready.

@fishereskew fishereskew force-pushed the main branch 2 times, most recently from 6f3dfb7 to ddc9955 Compare July 31, 2025 22:15
@fishereskew fishereskew changed the title AUTOSCALE-283: Annotation logic and e2e test AUTOSCALE-283: Annotation logic for controller creation and test additions Jul 31, 2025
@fishereskew fishereskew force-pushed the main branch 2 times, most recently from 5be0419 to fc15bcc Compare August 1, 2025 14:05
@fishereskew
Copy link
Author

Update: added ginko+gomega test suite using envTest. Did a basic ginkgo test translation for other test files.

@fishereskew
Copy link
Author

/retest

@fishereskew fishereskew force-pushed the main branch 2 times, most recently from e13d0ca to c5b856e Compare August 4, 2025 15:23
@maxcao13
Copy link
Member

maxcao13 commented Aug 5, 2025

I'm running the operator and I see a lot of these in the operator logs:

E0805 21:33:38.666717 1 reflector.go:166] "Unhandled Error" err="sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:108: Failed to watch *v1.Namespace: namespaces is forbidden: User \"system:serviceaccount:openshift-vertical-pod-autoscaler:vertical-pod-autoscaler-operator\" cannot watch resource \"namespaces\" in API group \"\" at the cluster scope" logger="UnhandledError"

I think we need to add a watch rule for namespaces? Although I don't really know why, becasue we are not setting up explicit watches for namespaces. I'm assuming operator-sdk code is doing something weird here.

@maxcao13
Copy link
Member

maxcao13 commented Aug 5, 2025

0805 21:20:04.963068 1 verticalpodautoscaler_controller.go:232] Error creating VerticalPodAutoscalerController deployment: deployments.apps "vpa-recommender-default" is forbidden: cannot set blockOwnerDeletion in this case because cannot find RESTMapping for APIVersion autoscaling.openshift.io/v1 Kind VerticalPodAutoscalerController: no matches for kind "VerticalPodAutoscalerController" in version "autoscaling.openshift.io/v1"
2025-08-05T21:20:04Z ERROR Reconciler error {"controller": "verticalpodautoscalercontroller", "controllerGroup": "autoscaling.openshift.io", "controllerKind": "VerticalPodAutoscalerController", "VerticalPodAutoscalerController": {"name":"default","namespace":"openshift-vertical-pod-autoscaler"}, "namespace": "openshift-vertical-pod-autoscaler", "name": "default", "reconcileID": "1187c43f-d046-4ef1-acc9-08ea349e1fbb", "error": "deployments.apps \"vpa-recommender-default\" is forbidden: cannot set blockOwnerDeletion in this case because cannot find RESTMapping for APIVersion autoscaling.openshift.io/v1 Kind VerticalPodAutoscalerController: no matches for kind \"VerticalPodAutoscalerController\" in version \"autoscaling.openshift.io/v1\""}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224

I also see these errors on startup, but they go away quickly after, and the startup proceeds.

Not really sure why that is. I put in a 20 second sleep like this:

// SetupWithManager sets up the controller with the Manager.
func (r *VerticalPodAutoscalerControllerReconciler) SetupWithManager(mgr ctrl.Manager) error {
       time.Sleep(20)
	// Create a default VPAController upon first operator startup
	go func() {
		err := r.ensureVPAController(mgr)
...

And the error stopped showing up, so something in there is too fast? Really weird.

Hopefully I'm not doing anything wrong here in the setup, but those are the errors I see when I'm manually testing this.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 5, 2025

@fishereskew: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@fishereskew
Copy link
Author

fishereskew commented Aug 6, 2025

@maxcao13 not really sure why that error you're getting is happening. I added the watch rbac and I'm getting a reconciler error for configmaps already existing when it tries to patch the annotation:

ERROR	Reconciler error	{"controller": "verticalpodautoscalercontroller", "controllerGroup": "autoscaling.openshift.io", "controllerKind": "VerticalPodAutoscalerController", "VerticalPodAutoscalerController": {"name":"default","namespace":"openshift-vertical-pod-autoscaler"}, "namespace": "openshift-vertical-pod-autoscaler", "name": "default", "reconcileID": "628d8aa0-6111-47c6-a0b4-0f5a5785c7a5", "error": "configmaps \"vpa-tls-ca-certs\" already exists"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224

Sleeping for 20 seconds also makes this error go away.

@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 5, 2025
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 5, 2025
@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-bot
Copy link
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants