
@jan-law (Contributor) commented Dec 3, 2025

Aims to reduce the total e2e test time by 10 minutes without adding more flakes.

refactor tests ref: https://issues.redhat.com/browse/ACM-26529
test flake ref: https://issues.redhat.com/browse/ACM-23597

Summary of changes

  1. Removed the top-level Ordered decorator. Tests now run in parallel test processes unless they are marked Serial.
  2. Refactored the setup/cleanup methods so that each Describe container runs with its own parent policy and namespace, both suffixed with a hash of the test name.
  3. Patched the YAML files with utils.KubectlJSONPatchToFile to update the namespace and policy name suffixes before creating the resources.
  4. Nested the 5 tests that create the grc example-operator in AllNamespaces mode into an Ordered Describe container.
  5. Nested the 4 tests that delete the same CRD into an Ordered Describe container, and replaced the operator in these tests with one that the other tests do not use.
  6. Marked 3 tests as Serial: 2 tests that break the catalog source (including ACM-23597), and 1 test that counts controller reconciles.

Assisted by Claude 4.5

openshift-ci bot commented Dec 3, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jan-law


@openshift-ci openshift-ci bot added the approved label Dec 3, 2025
@jan-law jan-law force-pushed the 26529-parallel-operatorpolicy-tests branch 6 times, most recently from e3ead21 to 4fb0eb9, December 5, 2025 18:37
Refactors the case38 setup methods to deploy each
Ordered test into its own namespace. The namespaces and
policies are suffixed with a hash of the test name.

ref: https://issues.redhat.com/browse/ACM-26529

Signed-off-by: Janelle Law <[email protected]>
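The commit says only that namespaces and policies are suffixed with "a hash of the test name". The names in the logs (for example operator-policy-testns-124d3267) carry 8 hex characters, which a 32-bit hash prints as. As a sketch under that assumption, FNV-1a from Go's standard library is used here. The PR does not say which hash it uses, and testSuffix and namespaceFor are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// testSuffix returns an 8-hex-character suffix derived from the test name.
// Assumption: 32-bit FNV-1a; the PR only specifies "a hash of the test name".
func testSuffix(testName string) string {
	h := fnv.New32a()
	h.Write([]byte(testName)) // hash.Hash.Write never returns an error
	return fmt.Sprintf("%08x", h.Sum32())
}

// namespaceFor builds a per-test namespace name such as
// "operator-policy-testns-<suffix>" (hypothetical helper).
func namespaceFor(prefix, testName string) string {
	return prefix + "-" + testSuffix(testName)
}
```

Because the suffix is deterministic, setup and cleanup can each recompute the same namespace from the test name without sharing state. The short fixed length also keeps the names within the 63-character limit for Kubernetes namespace names.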
These tests intentionally break the CatalogSource and expect the
PackageManifests to disappear. Run them as Serial to prevent interference
from tests that frequently query the PackageManifest API.

ref: https://issues.redhat.com/browse/ACM-23597

Signed-off-by: Janelle Law <[email protected]>
Prepare to refactor the tests that
enforce an OperatorPolicy for the same operator
while relying on the default OperatorGroup behavior.

ref: https://issues.redhat.com/browse/ACM-26529

Signed-off-by: Janelle Law <[email protected]>
These tests deploy the same operator with no OperatorGroup, which
defaults to AllNamespaces mode. We need to run these tests one at a time
relative to each other. Otherwise, the status displays unexpected messages
about the same operator managing resources in another test's namespace.

ref: https://issues.redhat.com/browse/ACM-26529

Signed-off-by: Janelle Law <[email protected]>
Move the tests that delete and recreate
an operator CRD to the same location.

ref: https://issues.redhat.com/browse/ACM-26529

Signed-off-by: Janelle Law <[email protected]>
These tests delete and recreate the same CRD.
They will need to run one at a time relative to
each other.

ref: https://issues.redhat.com/browse/ACM-26529

Signed-off-by: Janelle Law <[email protected]>
Replaces the operator under test with another operator
used exclusively by the CRD tests. This prevents conflicts
in other tests when the operator's CRD is deleted as
part of the CRD test setup.

ref: https://issues.redhat.com/browse/ACM-26529

Signed-off-by: Janelle Law <[email protected]>
Match the catalog source cleanup logic used in
"Testing an all default operator policy".

ref: https://issues.redhat.com/browse/ACM-26529

Signed-off-by: Janelle Law <[email protected]>
Signed-off-by: Janelle Law <[email protected]>
@@ -672,22 +832,28 @@ var _ = Describe("Testing OperatorPolicy", Ordered, Label("supports-hosted"), fu
)
})
})

Describe("Testing Subscription behavior for musthave mode while enforcing", Ordered, func() {
Contributor Author

This test flakes occasionally. Both failures show the same sequence of events: the check passed the Eventually() in checkFunc but failed at Consistently(). The last It() step in the test intentionally breaks spec/subscription/sourceNamespace, but an InstallPlan then starts installing during Consistently(), and the subscription status changes to Compliant instead of remaining at the expected ConstraintsNotSatisfiable. Any ideas why?

{
          "lastTransitionTime": "2025-12-04T19:14:57Z",
          "message": "the Subscription matches what is required by the policy",
          "reason": "SubscriptionMatches",
          "status": "True",
          "type": "SubscriptionCompliant"
        },
        {
          "lastTransitionTime": "2025-12-04T19:14:23Z",
          "message": "the policy spec is valid",
          "reason": "PolicyValidated",
          "status": "True",
          "type": "ValidPolicySpec"
        }
      ],
      "history": [
        {
          "lastTimestamp": "2025-12-04T19:14:58.015269Z", # unexpected
          "message": "NonCompliant; the policy spec is valid, the OperatorGroup matches what is required by the policy, the Subscription matches what is required by the policy, a relevant InstallPlan is actively installing, the ClusterServiceVersion required by the policy was not found, no CRDs were found for the operator, there are no relevant deployments because the ClusterServiceVersion is missing, CatalogSource 'operatorhubio-catalog' was not found"
        },
        {
          "lastTimestamp": "2025-12-04T19:14:55.701762Z", # this should be the last message
          "message": "NonCompliant; the policy spec is valid, the OperatorGroup matches what is required by the policy, constraints not satisfiable: refer to the Subscription for more details, there are no relevant InstallPlans in the namespace, the ClusterServiceVersion required by the policy was not found, no CRDs were found for the operator, there are no relevant deployments because the ClusterServiceVersion is missing, CatalogSource 'operatorhubio-catalog' was not found"
        },
...
  wanted related objects: [{Properties:<nil> Object:{Metadata:{Name:project-quay Namespace:operator-policy-testns-124d3267} APIVersion:operators.coreos.com/v1alpha1 Kind:Subscription} Compliant:NonCompliant Reason:ConstraintsNotSatisfiable}]
  wanted condition: {Type:SubscriptionCompliant Status:False ObservedGeneration:0 LastTransitionTime:0001-01-01 00:00:00 +0000 UTC Reason:ConstraintsNotSatisfiable Message:constraints not satisfiable: refer to the Subscription for more details}
...
[FAILED] Failed after 2.327s.
  The function passed to Consistently failed at /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:169 with:
  Expected
      <string>: Compliant
  to equal
      <string>: NonCompliant

logs:
https://github.com/open-cluster-management-io/config-policy-controller/actions/runs/19911108613/job/57178128413?pr=419

https://github.com/open-cluster-management-io/config-policy-controller/actions/runs/19972532656/job/57280596312

Retry the patch commands to prevent a flaky test
timeout when a patch command hung and was
not retried.

Signed-off-by: Janelle Law <[email protected]>
Comment on lines +1442 to +1447
utils.Kubectl("patch", "operatorpolicy", opPolName, "-n", testNamespace, "--type=json", "-p",
`[{"op": "replace", "path": "/spec/subscription/startingCSV", "value": "`+goodVersion+`"},`+
`{"op": "replace", "path": "/spec/remediationAction", "value": "inform"}]`)
KubectlTarget("patch", "subscription.operator", subName, "-n", opPolTestNS, "--type=json", "-p",
`[{"op": "replace", "path": "/spec/startingCSV", "value": "`+goodVersion+`"}]`)

Contributor Author

@jan-law jan-law Dec 5, 2025

In one flaky run, the logs indicated that the subscription wasn't patched before the test timed out. Hopefully retrying helps. https://github.com/open-cluster-management-io/config-policy-controller/actions/runs/19972532656/job/57291714147#logs

2025-12-05T21:04:04.310Z	debug	controllers/operatorpolicy_controller.go:1711	..."controllerKind": "OperatorPolicy", "OperatorPolicy": {"name":"oppol-manual-upgrades-e8e934d1","namespace":"managed"}, ... "subscriptionConditionMessage": "constraints not satisfiable: no operators found with name strimzi-cluster-operator.v0.0.0.1337 in channel strimzi-0.36.x of package strimzi-kafka-operator in the catalog referenced by subscription strimzi-kafka-operator, subscription strimzi-kafka-operator exists"}
