Skip to content

Conversation

@rhmdnd
Copy link
Collaborator

@rhmdnd rhmdnd commented Sep 19, 2025

No description provided.

@rhmdnd rhmdnd requested a review from yuumasato September 19, 2025 21:39
@rhmdnd rhmdnd force-pushed the reimplement-manual-remediations branch from ad55aa1 to fb74e9a Compare September 19, 2025 21:42
Copy link
Collaborator

@JAORMX JAORMX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to break this down a little? 300k lines of code is a bit... much

@rhmdnd
Copy link
Collaborator Author

rhmdnd commented Sep 22, 2025

Is there a way to break this down a little? 300k lines of code is a bit... much

The 300k LOC is due to a golang version bump. That can be isolated into a separate patch, but I expect it'll be just as large.

@rhmdnd
Copy link
Collaborator Author

rhmdnd commented Sep 22, 2025

Pulling the golang dep update into #47 - which existed prior to this (I wasn't planning on updating the deps here).

@rhmdnd
Copy link
Collaborator Author

rhmdnd commented Sep 22, 2025

Another part of this is that I included an update for CO to bump from 0.1.46 to 1.7.0 (to get rid of an older dependency that was broken, I believe) - which is causing some dependency churn with k8s and controller runtime dependencies.

If we can get things updated to 1.23, then perhaps we can bump the CO dependencies after to get to something recent.

The refactor to reorganize the e2e suite eliminated the ability to apply
manual remediations. This commit adds that back into the suite at an
appropriate place. It also introduces a new test configuration option
that let's callers control how long they want to wait for the
remediation timeout. It defaults to 30 minutes per remediation, which
run in parallel.
Render the mismatched results as markdown and HTML so it's easier to
debug when jobs fail due to mismatched rule behavior.
Fix minor linting issues.
@rhmdnd rhmdnd force-pushed the reimplement-manual-remediations branch from 46b6cd1 to 18f76c3 Compare September 23, 2025 21:25
@rhmdnd
Copy link
Collaborator Author

rhmdnd commented Sep 24, 2025

Need openshift/release#69499 to land before this will start passing.

@yuumasato
Copy link
Member

/retest

}

// Replace - with _ so we can find the directory name in ComplianceAsCode/content
ruleName = strings.ReplaceAll(ruleName, "-", "_")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be problematic as some of the RHCOS4 rule IDs contain a hyphen, for example:

./linux_os/guide/services/deprecated/package_telnetd-ssl_removed/rule.yml
./linux_os/guide/system/bootloader-grub2/grub2_systemd_debug-shell_argument_absent/rule.yml
./linux_os/guide/auditing/package_audit-audispd-plugins_installed/rule.yml
./linux_os/guide/system/permissions/mounting/kernel_module_usb-storage_disabled/rule.yml

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we just exclude manual remediation from RHCOS since it doesn't look like there are any:

$ find linux_os -type f -name e2e-remediation.sh
$ find applications -type f -name e2e-remediation.sh 
applications/openshift/api-server/api_server_encryption_provider_cipher/tests/ocp4/e2e-remediation.sh
applications/openshift/api-server/audit_log_forwarding_enabled/tests/ocp4/e2e-remediation.sh
applications/openshift/api-server/audit_log_forwarding_enabled_logging_api/tests/ocp4/e2e-remediation.sh
applications/openshift/authentication/idp_is_configured/tests/ocp4/e2e-remediation.sh
applications/openshift/general/banner_or_login_template_set/tests/ocp4/e2e-remediation.sh
applications/openshift/general/classification_banner/tests/ocp4/e2e-remediation.sh
applications/openshift/general/file_integrity_notification_enabled/tests/ocp4/e2e-remediation.sh
applications/openshift/general/gitops_operator_exists/tests/ocp4/e2e-remediation.sh
applications/openshift/general/oauth_login_template_set/tests/ocp4/e2e-remediation.sh
applications/openshift/general/openshift_motd_exists/tests/ocp4/e2e-remediation.sh
applications/openshift/general/resource_requests_quota/tests/ocp4/e2e-remediation.sh
applications/openshift/general/resource_requests_quota_per_project/tests/ocp4/e2e-remediation.sh
applications/openshift/general/oauth_logout_url_set/tests/ocp4/e2e-remediation.sh
applications/openshift/general/oauth_provider_selection_set/tests/ocp4/e2e-remediation.sh
applications/openshift/general/acs_sensor_exists/tests/ocp4/e2e-remediation.sh
applications/openshift/general/sandboxed_containers_operator_configured/tests/ocp4/e2e-remediation.sh
applications/openshift/general/sandboxed_containers_operator_exists/tests/ocp4/e2e-remediation.sh
applications/openshift/integrity/file_integrity_exists/tests/ocp4/e2e-remediation.sh
applications/openshift/integrity/reject_unsigned_images_by_default/tests/ocp4/e2e-remediation.sh
applications/openshift/networking/configure_network_policies_namespaces/tests/ocp4/e2e-remediation.sh
applications/openshift/networking/default_ingress_ca_replaced/tests/ocp4/e2e-remediation.sh
applications/openshift/networking/ingress_controller_certificate/tests/ocp4/e2e-remediation.sh
applications/openshift/networking/configure_egress_ip_node_assignable/tests/ocp4/e2e-remediation.sh
applications/openshift/risk-assessment/container_security_operator_exists/tests/ocp4/e2e-remediation.sh
applications/openshift/confinement/security_profiles_operator_exists/tests/ocp4/e2e-remediation.sh

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One alternative is to make the rule name a regular by expression swapping any _ for (-|_), then using it in findRulePath()

The TestMain function created the initial test configuration, and does
important things like cloning the content repository and installing the
operator. However, each test case was creating it's own test
configuration. This is mostly fine, but doesn't safely account for the
ContentDir argument since that's updated in Setup(), which is only ever
called from TestMain(). All the other test functions assume Setup() will
handle the content directory, but that doesn't work if they don't share
the same test configuration.

This commit moves the test configuration to a global that setup creates
and bootstraps using the parse test arguments. From there, each of the
test cases can use the global test configuration, so long as it's not
modified (it should be a read-only resource after Setup() so that it
doesn't cause inconsistent behaviors between tests).
@Vincent056
Copy link
Collaborator

25/09/25 21:13:37 Applied remediation /tmp/content-2235741757/applications/openshift/api-server/api_server_encryption_provider_cipher/tests/ocp4/e2e-remediation.sh
    e2e_test.go:425: Failed to apply manual remediations: failed to apply manual remediation /tmp/content-2235741757/applications/openshift/api-server/audit_log_forwarding_enabled/tests/ocp4/e2e-remediation.sh: exit status 127
        Output: ++ oc version -ojson
        ++ jq -r .openshiftVersion
        /tmp/content-2235741757/applications/openshift/api-server/audit_log_forwarding_enabled/tests/ocp4/e2e-remediation.sh: line 4: jq: command not found
        + ocp_version=
--- FAIL: TestProfileRemediations (2053.73s)
FAIL
``` it looks like we need to install jq?

@rhmdnd
Copy link
Collaborator Author

rhmdnd commented Sep 25, 2025

25/09/25 21:13:37 Applied remediation /tmp/content-2235741757/applications/openshift/api-server/api_server_encryption_provider_cipher/tests/ocp4/e2e-remediation.sh
    e2e_test.go:425: Failed to apply manual remediations: failed to apply manual remediation /tmp/content-2235741757/applications/openshift/api-server/audit_log_forwarding_enabled/tests/ocp4/e2e-remediation.sh: exit status 127
        Output: ++ oc version -ojson
        ++ jq -r .openshiftVersion
        /tmp/content-2235741757/applications/openshift/api-server/audit_log_forwarding_enabled/tests/ocp4/e2e-remediation.sh: line 4: jq: command not found
        + ocp_version=
--- FAIL: TestProfileRemediations (2053.73s)
FAIL
``` it looks like we need to install jq?

Yeah - looks like it. We can either do that here or in the openshift/release repo I think.

@mkdir -p $(BUILD_DIR)

.PHONY: install-jq
install-jq: ## Install jq if not available
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to do this in the CI configs if we don't use the make file target here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking care of that here - openshift/release#69497

@rhmdnd rhmdnd force-pushed the reimplement-manual-remediations branch from d9fe9ab to c5dcb6b Compare September 26, 2025 01:05
Some remediations require this to be on the box to function.
@rhmdnd rhmdnd force-pushed the reimplement-manual-remediations branch from c5dcb6b to c16a153 Compare September 26, 2025 02:22
@yuumasato
Copy link
Member

25/09/25 21:13:37 Applied remediation /tmp/content-2235741757/applications/openshift/api-server/api_server_encryption_provider_cipher/tests/ocp4/e2e-remediation.sh
    e2e_test.go:425: Failed to apply manual remediations: failed to apply manual remediation /tmp/content-2235741757/applications/openshift/api-server/audit_log_forwarding_enabled/tests/ocp4/e2e-remediation.sh: exit status 127
        Output: ++ oc version -ojson
        ++ jq -r .openshiftVersion
        /tmp/content-2235741757/applications/openshift/api-server/audit_log_forwarding_enabled/tests/ocp4/e2e-remediation.sh: line 4: jq: command not found
        + ocp_version=
--- FAIL: TestProfileRemediations (2053.73s)
FAIL
``` it looks like we need to install jq?

@Vincent056 Where have you found this output? I don't see the detailed logs:

 2025/09/26 03:35:20 Applied remediation /tmp/content-3561661255/applications/openshift/api-server/api_server_encryption_provider_cipher/tests/ocp4/e2e-remediation.sh
    e2e_test.go:425: Failed to apply manual remediations: timed out waiting for manual remediation /tmp/content-3561661255/applications/openshift/authentication/idp_is_configured/tests/ocp4/e2e-remediation.sh to apply
--- FAIL: TestProfileRemediations (2045.92s) 

Copy link
Member

@yuumasato yuumasato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just followed up on a previous comment, otherwise it looks good.

The tests failed because some of the manual remediations timed up.
But I guess this is to be investigated and fixed on the content repo.

}

// Replace - with _ so we can find the directory name in ComplianceAsCode/content
ruleName = strings.ReplaceAll(ruleName, "-", "_")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One alternative is to make the rule name a regular by expression swapping any _ for (-|_), then using it in findRulePath()

if err != nil {
return err
}
if d.IsDir() && d.Name() == ruleName {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could match the directory name against a regular expression.

@rhmdnd
Copy link
Collaborator Author

rhmdnd commented Sep 26, 2025

25/09/25 21:13:37 Applied remediation /tmp/content-2235741757/applications/openshift/api-server/api_server_encryption_provider_cipher/tests/ocp4/e2e-remediation.sh
    e2e_test.go:425: Failed to apply manual remediations: failed to apply manual remediation /tmp/content-2235741757/applications/openshift/api-server/audit_log_forwarding_enabled/tests/ocp4/e2e-remediation.sh: exit status 127
        Output: ++ oc version -ojson
        ++ jq -r .openshiftVersion
        /tmp/content-2235741757/applications/openshift/api-server/audit_log_forwarding_enabled/tests/ocp4/e2e-remediation.sh: line 4: jq: command not found
        + ocp_version=
--- FAIL: TestProfileRemediations (2053.73s)
FAIL
``` it looks like we need to install jq?

I added a commit to install JQ using the Makefile, and running that prior to the tests/

@Vincent056 Where have you found this output? I don't see the detailed logs:

 2025/09/26 03:35:20 Applied remediation /tmp/content-3561661255/applications/openshift/api-server/api_server_encryption_provider_cipher/tests/ocp4/e2e-remediation.sh
    e2e_test.go:425: Failed to apply manual remediations: timed out waiting for manual remediation /tmp/content-3561661255/applications/openshift/authentication/idp_is_configured/tests/ocp4/e2e-remediation.sh to apply
--- FAIL: TestProfileRemediations (2045.92s) 

@rhmdnd
Copy link
Collaborator Author

rhmdnd commented Sep 26, 2025

Both CIS and STIG platform scans failed to apply the IdP remediation - https://github.com/ComplianceAsCode/content/blob/master/applications/openshift/authentication/idp_is_configured/tests/ocp4/e2e-remediation.sh

I wonder if the cluster is churning and it's never considered stable for at least 2 minutes? Otherwise - that remediation is pretty straight forward.

We have existing assertion files in the CaC/content repository that
model how all profiles should behave. Let's make sure we look for those
assertions when we process the scan results. This commit adds the
product into the name so that we're following the existing convention,
for example:

  rhcos4-moderate-4.15.yml
Remove the -rule-assertions suffix from the assertion file name so that
the files we generate are easier to copy/paste into the content
repsitory for maintenance since they're using the same name.
@rhmdnd
Copy link
Collaborator Author

rhmdnd commented Sep 26, 2025

/test e2e-aws-ocp4-cis
/test e2e-aws-openshift-platform-compliance

Failed to stand up the cluster.

Some rules use - and some use _, so we can't rely solely on swapping -
to _ to find the right rule. This commit updates the utilities to use a
regular expression to handle that better.
If a manual remediation fails, let's log the output so it's easier to
diagnose what happened.
If we're expecting a result in the scan output, but we don't find it in
the results, log it as a failure with clear language that says it's
missing. This will help us know if rules were accidentally removed from
porfiles.
Previously, we had a utility that would assert the scan results against
a set of assertions, and if the assertions didn't exist it would
generate an assertion file. We can pull this logic up to the test layer,
where it defines the assertion file it needs for the test, and it's also
responsible for writing the assertions for failed tests to disk. Then,
the verification utility methods are only responsible for matching
expected and actual results.
ioutil has been deprecated, let's use os instead.
@openshift-ci
Copy link

openshift-ci bot commented Sep 26, 2025

@rhmdnd: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ocp4-stig 5d7f382 link true /test e2e-aws-ocp4-stig
ci/prow/e2e-aws-ocp4-cis 5d7f382 link true /test e2e-aws-ocp4-cis
ci/prow/e2e-aws-rhcos4-moderate 5d7f382 link true /test e2e-aws-rhcos4-moderate

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@yuumasato
Copy link
Member

/test e2e-aws-openshift-platform-compliance

Copy link
Member

@yuumasato yuumasato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

The profile tests will be failing until we add assertion files.

@yuumasato yuumasato merged commit 332f09e into ComplianceAsCode:main Sep 29, 2025
2 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants