@fgiorgetti
Member

If a Site is deleted and recreated quickly (e.g. through automation), the skupper-router ConfigMap owned by the previous site may still be present.

The controller now fails if it finds a router configuration that is not owned by the currently active site.

Fixes #2323.
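
To illustrate the idea, here is a minimal sketch of such an ownership check. It is not the actual controller code: `OwnerRef`, `ConfigMap`, `ErrStaleRouterConfig` and `checkRouterConfigOwner` are hypothetical names, and the types are simplified stand-ins for the Kubernetes objects. It assumes ownership is determined by comparing the owner reference UID against the UID of the active Site.

```go
package main

import (
	"errors"
	"fmt"
)

// OwnerRef is a simplified, hypothetical stand-in for a Kubernetes owner reference.
type OwnerRef struct {
	Kind string
	Name string
	UID  string
}

// ConfigMap is a simplified, hypothetical stand-in for the skupper-router ConfigMap.
type ConfigMap struct {
	Name   string
	Owners []OwnerRef
}

// ErrStaleRouterConfig signals a router config left behind by a previous site.
var ErrStaleRouterConfig = errors.New("router config is not owned by the active site")

// checkRouterConfigOwner returns nil only if cm has an owner reference of kind
// Site whose UID matches siteUID; otherwise it returns ErrStaleRouterConfig.
func checkRouterConfigOwner(cm ConfigMap, siteUID string) error {
	for _, ref := range cm.Owners {
		if ref.Kind == "Site" && ref.UID == siteUID {
			return nil
		}
	}
	return fmt.Errorf("%w: configmap %q, active site UID %q",
		ErrStaleRouterConfig, cm.Name, siteUID)
}

func main() {
	// A ConfigMap still owned by the deleted site (the UIDs are made up).
	stale := ConfigMap{
		Name:   "skupper-router",
		Owners: []OwnerRef{{Kind: "Site", Name: "test-site-new", UID: "uid-old"}},
	}
	if err := checkRouterConfigOwner(stale, "uid-new"); err != nil {
		fmt.Println("reconcile failed:", err)
	}
}
```

Comparing UIDs rather than names matters here because a recreated site can reuse the same name and namespace as the one that was deleted.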

If a Site is deleted and recreated quickly (automated), the
skupper-router ConfigMap owned by the previous site may still be
present (owned resource not yet deleted).

The controller now fails if it finds a router configuration that
is not owned by the currently active site.

Fixes skupperproject#2323.
@vsomwanshi

@fgiorgetti
I don't know exactly what this issue is. However, as the output below shows, after I ran the skupper site deletion command I checked all the objects and everything was completely clean.

$ 1188 [Sun 23 2:31PM] ip-192-168-1-6 :~/Desktop/isd-pl-f07e95fc0d9a skupper site delete --all -n test-site-new
Waiting for deletion to complete...
Site "test-site-new" is deleted

$ oc get site
No resources found in test-site-new namespace.

$ oc get pods 
No resources found in test-site-new namespace.

$ oc get secret
NAME                             TYPE                             DATA   AGE
all-icr-io                       kubernetes.io/dockerconfigjson   1      36m
builder-dockercfg-8b9t7          kubernetes.io/dockercfg          1      36m
default-dockercfg-6s5xw          kubernetes.io/dockercfg          1      36m
deployer-dockercfg-57xwx         kubernetes.io/dockercfg          1      36m
pipeline-dockercfg-8mbn6         kubernetes.io/dockercfg          1      36m
skupper-router-dockercfg-ctjvr   kubernetes.io/dockercfg          1      35m

$ oc get cm 
NAME                       DATA   AGE
config-service-cabundle    1      36m
config-trusted-cabundle    1      36m
kube-root-ca.crt           1      36m
openshift-service-ca.crt   1      36m

I am happy that you were able to reproduce this issue. Unfortunately, I was unable to reproduce it in our lower environments; it happens only in our production environment.

@fgiorgetti
Member Author

> I am happy that you were able to reproduce this issue. Unfortunately, I was unable to reproduce it in our lower environments; it happens only in our production environment.

@vsomwanshi I was able to reproduce it by deleting and recreating a site in quick succession, as an automated script would.

What happened is that, when a site is deleted and another is created, the new site is processed before the old resources owned by the deleted site have been removed. That causes the error which, as you pointed out in the issue, can be recovered from by restarting the skupper-controller pod.

Can you share some details on the procedure you are following in production to reproduce it? Is it possible that you have 2 sites created in the same namespace at the time you're deleting one? That could be a similar trigger. Or, once you remove a site, is there a GitOps operator applying a new site definition?

@vsomwanshi

@fgiorgetti Please find my comments inline.

> Can you share some details on the procedure you guys are following in production to reproduce it?

--> We follow the steps below to reproduce the issue:

## Method 1:

- Delete the skupper site from the CLI using the command: skupper site delete --all -n <namespace>
- Wait for some time for the Site object and the other related components to be deleted.
- Sync the Site YAML configuration (Site object) from GitOps, which eventually creates all the related objects as well.

## Method 2:

- Delete the skupper Site object from GitOps.
- Wait for some time for the Site object and the other related components to be deleted.
- Sync the Site YAML configuration (Site object) from GitOps, which eventually creates all the related objects as well.

> Is it possible that you guys have 2 sites created on the same namespace at the time you're deleting it? This could potentially be a similar trigger to that.

--> We have a 1:1 mapping: we create only one skupper site per namespace. In any case, the skupper controller will not allow you to create another site in a namespace where a Site object is already present.

> Or eventually once you remove a site, is there any gitops operator applying a new site definition?

--> Yes, our entire skupper setup is deployed through GitOps only. The skupper controller, CRDs and sizing profile ConfigMaps are deployed in one dedicated namespace. Sites are deployed in separate namespaces. In our production environment we have 55 skupper sites in one OpenShift cluster. Each site has 14 listeners and 5 connectors.

I am not sure why, but I am unable to reproduce this issue in our lower environments. Could it be happening in production because, as mentioned above, we have 55 skupper sites in one OpenShift cluster, each with 14 listeners and 5 connectors? Is that generating more events, making the skupper-controller unstable or unable to track the site cleanup operations?

@vsomwanshi

@fgiorgetti, or anyone else who can answer this: the fix you are applying will be part of the latest release, right? Maybe skupper 2.1.3? I can see your team has fixed a lot of issues, and I will need to roll them out in our environments in the near future.

If I go with this release in our environments:

[1] During the upgrade from 2.1.0 to 2.1.3, do I simply need to upgrade the skupper controller to 2.1.3, with everything else (e.g. upgrading skupper-router, kube-adaptor, etc.) taken care of by the controller itself?

[2] I believe no downtime is required for this upgrade, but I am asking for confirmation so that I can take it to management accordingly.

[3] Is it correct that there is no need to touch the sites, and that skupper links do not need to be recreated?

Thank you.

Development

Successfully merging this pull request may close these issues.

configmaps "skupper-router" not found

2 participants