Add taints and debug info.

2025-02-04 10:18:57 +01:00
commit 449fc872ce
parent 5304aca0a3
2 changed files with 12 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -48,6 +48,7 @@
 * Add all other _Controller_ nodes with `microk8s join 89.46.21.119:25000/12345678987654345678976543/1234565`
 * Add all other _Worker_ nodes with `microk8s join 89.46.21.119:25000/12345678987654345678976543/1234565 --worker`
 * Taint controller nodes so they won't get workload: `microk8s.kubectl taint nodes --selector=node.kubernetes.io/microk8s-controlplane=microk8s-controlplane cp-node=true:NoExecute`
+* Taint Postgres nodes so they won't get other workload: `microk8s.kubectl taint nodes --selector=sunet.se/role=cnpg pg-node=true:NoExecute`
 * `kubectl get nodes` should show something like:

 ```
@@ -78,9 +79,14 @@ internal-sto4-test-k8sc-1.rut.sunet.se   Ready      <none>   16d   v1.28.7


 ## Day 2 operations:
-Rolling upgrade:
+### Rolling upgrade:
 On controllers:
 kubectl drain internal-sto4-test-k8sc-0.rut.sunet.se --ignore-daemonsets

 On workers:
 kubectl drain internal-sto4-test-k8sw-0.rut.sunet.se --force --ignore-daemonsets --delete-emptydir-data --disable-eviction
+
+After upgrade: check that Calico has working access to the cluster and look for problems such as `Candidate IP leak handle` and `too old resource version` in the calico-kube-controllers pod logs. If these are found, Calico can be restarted with:
+kubectl rollout restart deployment calico-kube-controllers -n kube-system
+kubectl rollout restart daemonset calico-node -n kube-system
+
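The post-upgrade Calico check described in the README change above could be scripted roughly as follows. This is a minimal sketch, not part of the commit: the function names `calico_needs_restart` and `check_and_restart_calico` are hypothetical, it assumes plain `kubectl` is configured for the cluster (substitute `microk8s.kubectl` otherwise), and the one-hour log window is an arbitrary choice. The two search strings are the ones quoted in the README.

```shell
#!/bin/sh
# Succeeds (exit 0) if the log text on stdin contains either symptom
# named in the README; fails otherwise.
calico_needs_restart() {
    grep -q -e 'Candidate IP leak handle' -e 'too old resource version'
}

# Hypothetical wrapper: scan recent calico-kube-controllers logs and
# restart Calico only when a symptom is found. Defined here, not invoked.
check_and_restart_calico() {
    if kubectl logs -n kube-system deployment/calico-kube-controllers --since=1h \
        | calico_needs_restart; then
        kubectl rollout restart deployment calico-kube-controllers -n kube-system
        kubectl rollout restart daemonset calico-node -n kube-system
    else
        echo "no known Calico symptoms in the last hour"
    fi
}
```

Calling `check_and_restart_calico` after each node upgrade would restart Calico only when one of the known symptoms shows up, instead of restarting it unconditionally.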
--- a/k8s/cnpg-cluster.yaml
+++ b/k8s/cnpg-cluster.yaml
@@ -12,3 +12,8 @@ spec:
    topologyKey: failure-domain.beta.kubernetes.io/zone
    nodeSelector:
      sunet.se/role: cnpg
+    tolerations:
+      - effect: NoExecute
+        key: pg-node
+        operator: Equal
+        value: "true"
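For the Postgres pods to stay on the tainted nodes, the toleration above has to match the taint from the README: with `operator: Equal`, the key, value and effect must all be equal. The sketch below illustrates that rule for this fully specified case only (it ignores `operator: Exists` and tolerations with an empty effect). `taint_spec`, `tolerates` and `show_cnpg_taints` are hypothetical helper names, and plain `kubectl` is assumed.

```shell
#!/bin/sh
# Print a taint in the KEY=VALUE:EFFECT form accepted by `kubectl taint`.
taint_spec() {
    printf '%s=%s:%s\n' "$1" "$2" "$3"
}

# tolerates TAINT KEY VALUE EFFECT
# Succeeds if an `operator: Equal` toleration with the given key, value
# and effect matches TAINT.
tolerates() {
    [ "$1" = "$(taint_spec "$2" "$3" "$4")" ]
}

# Hypothetical helper: list the taints actually set on the cnpg nodes.
# Defined here, not invoked.
show_cnpg_taints() {
    kubectl get nodes -l sunet.se/role=cnpg \
        -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
}

# The taint from the README against the toleration in cnpg-cluster.yaml.
if tolerates 'pg-node=true:NoExecute' pg-node true NoExecute; then
    echo "toleration matches taint"
fi
```

The same toleration does not match the controller taint `cp-node=true:NoExecute`, so the Postgres pods are still kept off the controller nodes.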