Add taints and debug info.

Mikael Frykholm 2025-02-04 10:18:57 +01:00
parent 5304aca0a3
commit 449fc872ce
Signed by: mifr
GPG key ID: 1467F9D69135C236
2 changed files with 12 additions and 1 deletions


@@ -48,6 +48,7 @@
* Add all other _Controller_ nodes with `microk8s join 89.46.21.119:25000/12345678987654345678976543/1234565`
* Add all other _Worker_ nodes with `microk8s join 89.46.21.119:25000/12345678987654345678976543/1234565 --worker`
* Taint controller nodes so they won't receive workloads: `microk8s.kubectl taint nodes --selector=node.kubernetes.io/microk8s-controlplane=microk8s-controlplane cp-node=true:NoExecute`
* Taint Postgres nodes so they only receive workloads that tolerate the taint: `microk8s.kubectl taint nodes --selector=sunet.se/role=cnpg pg-node=true:NoExecute`
* `kubectl get nodes` should show something like:
```
@@ -78,9 +79,14 @@ internal-sto4-test-k8sc-1.rut.sunet.se Ready <none> 16d v1.28.7
## Day 2 operations:
Rolling upgrade:
### Rolling upgrade:
On controllers:
kubectl drain internal-sto4-test-k8sc-0.rut.sunet.se --ignore-daemonsets
On workers:
kubectl drain internal-sto4-test-k8sw-0.rut.sunet.se --force --ignore-daemonsets --delete-emptydir-data --disable-eviction
After upgrade: monitor that Calico has working access to the cluster and look for problems like `Candidate IP leak handle` and `too old resource version` in the calico-kube-controllers pod. If these are found, Calico can be restarted with:
kubectl rollout restart deployment calico-kube-controllers -n kube-system
kubectl rollout restart daemonset calico-node -n kube-system
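
The drain step differs between controllers and workers, so a small dry-run helper can make the choice explicit before anything is run against the cluster. The sketch below only prints the command for a given node and role; `drain_cmd` is a hypothetical helper name, not part of kubectl or microk8s, and it uses the `--ignore-daemonsets` spelling that kubectl accepts.

```shell
#!/bin/sh
# Sketch: print the drain command used in this README for a node.
# drain_cmd is a hypothetical helper; it does not touch the cluster.
drain_cmd() {
    node="$1"
    role="$2"   # "controller" or "worker"
    case "$role" in
        controller)
            echo "kubectl drain $node --ignore-daemonsets"
            ;;
        worker)
            echo "kubectl drain $node --force --ignore-daemonsets --delete-emptydir-data --disable-eviction"
            ;;
        *)
            echo "unknown role: $role" >&2
            return 1
            ;;
    esac
}

# Dry run: show what would be executed for one controller and one worker.
drain_cmd internal-sto4-test-k8sc-0.rut.sunet.se controller
drain_cmd internal-sto4-test-k8sw-0.rut.sunet.se worker
```

A drained node stays cordoned after the upgrade; `kubectl uncordon <node>` makes it schedulable again.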


@@ -12,3 +12,8 @@ spec:
topologyKey: failure-domain.beta.kubernetes.io/zone
nodeSelector:
sunet.se/role: cnpg
tolerations:
- effect: NoExecute
key: pg-node
operator: Equal
value: "true"