Better rolling upgrade instructions.

Mikael Frykholm 2025-02-04 10:41:27 +01:00
parent 52e5056e21
commit 56332586bb
Signed by: mifr
GPG key ID: 1467F9D69135C236


@@ -80,13 +80,22 @@ internal-sto4-test-k8sc-1.rut.sunet.se Ready <none> 16d v1.28.7
## Day 2 operations:
### Rolling upgrade:
On controllers:
Drain one controller at a time with:
`kubectl drain internal-sto4-test-k8sc-0.rut.sunet.se --ignore-daemonsets`
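Before moving on, it can be worth confirming the drain took effect (a quick sketch; the node name is the one from the example above):
```
# The drained controller should show SchedulingDisabled and have only daemonset pods left on it.
kubectl get nodes
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=internal-sto4-test-k8sc-0.rut.sunet.se
```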
After the first node is drained, restart the calico controller with:
`kubectl rollout restart deployment calico-kube-controllers -n kube-system`
After that, restart the calico-node pod running on that host by deleting it. It will be recreated automatically by the DaemonSet controller.
`kubectl delete pod calico-node-???? -n kube-system`
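The exact pod name differs per node; one way to find the calico-node pod running on the drained host (a sketch, assuming the standard `k8s-app=calico-node` label from the upstream Calico manifests):
```
# List the calico-node pod scheduled on the drained controller;
# its name is what goes into the delete command above.
kubectl get pods -n kube-system -o wide -l k8s-app=calico-node \
  --field-selector spec.nodeName=internal-sto4-test-k8sc-0.rut.sunet.se
```
Once the host itself has been upgraded, bring it back into service with `kubectl uncordon internal-sto4-test-k8sc-0.rut.sunet.se` before draining the next controller.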
On workers:
Continue with the workers (including the PG nodes):
`kubectl drain internal-sto4-test-k8sw-0.rut.sunet.se --force --ignore-daemonsets --delete-emptydir-data --disable-eviction`
`kubectl delete pod calico-node-???? -n kube-system`
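With several workers the same drain/delete/uncordon cycle can be scripted. A minimal sketch, assuming the worker naming scheme from the example above and the standard `k8s-app=calico-node` label (adjust the node list to your cluster):
```
#!/usr/bin/env bash
set -euo pipefail
# Roll through the workers one at a time.
for node in internal-sto4-test-k8sw-0.rut.sunet.se internal-sto4-test-k8sw-1.rut.sunet.se; do
  kubectl drain "$node" --force --ignore-daemonsets --delete-emptydir-data --disable-eviction
  # Restart calico-node on the drained host by deleting it; the DaemonSet recreates it.
  kubectl delete pod -n kube-system -l k8s-app=calico-node --field-selector spec.nodeName="$node"
  read -rp "Upgrade and reboot $node, then press enter to uncordon... "
  kubectl uncordon "$node"
done
```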
## Calico problems
Calico can get into a bad state. After an upgrade, monitor that calico still has working access to the cluster and look for messages like `Candidate IP leak handle` or `too old resource version` in the calico-kube-controllers pod. If these are found, calico can be restarted with:
`kubectl rollout restart deployment calico-kube-controllers -n kube-system`
`kubectl rollout restart daemonset calico-node -n kube-system`
This will disrupt the whole cluster for a few seconds.
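To verify that the restart completed and the errors are gone, something like the following can be used (a sketch; the `k8s-app=calico-node` label comes from the upstream Calico manifests):
```
# Wait for both rollouts to finish, then check pod health and the recent controller logs.
kubectl -n kube-system rollout status deployment calico-kube-controllers
kubectl -n kube-system rollout status daemonset calico-node
kubectl -n kube-system get pods -l k8s-app=calico-node -o wide
kubectl -n kube-system logs deployment/calico-kube-controllers --since=10m
```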