From 56332586bbc67b9651413ccbe2e60381d31f125c Mon Sep 17 00:00:00 2001
From: Mikael Frykholm
Date: Tue, 4 Feb 2025 10:41:27 +0100
Subject: [PATCH] Better rolling upgrade instructions.

---
 README.md | 37 +++++++++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 2a64abd..9d80a24 100644
--- a/README.md
+++ b/README.md
@@ -80,10 +80,39 @@ internal-sto4-test-k8sc-1.rut.sunet.se   Ready   16d   v1.28.7
 ## Day 2 operations:
 ### Rolling upgrade:
-On controllers:
-    kubectl drain internal-sto4-test-k8sc-0.rut.sunet.se --ignore-daemonset
+
+Drain one controller at a time (uncordon it again once it is upgraded, see "Uncordon after upgrade" below) with:
+    kubectl drain internal-sto4-test-k8sc-0.rut.sunet.se --ignore-daemonsets
+After the first node is drained, restart the calico controller with:
+`kubectl rollout restart deployment calico-kube-controllers -n kube-system`
+Then restart the calico-node pod running on that host by deleting it (see "Finding a calico-node pod" below); it should be recreated automatically by the controller.
+`kubectl delete pod calico-node-???? -n kube-system`
 
-On workers:
+Continue with the workers (including PG nodes):
     kubectl drain internal-sto4-test-k8sw-0.rut.sunet.se --force --ignore-daemonsets --delete-emptydir-data --disable-eviction
+`kubectl delete pod calico-node-???? -n kube-system`
+
+## Calico problems
+Calico can get into a bad state. Look for messages like `Candidate IP leak handle` and `too old resource version` in the calico-kube-controllers pod. If these are found, calico can be restarted with:
-After upgrade: monitor that calico has working access to the cluster and look for problems like `Candidate IP leak handle och too old resource version` in calico-kube-controllers pod. If theese are found calico cane be restarted with:
     kubectl rollout restart deployment calico-kube-controllers -n kube-system
     kubectl rollout restart daemonset calico-node -n kube-system
+This will disrupt the whole cluster for a few seconds.
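+
+To verify that calico recovered (a sketch; it assumes `kubectl` is pointed at this cluster and that ten minutes of logs give enough context), wait for the rollouts and re-read the controller logs:
+
+    # Wait for the restarted deployment and daemonset to become ready again.
+    kubectl rollout status deployment calico-kube-controllers -n kube-system
+    kubectl rollout status daemonset calico-node -n kube-system
+    # Re-check recent controller logs for the error signatures above.
+    kubectl logs deployment/calico-kube-controllers -n kube-system --since=10m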
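+
+### Finding a calico-node pod
+The `????` suffix in the pod names above is generated per pod, so it has to be looked up. A sketch, using a node name from this README as the example:
+
+    # Show the calico-node pod scheduled on the node that was just drained.
+    kubectl get pods -n kube-system -o wide \
+      --field-selector spec.nodeName=internal-sto4-test-k8sc-0.rut.sunet.se | grep calico-node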
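+
+### Uncordon after upgrade
+`kubectl drain` leaves a node cordoned (unschedulable). Presumably each node should be uncordoned once its upgrade is finished, before the next one is drained:
+
+    kubectl uncordon internal-sto4-test-k8sc-0.rut.sunet.se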