Kubernetes Version Upgrades and Downtime

7/16/2019

I just tested Rancher RKE, upgrading Kubernetes from 13.xx to 14.xx. During the upgrade, an already running nginx Pod got restarted. Is this expected behavior?

Can we have Kubernetes cluster upgrades without user pods restarting?

Which tools support uninterrupted upgrades?

What kinds of downtime can we never avoid (apart from the control plane)?

-- Ijaz Ahmad Khan
kubernetes
rancher
rancher-rke

2 Answers

1/8/2020

Resolved by configuring the container runtime on the hosts not to restart containers when the Docker daemon restarts.
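The answer doesn't name the exact setting; on Docker hosts this is commonly done with the live-restore option in /etc/docker/daemon.json, which keeps containers running while the Docker daemon itself is stopped or upgraded. A minimal sketch, assuming Docker is the runtime:

```json
{
  "live-restore": true
}
```

With this in place, a Docker daemon restart during the upgrade no longer takes the running containers down with it.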

-- Ijaz Ahmad Khan
Source: StackOverflow

7/17/2019

The default way to upgrade a Kubernetes cluster is a rolling upgrade of the nodes, one at a time.

This works by cordoning (marking the node as unschedulable for new Pods) and draining each node that is being upgraded, so that there are no Pods running on that node; the sketch below shows the equivalent manual steps.
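Upgrade tools differ in how they automate this, but the underlying steps map to standard kubectl operations. A minimal sketch (the node name is a placeholder):

```sh
# Mark the node unschedulable so no new Pods land on it
kubectl cordon <node-name>

# Evict the existing Pods, respecting PodDisruptionBudgets;
# DaemonSet Pods are skipped, local emptyDir data is discarded
kubectl drain <node-name> --ignore-daemonsets --delete-local-data

# ... upgrade the node ...

# Allow scheduling on the node again once the upgrade is done
kubectl uncordon <node-name>
```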

It does that by creating a replacement for the existing Pods on another node (if one is available), and once the new Pod is running (and answering its readiness/health probes), it stops and removes the old Pod (sending SIGTERM to each of its containers) on the node being upgraded.

The amount of time Kubernetes waits for the Pod to shut down gracefully is controlled by terminationGracePeriodSeconds in the Pod spec; if the Pod takes longer than that, its containers are killed with SIGKILL.

The point is: to have a graceful Kubernetes upgrade, you need enough spare node capacity, and your Pods must have correct liveness and readiness probes (https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/). See the sketch below.
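As a concrete illustration of the two knobs mentioned above, here is a minimal sketch of a workload that survives a node drain. Note that a Pod is only recreated on another node if it is managed by a controller such as a Deployment; the names, image, and probe paths here are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3          # enough replicas that draining one node never removes the last copy
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      # Time the kubelet waits after SIGTERM before sending SIGKILL (default is 30s)
      terminationGracePeriodSeconds: 60
      containers:
      - name: nginx
        image: nginx:1.17
        ports:
        - containerPort: 80
        # A replacement Pod only becomes Ready (and receives traffic)
        # once this probe succeeds, which is what makes the rollover graceful
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
        # Restart the container if it stops responding
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 15
          periodSeconds: 20
```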

Some interesting material that is worth a read:

https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-upgrading-your-clusters-with-zero-downtime (specific to GKE but has some insights)
https://blog.gruntwork.io/zero-downtime-server-updates-for-your-kubernetes-cluster-902009df5b33

-- JCM
Source: StackOverflow