I am running a cluster of 3 nodes, created with kops, on AWS EC2. When I run the deployment update
kubectl set image deployment.v1.apps/app-web app-web=11122333.dkr.ecr.eu-west-1.amazonaws.com/app:$build_number
the newly created pod gets stuck in Pending indefinitely, while the existing 3 pods stay in Running.
kubectl describe pod app-web-7b44bb94f6-ftbfg
gives:
Warning FailedScheduling 81s (x11 over 8m25s) default-scheduler 0/6 nodes are available: 3 node(s) didn't match pod affinity/anti-affinity, 3 node(s) didn't satisfy existing pods anti-affinity rules, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
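Each of those rejection reasons can be inspected directly; for the taint one, listing node taints with plain kubectl is enough (assuming the usual kops layout of 3 masters plus 3 workers):
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints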
kubectl get pods
gives:
app-web-7b44bb94f6-ftbfg 0/1 Pending 0 9m15s
app-web-c977b7ff9-6ngr2 1/1 Running 0 12h
app-web-c977b7ff9-9lj9c 1/1 Running 0 12h
app-web-c977b7ff9-klrnv 1/1 Running 0 12h
I've recently upgraded my cluster from:
# kubectl version --short
Client Version: v1.17.3
Server Version: v1.11.9
to
Client Version: v1.17.3
Server Version: v1.18.10
I've had to upgrade my deployment as well.
apiVersion: apps/v1 # Previously: extensions/v1beta1
kind: Deployment
metadata:
  name: app-web
spec:
  replicas: 3
  selector: # This selector had to be added once migrated to apps/v1
    matchLabels:
      app: app-web
      role: web
  template:
    metadata:
      labels:
        app: app-web
        role: web
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - app-web
            topologyKey: "kubernetes.io/hostname"
The problem seems to be related to podAntiAffinity; however, it wasn't a problem in the previous version of Kubernetes. Should I switch to preferredDuringSchedulingIgnoredDuringExecution instead, or is there another solution to this problem?
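For reference, I understand the soft variant of the same rule would look roughly like this (just a sketch; the weight of 100 is an arbitrary choice, and it would allow two app-web pods to land on the same node during a rollout):
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - app-web
        topologyKey: "kubernetes.io/hostname"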
The issue is that the default maxUnavailable for the RollingUpdate strategy is 25%. With 3 replicas, 25% rounds down to 0, so Kubernetes isn't allowed to take down any of the 3 existing pods, and the surge pod has nowhere to schedule because every worker node already runs an app-web pod that matches the anti-affinity rule. I've added this to the deployment spec:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1
and things are back to normal.
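To confirm the rollout completes after the change (assuming the deployment lives in the current namespace), a quick check is:
kubectl rollout status deployment/app-web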
I haven't checked whether that 25% default changed between versions, since this issue wasn't affecting me on the previous version of Kubernetes.
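For reference, when the strategy block is omitted, the deployment behaves as if it had the defaults below; with 3 replicas, the 25% maxUnavailable rounds down to 0 and the 25% maxSurge rounds up to 1, which is exactly the combination that collides with the required anti-affinity:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 25%        # rounds up to 1 extra pod with 3 replicas
    maxUnavailable: 25%  # rounds down to 0 pods that may be taken down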