For the past couple of days we have been experiencing an intermittent deployment failure when deploying (via Helm) to Kubernetes v1.11.2.
When it fails, kubectl describe <deployment>
usually reports that the container failed to create:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1s default-scheduler Successfully assigned default/pod-fc5c8d4b8-99npr to fh1-node04
Normal Pulling 0s kubelet, fh1-node04 pulling image "docker-registry.internal/pod:0e5a0cb1c0e32b6d0e603333ebb81ade3427ccdd"
Error from server (BadRequest): container "pod" in pod "pod-fc5c8d4b8-99npr" is waiting to start: ContainerCreating
and the only issue we can find in the kubelet logs is:
58468 kubelet_pods.go:146] Mount cannot be satisfied for container "pod", because the volume is missing or the volume mounter is nil: {Name:default-token-q8k7w ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}
58468 kuberuntime_manager.go:733] container start failed: CreateContainerConfigError: cannot find volume "default-token-q8k7w" to mount container start failed: CreateContainerConfigError: cannot find volume "default-token-q8k7w" to mount into container "pod"
It's intermittent which means it fails around once in every 20 or so deployments. Re-running the deployment works as expected.
The cluster and node health all look fine at the time of the deployment, so we are at a loss as to where to go from here. Looking for advice on where to start next on diagnosing the issue.
EDIT: As requested, the deployment file is generated via a Helm template and the output is shown below. For further information, the same Helm template is used for a lot of our services, but only this particular service has this intermittent issue:
apiVersion: apps/v1beta2
kind: Deployment
metadata:
name: pod
labels:
app: pod
chart: pod-0.1.0
release: pod
heritage: Tiller
environment: integration
annotations:
kubernetes.io/change-cause: https://github.com/path_to_release
spec:
replicas: 2
revisionHistoryLimit: 3
selector:
matchLabels:
app: pod
release: pod
environment: integration
strategy:
rollingUpdate:
maxSurge: 0
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
labels:
app: pod
release: pod
environment: integration
spec:
containers:
- name: pod
image: "docker-registry.internal/pod:0e5a0cb1c0e32b6d0e603333ebb81ade3427ccdd"
env:
- name: VAULT_USERNAME
valueFrom:
secretKeyRef:
name: "pod-integration"
key: username
- name: VAULT_PASSWORD
valueFrom:
secretKeyRef:
name: "pod-integration"
key: password
imagePullPolicy: IfNotPresent
command: ['mix', 'phx.server']
ports:
- name: http
containerPort: 80
protocol: TCP
envFrom:
- configMapRef:
name: pod
livenessProbe:
httpGet:
path: /api/health
port: http
initialDelaySeconds: 10
readinessProbe:
httpGet:
path: /api/health
port: http
initialDelaySeconds: 10
resources:
limits:
cpu: 750m
memory: 200Mi
requests:
cpu: 500m
memory: 150Mi