I have a GKE cluster which has been running fine until recently. Now I see a whole bunch of Kubernetes workloads showing as offline with the following error message:
Type     Reason     Age                     From                                                           Message
----     ------     ----                    ----                                                           -------
Normal   Scheduled  6m23s                   default-scheduler
Warning  Failed     5m39s (x3 over 6m22s)   kubelet, gke-platsol-bots-staging-default-pool-f489f2f3-rjrq  Error: ErrImagePull
Normal   BackOff    5m2s (x7 over 6m21s)    kubelet, gke-platsol-bots-staging-default-pool-f489f2f3-rjrq  Back-off pulling image "us.gcr.io/project/poc-app-bot@sha256:b99b5fb1b77407ade49d9bf42a94919e90422fee26c1a46ec6247370bd96c4d8"
Normal   Pulling    4m49s (x4 over 6m22s)   kubelet, gke-platsol-bots-staging-default-pool-f489f2f3-rjrq  pulling image "us.gcr.io/project/poc-app-bot@sha256:b99b5fb1b77407ade49d9bf42a94919e90422fee26c1a46ec6247370bd96c4d8"
Warning  Failed     81s (x22 over 6m21s)    kubelet, gke-platsol-bots-staging-default-pool-f489f2f3-rjrq  Error: ImagePullBackOff
Not sure what could have changed to cause this issue.
This is the output of kubectl describe pod:
Name:               project-5dddbd66b5-vpw8q
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               gke-platsol-bots-staging-default-pool-f489f2f3-rjrq/10.x.x.x
Start Time:         Wed, 18 Sep 2019 16:48:23 +0100
Labels:             app=bot
                    pod-template-hash=5dddbd66b5
Annotations:        kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container project
Status:             Pending
IP:                 10.20.1.9
Controlled By:      ReplicaSet/bot-5dddbd66b5
Containers:
  project:
    Container ID:
    Image:          us.gcr.io/project/project@sha256:b99b5fb1b77407ade49d9bf42a94919e90422fee26c1a46ec6247370bd96c4d8
    Image ID:
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:  100m
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-99cns:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-99cns
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason  Age                     From                                                           Message
  ----     ------  ----                    ----                                                           -------
  Warning  Failed  4m38s (x793 over 3h4m)  kubelet, gke-platsol-bots-staging-default-pool-f489f2f3-rjrq  Error: ImagePullBackOff
Below is what I have in my YAML definition for the deployment. I have not defined an imagePullSecret, as one was not required to pull the image from Google Container Registry.
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
    kubectl.kubernetes.io/last-applied-configuration: |
      <redacted annotations>
  creationTimestamp: 2019-06-06T08:37:01Z
  generation: 3
  labels:
    app: project
  name: bot
  namespace: default
  resourceVersion: "68945490"
  selfLink: /apis/apps/v1/namespaces/default/deployments/bot
  uid: 412ce711-8836-11e9-905f-42010a8e016c
image: us.gcr.io/project/app-bot@sha256:b99b5fb1b77407ade49d9bf42a94919e90422fee26c1a46ec6247370bd96c4d8
imagePullPolicy: IfNotPresent
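Since GKE nodes normally pull from GCR in the same project using the node service account rather than an imagePullSecret, one thing worth checking is whether the node pool still has a storage read scope and whether that service account can still read the bucket backing us.gcr.io. A sketch only; the cluster name below is a guess based on the node name, and the zone and project ID are placeholders:

# Node pools need devstorage.read_only (or cloud-platform) to pull from GCR.
gcloud container node-pools describe default-pool \
    --cluster platsol-bots-staging \
    --zone <zone> \
    --format="value(config.oauthScopes)"

# us.gcr.io for a project is backed by this GCS bucket; the node service
# account needs at least read access to it.
gsutil iam get gs://us.artifacts.<project-id>.appspot.com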
Okay, so I followed this guide to patch the service account with a "secret" for pulling images from GCR: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
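For reference, the steps from that guide boil down to roughly the following. The secret name gcr-pull-secret and key.json are placeholders, and the JSON-key approach assumes a service account with read access to the registry:

# Create a docker-registry secret from a service-account JSON key.
kubectl create secret docker-registry gcr-pull-secret \
    --docker-server=us.gcr.io \
    --docker-username=_json_key \
    --docker-password="$(cat key.json)" \
    --docker-email=any@example.com

# Attach it to the default service account so pods pick it up automatically.
kubectl patch serviceaccount default \
    -p '{"imagePullSecrets": [{"name": "gcr-pull-secret"}]}'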
I SSH onto a single node and can pull an image for one application successfully:
vinay@cloudshell:~ (project-id)$ docker pull us.gcr.io/project-id/project2-bot@sha256:9817462c743a93bb9206e4b86855322f731a768dca18e26b8bfc39b0cc886d31
sha256:9817462c743a93bb9206e4b86855322f731a768dca18e26b8bfc39b0cc886d31: Pulling from project-id/project2-bot
092586df9206: Pull complete
ef599477fae0: Pull complete
4530c6472b5d: Pull complete
d34d61487075: Pull complete
272f46008219: Pull complete
12ff6ccfe7a6: Pull complete
f26b99e1adb1: Pull complete
bb50901cd579: Pull complete
64a286652062: Pull complete
283785ced197: Pull complete
ed5a2062edd6: Pull complete
Digest: sha256:9817462c743a93bb9206e4b86855322f731a768dca18e26b8bfc39b0cc886d31
Status: Downloaded newer image for us.gcr.io/project-id/project2-bot@sha256:9817462c743a93bb9206e4b86855322f731a768dca18e26b8bfc39b0cc886d31
us.gcr.io/project-id/project2-bot@sha256:9817462c743a93bb9206e4b86855322f731a768dca18e26b8bfc39b0cc886d31
But this application throws an error:
vinay@cloudshell:~ (project-id)$ docker pull us.gcr.io/project-id/project1-plug@sha256:c53ac1c536a1187ce940f9221730cc0eae3103f4313033659e2162a70bc66c59
sha256:c53ac1c536a1187ce940f9221730cc0eae3103f4313033659e2162a70bc66c59: Pulling from project-id/project1-plug
a4d8138d0f6b: Pulling fs layer
dbdc36973392: Pulling fs layer
f59d6d019dd5: Pulling fs layer
aaef3e026258: Waiting
5e86b04a4500: Waiting
1a6643a2873a: Waiting
2ad1e30fc17c: Waiting
ddb5baaf3393: Waiting
0a7edc889b3c: Waiting
31a1f16c256b: Waiting
172a500f7b4d: Waiting
error pulling image configuration: unknown blob
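The "unknown blob" error usually means the manifest for that digest references a layer or config blob that is no longer present in the registry. One way to sanity-check this from Cloud Shell, using the same image reference as the failing pull above, is:

# Does the digest still resolve to a manifest in GCR?
gcloud container images describe \
    us.gcr.io/project-id/project1-plug@sha256:c53ac1c536a1187ce940f9221730cc0eae3103f4313033659e2162a70bc66c59

# List the tags/digests still present in the repository, in case the image was
# overwritten or partially deleted.
gcloud container images list-tags us.gcr.io/project-id/project1-plug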
ErrImagePull is quite possibly the most common pod error, and it is fortunately straightforward to debug and diagnose. You'll see ErrImagePull as the status message when this occurs, indicating that Kubernetes was not able to retrieve the image you specified in the manifest (for example, because the image was deleted from the registry).
You can immediately get more detailed information about why this error occurred using the kubectl describe pod [pod-name] command. ImagePullBackOff is not entirely an error condition, as Kubernetes is technically in a waiting state, retrying the pull in the hope that the image will become available.
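For example, with the pod name taken from the describe output above (substitute whatever pod is stuck):

# Detailed pod status, including the image pull events shown earlier.
kubectl describe pod project-5dddbd66b5-vpw8q

# The same events, sorted by time, via the event stream.
kubectl get events \
    --field-selector involvedObject.name=project-5dddbd66b5-vpw8q \
    --sort-by=.lastTimestamp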