My Kubernetes pods and containers are not starting. They are stuck with the status ContainerCreating.
I ran the command kubectl describe po PODNAME, which lists the events, and I see the following errors:
Type Reason Message
Warning FailedSync Error syncing pod
Normal SandboxChanged Pod sandbox changed, it will be killed and re-created.
The Count column indicates that these events are being repeated over and over again, roughly once a second. The full output of this command is below, but how do I go about debugging this? I'm not even sure what these errors mean.
Name: ocr-extra-2939512459-3hkv1
Namespace: ocr-da-cluster
Node: gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2/10.240.0.11
Start Time: Tue, 24 Oct 2017 21:05:01 -0400
Labels: component=ocr
pod-template-hash=2939512459
role=extra
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"ocr-da-cluster","name":"ocr-extra-2939512459","uid":"d58bd050-b8f3-11e7-9f9e-4201...
Status: Pending
IP:
Created By: ReplicaSet/ocr-extra-2939512459
Controlled By: ReplicaSet/ocr-extra-2939512459
Containers:
ocr-node:
Container ID:
Image: us.gcr.io/ocr-api/ocr-image
Image ID:
Ports: 80/TCP, 443/TCP, 5555/TCP, 15672/TCP, 25672/TCP, 4369/TCP, 11211/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 31
memory: 10Gi
Liveness: http-get http://:http/ocr/live delay=270s timeout=30s period=60s #success=1 #failure=5
Readiness: http-get http://:http/_ah/warmup delay=180s timeout=60s period=120s #success=1 #failure=3
Environment:
NAMESPACE: ocr-da-cluster (v1:metadata.namespace)
Mounts:
/var/log/apache2 from apachelog (rw)
/var/log/celery from cellog (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-dhjr5 (ro)
log-apache2-error:
Container ID:
Image: busybox
Image ID:
Port: <none>
Args:
/bin/sh
-c
echo Apache2 Error && sleep 90 && tail -n+1 -F /var/log/apache2/error.log
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 20m
Environment: <none>
Mounts:
/var/log/apache2 from apachelog (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-dhjr5 (ro)
log-worker-1:
Container ID:
Image: busybox
Image ID:
Port: <none>
Args:
/bin/sh
-c
echo Celery Worker && sleep 90 && tail -n+1 -F /var/log/celery/worker*.log
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 20m
Environment: <none>
Mounts:
/var/log/celery from cellog (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-dhjr5 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
apachelog:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
cellog:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-dhjr5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-dhjr5
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/instance-type=n1-highcpu-32
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
10m 10m 2 default-scheduler Warning FailedScheduling No nodes are available that match all of the following predicates:: Insufficient cpu (10), Insufficient memory (2), MatchNodeSelector (2).
10m 10m 1 default-scheduler Normal Scheduled Successfully assigned ocr-extra-2939512459-3hkv1 to gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2
10m 10m 1 kubelet, gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "apachelog"
10m 10m 1 kubelet, gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "cellog"
10m 10m 1 kubelet, gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "default-token-dhjr5"
10m 1s 382 kubelet, gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2 Warning FailedSync Error syncing pod
10m 0s 382 kubelet, gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2 Normal SandboxChanged Pod sandbox changed, it will be killed and re-created.
Are you sure you need 31 cpu as initial request (ocr-node)?
This will require a very big node.
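To put numbers on that, here is a rough sketch that adds up the CPU requests shown in the describe output above (31, 20m and 20m) and compares them with a 32 vCPU machine. It assumes the node really is an n1-highcpu-32, as the node selector suggests; the cpu_cores helper is a hypothetical name, not part of kubectl or any client library. The real allocatable CPU on a node is lower than the raw vCPU count, because the system and other pods reserve some of it, so the actual headroom is smaller than what this prints.

```python
from fractions import Fraction

def cpu_cores(quantity: str) -> Fraction:
    """Convert a CPU quantity such as '31' or '20m' into cores.

    Hypothetical helper for illustration only; 'm' means millicores.
    """
    if quantity.endswith("m"):
        return Fraction(int(quantity[:-1]), 1000)
    return Fraction(quantity)

# CPU requests copied from the pod description in the question.
requests = {
    "ocr-node": "31",
    "log-apache2-error": "20m",
    "log-worker-1": "20m",
}

total = sum(cpu_cores(value) for value in requests.values())

# Assumption: raw vCPU count of an n1-highcpu-32 machine, before any
# system reservation is subtracted.
node_vcpus = 32
headroom = node_vcpus - total

print(float(total))     # 31.04
print(float(headroom))  # 0.96
```

With less than one core to spare before reservations are taken into account, the pod will only fit on a node that is otherwise almost empty.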
I'm seeing similar issues with some of my pods. Deleting them and allowing them to be recreated sometimes helps, but not consistently. I'm sure there are enough resources available.
See Kubernetes pods failing on "Pod sandbox changed, it will be killed and re-created"
Check your resource limits. I faced the same issue, and the reason in my case was that I was using m instead of Mi for memory limits and memory requests.
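To see why the m versus Mi mix-up matters: in Kubernetes quantity notation, m is the milli suffix (one thousandth) and Mi means mebibytes (2^20 bytes). A minimal sketch of the difference follows. The parse_quantity function is a hypothetical helper written for this example, it handles only a subset of the suffixes, and the value 512 is just an illustration, not taken from the question.

```python
from decimal import Decimal

# Subset of the suffixes used by Kubernetes resource quantities.
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL = {"m": Decimal("0.001"), "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}

def parse_quantity(text: str) -> Decimal:
    """Convert a quantity string such as '512Mi' or '512m' to a plain number.

    Hypothetical helper for illustration; suffixes are case-sensitive.
    """
    # Check two-letter binary suffixes first so 'Mi' is not read as 'M'.
    for suffix, factor in BINARY.items():
        if text.endswith(suffix):
            return Decimal(text[:-len(suffix)]) * factor
    for suffix, factor in DECIMAL.items():
        if text.endswith(suffix):
            return Decimal(text[:-len(suffix)]) * factor
    return Decimal(text)

print(parse_quantity("512Mi"))  # 536870912
print(parse_quantity("512m"))   # 0.512
```

So a memory limit written as 512m asks for roughly half a byte rather than 512 mebibytes, which is far too little for any container to start.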