Pods fail with reason FailedSync and no explanation

9/29/2017

I have a cluster on Google Cloud Container Engine with 6 n1-standard-1 machines.

I deployed several services and pods on this cluster, and sometimes they fail with FailedSync as the only reason and no further explanation. I have no idea why they fail. The virtual machines are not overloaded: only 6% of the CPU is used, and less than 1Gi of memory.

Here are some events from the kubectl describe command:

[Screenshot: events 1]

Pods shown with the filter "is system object: true" have the same problem; some of them have more than 900 restarts in 4 days.

Maybe I am missing something in my Kubernetes configuration, but I have no idea what.

Thanks for your help

-- manu
google-kubernetes-engine
kubernetes

2 Answers

11/4/2017

I finally found the cause of the node failures. I was using a GlusterFS volume for the Event Store database (https://eventstore.org/), and I think the latency made it fail: I saw a lot of slow queries in the Event Store logs. I don't know exactly what happened, but since I switched to a persistent SSD disk in the same region as my cluster I have had no issues: 0 restarts for several days, and the nodes work like a charm.

I also isolated this database on a single node.
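
For anyone wanting to try the same change, here is a rough sketch of what it could look like from the command line. This is not necessarily how it was done above: the helper names, disk name, size, zone, and label are all placeholders, and it assumes gcloud and kubectl are already configured for the cluster.

```shell
# Hypothetical helpers; function names and all arguments are placeholders.

# Create a zonal SSD persistent disk.
# The disk has to be in the same zone as the node that will mount it.
#   $1 = disk name, $2 = size (for example 100GB), $3 = zone
create_ssd_disk() {
  gcloud compute disks create "$1" --type=pd-ssd --size="$2" --zone="$3"
}

# Label one node. A pod whose spec has a matching nodeSelector can then
# only be scheduled on nodes carrying this label.
#   $1 = node name, $2 = label in key=value form
dedicate_node() {
  kubectl label nodes "$1" "$2"
}

# Usage, with made-up values:
#   create_ssd_disk es-data 100GB us-central1-a
#   dedicate_node "$NODE_NAME" workload=eventstore
```

The label only pins the database pod to that node. Keeping other pods off the node would additionally need a taint on it.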

-- manu
Source: StackOverflow

10/3/2017

I think the best way to find the issue is to SSH into the node and run sudo docker logs $CONTAINER_Id to see what happened to your applications.

You can tell which nodes your applications are deployed on with kubectl describe po $PO_NAME, or simply kubectl get po -o wide.
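
The lookup described above can be wrapped in a couple of small shell helpers. This is only a sketch: the function names are made up, the pod is assumed to be in the current namespace (add -n kube-system for system pods), and the second helper uses kubectl logs --previous as an alternative that does not require SSH access to the node.

```shell
# Hypothetical helpers; the function names are illustrative, not part of kubectl.

# Print the name of the node a pod is scheduled on.
#   $1 = pod name
pod_node() {
  kubectl get po "$1" -o jsonpath='{.spec.nodeName}'
}

# Print the logs of the previous (terminated) instance of the pod's container,
# which is usually the one that crashed before the last restart.
#   $1 = pod name
pod_previous_logs() {
  kubectl logs --previous "$1"
}

# Usage:
#   pod_node "$PO_NAME"
#   pod_previous_logs "$PO_NAME"
#
# Then, after SSH-ing into the node printed by pod_node:
#   sudo docker ps -a                  # list containers, including exited ones
#   sudo docker logs "$CONTAINER_Id"   # logs of one specific container
```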

-- Jimmy Lu
Source: StackOverflow