I am trying to resize a replication controller from 2 replicas to 0. The two pods to be deleted are scheduled on node1 and node2 respectively. The pod on node2 gets deleted without a problem, but the one on node1 stays active and running according to both kubectl get pods and docker ps:
kubectl scale rc my-app-v1 --replicas=0

kubectl get rc my-app-v1
CONTROLLER   CONTAINER(S)   IMAGE(S)         SELECTOR     REPLICAS
my-app-v1    my-app         <docker image>   <selector>   0

# waited several minutes

kubectl get pods -l app=my-app
NAME              READY     STATUS    RESTARTS   AGE       NODE
my-app-v1-a12da   1/1       Running   0          5d        node1
One of the two pods was deleted properly, while the other remains running. I have tried this several times and consistently have problems only with node1.
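Checking directly on node1 shows the container is still running in Docker as well; a simple filter like the following is enough (the grep pattern just matches the pod name shown above and is only illustrative):

admin@node1 ~ $ docker ps | grep my-app-v1-a12da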
I ssh'ed into node1 and restarted the kubelet. This deleted the lingering pod, but when I try to delete another pod on that node I still have to restart the kubelet to get it to work.
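For reference, the restart was roughly the following; the systemd unit name is an assumption and may differ depending on how the kubelet is managed on your nodes:

admin@node1 ~ $ sudo systemctl restart kubelet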
I think the kubelet's sync loop is getting stuck somewhere: after a restart it only makes it through a few iterations before hanging again.
I just turned on verbose logging, but I'm not sure what I should look for.
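(In case it matters, by verbose logging I mean bumping the kubelet's --v flag; the exact value below is illustrative, and journalctl only applies if the kubelet runs under systemd.)

/opt/bin/kubelet --v=4 <existing flags>
# follow the kubelet log
journalctl -u kubelet -f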
This also applies to new containers scheduled onto node1: their images are never pulled, and they are never started.
node1 has worked fine in the past; I only started running into this problem last night.
admin@node1 ~ $ /opt/bin/kubelet --version=true
Kubernetes v1.1.1
kubectl version
Client Version: version.Info{Major:"1", Minor:"0", GitVersion:"v1.0.6", GitCommit:"388061f00f0d9e4d641f9ed4971c775e1654579d", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.1", GitCommit:"92635e23dfafb2ddc828c8ac6c03c7a7205a84d8", GitTreeState:"clean"}
8154 config.go:382] Receiving a new pod "my-app-v1-a12da_default"
...
8154 server.go:944] GET /stats/default/my-app-v1-a12da/<some uuid>/app-container: (75.513µs) 404 [[Go 1.1 package http]
Normally the SyncLoop would pick this up and perform the necessary Docker operations to get the container started, but there has been no sync loop activity after the "Receiving a new pod" message in the 50 minutes since I restarted the kubelet.
As pointed out by @yu-ju-hong, this was due to a bug in Kubernetes 1.1.1 in handling version-skewed clusters. Please upgrade the master to a newer version, such as Kubernetes 1.1.7, and, ideally, upgrade the nodes to the same version as soon as possible.
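If you want to confirm the skew before and after upgrading, comparing the server version with the kubelet versions reported by the nodes should be enough; the grep below assumes the node objects expose nodeInfo.kubeletVersion in their status:

kubectl version
kubectl get nodes -o yaml | grep kubeletVersion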