I am trying to resize a replication controller from 2 replicas to 0. The two pods to be deleted are scheduled on node1 and node2 respectively. The pod on node2 gets deleted without a problem, but the one on node1 stays active and running according to both kubectl get pods and docker ps:
kubectl scale rc my-app-v1 --replicas=0

kubectl get rc my-app-v1
CONTROLLER   CONTAINER(S)   IMAGE(S)         SELECTOR     REPLICAS
my-app-v1    my-app         <docker image>   <selector>   0

# waited several minutes

kubectl get pods -l app=my-app
NAME              READY     STATUS    RESTARTS   AGE       NODE
my-app-v1-a12da   1/1       Running   0          5d        node1
One of the two pods was deleted properly, while the other remains running. I have tried this several times and consistently have problems only with node1.
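Checking directly on node1 shows the container is still running in Docker as well; a simple filter like the following is enough (the grep pattern just matches the pod name shown above and is only illustrative):

admin@node1 ~ $ docker ps | grep my-app-v1-a12da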
I ssh'ed into node1 and restarted the kubelet. This deleted the lingering pod, but when I try to delete another pod on that node I still have to restart the kubelet to get it to work.
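For reference, the restart was roughly the following; the systemd unit name is an assumption and may differ depending on how the kubelet is managed on your nodes:

admin@node1 ~ $ sudo systemctl restart kubelet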
I think the kubelet's sync loop is getting stuck somewhere: after a restart it only makes it through a few iterations before hanging again.
I just turned on verbose logging, but I'm not sure what I should look for.
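(In case it matters, by verbose logging I mean bumping the kubelet's --v flag; the exact value below is illustrative, and journalctl only applies if the kubelet runs under systemd.)

/opt/bin/kubelet --v=4 <existing flags>
# follow the kubelet log
journalctl -u kubelet -f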
This also applies to new containers scheduled onto node1: their images are never pulled, and they are never started.
node1 has worked fine in the past; I only started running into this problem last night.
admin@node1 ~ $ /opt/bin/kubelet --version=true
Kubernetes v1.1.1
kubectl version
Client Version: version.Info{Major:"1", Minor:"0", GitVersion:"v1.0.6", GitCommit:"388061f00f0d9e4d641f9ed4971c775e1654579d", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.1", GitCommit:"92635e23dfafb2ddc828c8ac6c03c7a7205a84d8", GitTreeState:"clean"}
8154 config.go:382] Receiving a new pod "my-app-v1-a12da_default"
...
8154 server.go:944] GET /stats/default/my-app-v1-a12da/<some uuid>/app-container: (75.513µs) 404 [[Go 1.1 package http]
Normally the SyncLoop would pick this up and perform the necessary Docker operations to get the container started, but there has been no sync loop activity after the "Receiving a new pod" message in the 50 minutes since I restarted the kubelet.
As pointed out by @yu-ju-hong, this was due to a bug in Kubernetes 1.1.1 in handling version-skewed clusters. Please upgrade the master to a newer version, such as Kubernetes 1.1.7, and, ideally, upgrade the nodes to the same version as soon as possible.
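If you want to confirm the skew before and after upgrading, comparing the server version with the kubelet versions reported by the nodes should be enough; the grep below assumes the node objects expose nodeInfo.kubeletVersion in their status:

kubectl version
kubectl get nodes -o yaml | grep kubeletVersion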