Tiller pod crashes after Vagrant VM is powered off

5/23/2018

I have set up a Vagrant VM, and installed Kubernetes and Helm.

vagrant@vagrant:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.7", GitCommit:"dd5e1a2978fd0b97d9b78e1564398aeea7e7fe92", GitTreeState:"clean", BuildDate:"2018-04-19T00:05:56Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.8", GitCommit:"c138b85178156011dc934c2c9f4837476876fb07", GitTreeState:"clean", BuildDate:"2018-05-21T18:53:18Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

vagrant@vagrant:~$ helm version
Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}

After the initial vagrant up that creates the VM, Tiller runs without issues.

I then power off the VM with vagrant halt and bring it back up with vagrant up. After that, Tiller starts to misbehave.
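Roughly, the sequence that triggers it is (a sketch of what I run; all kubectl output below is from inside the VM):

vagrant halt
vagrant up
vagrant ssh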

It restarts repeatedly and at some point enters a CrashLoopBackOff state:

vagrant@vagrant:~$ kubectl get pods -n kube-system
NAME                                    READY     STATUS             RESTARTS   AGE
etcd-vagrant                            1/1       Running            2          1h
heapster-5449cf95bd-h9xk8               1/1       Running            2          1h
kube-apiserver-vagrant                  1/1       Running            2          1h
kube-controller-manager-vagrant         1/1       Running            2          1h
kube-dns-6f4fd4bdf-xclbb                3/3       Running            6          1h
kube-proxy-8n8tc                        1/1       Running            2          1h
kube-scheduler-vagrant                  1/1       Running            2          1h
kubernetes-dashboard-5bd6f767c7-lrdjp   1/1       Running            3          1h
tiller-deploy-78f96d6f9-cswbm           0/1       CrashLoopBackOff   8          38m
weave-net-948jt                         2/2       Running            5          1h

Looking at the pod's events, I see that the liveness and readiness probes are failing.

vagrant@vagrant:~$ kubectl describe pod tiller-deploy-78f96d6f9-cswbm -n kube-system
Name:           tiller-deploy-78f96d6f9-cswbm
Namespace:      kube-system
Node:           vagrant/10.0.2.15
Start Time:     Wed, 23 May 2018 08:51:54 +0000
Labels:         app=helm
                name=tiller
                pod-template-hash=349528295
Annotations:    <none>
Status:         Running
IP:             10.32.0.28
Controlled By:  ReplicaSet/tiller-deploy-78f96d6f9
Containers:
  tiller:
    Container ID:   docker://389470b95c46f0a5ba6b4b5457f212b0e6f3e3a754beb1aeae835260de3790a7
    Image:          gcr.io/kubernetes-helm/tiller:v2.9.1
    Image ID:       docker-pullable://gcr.io/kubernetes-helm/tiller@sha256:417aae19a0709075df9cc87e2fcac599b39d8f73ac95e668d9627fec9d341af2
    Ports:          44134/TCP, 44135/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Wed, 23 May 2018 09:26:53 +0000
      Finished:     Wed, 23 May 2018 09:27:12 +0000
    Ready:          False
    Restart Count:  8
    Liveness:       http-get http://:44135/liveness delay=1s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:44135/readiness delay=1s timeout=1s period=10s #success=1 #failure=3
    Environment:
      TILLER_NAMESPACE:    kube-system
      TILLER_HISTORY_MAX:  0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-fl44z (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  default-token-fl44z:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-fl44z
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                 From               Message
  ----     ------                  ----                ----               -------
  Normal   SuccessfulMountVolume   38m                 kubelet, vagrant   MountVolume.SetUp succeeded for volume "default-token-fl44z"
  Normal   Scheduled               38m                 default-scheduler  Successfully assigned tiller-deploy-78f96d6f9-cswbm to vagrant
  Normal   Pulled                  29m (x2 over 38m)   kubelet, vagrant   Container image "gcr.io/kubernetes-helm/tiller:v2.9.1" already present on machine
  Normal   Killing                 29m                 kubelet, vagrant   Killing container with id docker://tiller:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Created                 29m (x2 over 38m)   kubelet, vagrant   Created container
  Normal   Started                 29m (x2 over 38m)   kubelet, vagrant   Started container
  Warning  Unhealthy               28m (x2 over 37m)   kubelet, vagrant   Readiness probe failed: Get http://10.32.0.19:44135/readiness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy               17m (x30 over 37m)  kubelet, vagrant   Liveness probe failed: Get http://10.32.0.19:44135/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Normal   SuccessfulMountVolume   11m                 kubelet, vagrant   MountVolume.SetUp succeeded for volume "default-token-fl44z"
  Warning  FailedCreatePodSandBox  10m (x7 over 11m)   kubelet, vagrant   Failed create pod sandbox.
  Normal   SandboxChanged          10m (x8 over 11m)   kubelet, vagrant   Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled                  10m                 kubelet, vagrant   Container image "gcr.io/kubernetes-helm/tiller:v2.9.1" already present on machine
  Normal   Created                 10m                 kubelet, vagrant   Created container
  Normal   Started                 10m                 kubelet, vagrant   Started container
  Warning  Unhealthy               10m                 kubelet, vagrant   Liveness probe failed: Get http://10.32.0.28:44135/liveness: dial tcp 10.32.0.28:44135: getsockopt: connection refused
  Warning  Unhealthy               10m                 kubelet, vagrant   Readiness probe failed: Get http://10.32.0.28:44135/readiness: dial tcp 10.32.0.28:44135: getsockopt: connection refused
  Warning  Unhealthy               8m (x2 over 9m)     kubelet, vagrant   Liveness probe failed: Get http://10.32.0.28:44135/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy               8m (x2 over 9m)     kubelet, vagrant   Readiness probe failed: Get http://10.32.0.28:44135/readiness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff                 1m (x22 over 7m)    kubelet, vagrant   Back-off restarting failed container
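
Since the probes mostly time out (Client.Timeout exceeded) rather than being refused, one thing worth checking is whether the health endpoints actually hang when hit manually from the node; a sketch, using the pod IP from the events above:

vagrant@vagrant:~$ curl -m 5 -v http://10.32.0.28:44135/liveness
vagrant@vagrant:~$ curl -m 5 -v http://10.32.0.28:44135/readiness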

After entering this state, it stays there.
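
The last terminated state shows Exit Code 2, so the crashed container's own output may hold a hint; it can be pulled with kubectl's --previous flag (a sketch, using the pod name from above):

vagrant@vagrant:~$ kubectl logs tiller-deploy-78f96d6f9-cswbm -n kube-system --previous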

Only after I delete the Tiller pod does it come back up and run smoothly again.
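
Concretely, the workaround is deleting the pod and letting the ReplicaSet recreate it:

vagrant@vagrant:~$ kubectl delete pod tiller-deploy-78f96d6f9-cswbm -n kube-system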

vagrant@vagrant:~$ kubectl get pods -n kube-system
NAME                                    READY     STATUS    RESTARTS   AGE
etcd-vagrant                            1/1       Running   2          1h
heapster-5449cf95bd-h9xk8               1/1       Running   2          1h
kube-apiserver-vagrant                  1/1       Running   2          1h
kube-controller-manager-vagrant         1/1       Running   2          1h
kube-dns-6f4fd4bdf-xclbb                3/3       Running   6          1h
kube-proxy-8n8tc                        1/1       Running   2          1h
kube-scheduler-vagrant                  1/1       Running   2          1h
kubernetes-dashboard-5bd6f767c7-lrdjp   1/1       Running   4          1h
tiller-deploy-78f96d6f9-tgx4z           1/1       Running   0          7m
weave-net-948jt                         2/2       Running   5          1h

However, describing the new pod still shows the same Unhealthy warnings in its events:

Events:
  Type     Reason                 Age                From               Message
  ----     ------                 ----               ----               -------
  Normal   Scheduled              8m                 default-scheduler  Successfully assigned tiller-deploy-78f96d6f9-tgx4z to vagrant
  Normal   SuccessfulMountVolume  8m                 kubelet, vagrant   MountVolume.SetUp succeeded for volume "default-token-fl44z"
  Normal   Pulled                 7m                 kubelet, vagrant   Container image "gcr.io/kubernetes-helm/tiller:v2.9.1" already present on machine
  Normal   Created                7m                 kubelet, vagrant   Created container
  Normal   Started                7m                 kubelet, vagrant   Started container
  Warning  Unhealthy              7m                 kubelet, vagrant   Readiness probe failed: Get http://10.32.0.28:44135/readiness: dial tcp 10.32.0.28:44135: getsockopt: connection refused
  Warning  Unhealthy              7m                 kubelet, vagrant   Liveness probe failed: Get http://10.32.0.28:44135/liveness: dial tcp 10.32.0.28:44135: getsockopt: connection refused
  Warning  Unhealthy              1m (x6 over 3m)    kubelet, vagrant   Liveness probe failed: Get http://10.32.0.28:44135/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy              41s (x14 over 7m)  kubelet, vagrant   Readiness probe failed: Get http://10.32.0.28:44135/readiness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
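
For reference, the probe settings that keep failing (delay=1s, timeout=1s, as shown in the describe output above) come from the tiller-deploy deployment; they can be dumped with something like:

vagrant@vagrant:~$ kubectl get deployment tiller-deploy -n kube-system -o yaml | grep -B 2 -A 6 Probe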

Any insight would be appreciated.

-- Kostas Demiris
kubernetes
kubernetes-helm
vagrant

0 Answers