Kubernetes pods hanging in Init state

4/28/2018

I am facing a weird issue with my pods. I launch around 20 pods in my environment, and every time a random 3-4 of them hang with Init:0/1 status. When I check the pod status, the init container shows as Running, although it should terminate once its task is finished, and the app container shows Waiting with reason PodInitializing. The same init container image and spec are used across all 20 pods, but the issue hits different pods each time. When I delete these stuck pods, they get stuck in the Terminating state.

If I SSH to the node where the pod was launched and run docker ps, it shows the init container as running, but docker exec fails with an error saying the container doesn't exist. The init container pulls configs from a Consul server. On checking its volume (path taken from docker inspect), I found that it had pulled all the key-value pairs correctly and saved them under the configured file name. I have also checked resources on all the nodes, and more than enough are available on each of them.

Below is a detailed example of one of the pods behaving this way.

kubectl version:

kubectl version 
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"} 
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T09:42:01Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"} 

Pods:

kubectl get pods -n dev1|grep -i session-service 
session-service-app-75c9c8b5d9-dsmhp               0/1       Init:0/1           0          10h 
session-service-app-75c9c8b5d9-vq98k               0/1       Terminating        0          11h 
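
With around 20 pods in the namespace, the stuck ones can be picked out of the listing with a small filter. This is only a minimal sketch: it assumes the default column order of kubectl get pods (NAME, READY, STATUS, ...), and stuck_pods is a hypothetical helper name, not part of kubectl.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# the name and status of every pod that is still in an init phase
# (STATUS starts with "Init:") or stuck in Terminating.
# Assumes STATUS is the third whitespace-separated column.
stuck_pods() {
    awk '$3 ~ /^Init:/ || $3 == "Terminating" { print $1, $3 }'
}

# Usage (namespace taken from the question):
#   kubectl get pods -n dev1 | stuck_pods
```

The header row and pods in Running state are skipped because their third column matches neither pattern.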

Pod status:

kubectl describe pods session-service-app-75c9c8b5d9-dsmhp -n dev1 
Name:           session-service-app-75c9c8b5d9-dsmhp 
Namespace:      dev1 
Node:           ip-192-168-44-18.ap-southeast-1.compute.internal/192.168.44.18 
Start Time:     Fri, 27 Apr 2018 18:14:43 +0530 
Labels:         app=session-service-app 
                pod-template-hash=3175746185 
                release=session-service-app 
Status:         Pending 
IP:             100.96.4.240 
Controlled By:  ReplicaSet/session-service-app-75c9c8b5d9 
Init Containers: 
  initpullconsulconfig: 
    Container ID:  docker://c658d59995636e39c9d03b06e4973b6e32f818783a21ad292a2cf20d0e43bb02 
    Image:         shr-u-nexus-01.myops.de:8082/utils/app-init:1.0 
    Image ID:      docker-pullable://shr-u-nexus-01.myops.de:8082/utils/app-init@sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd 
    Port:          <none> 
    Args: 
      -consul-addr=consul-server.consul.svc.cluster.local:8500 
    State:          Running 
      Started:      Fri, 27 Apr 2018 18:14:44 +0530 
    Ready:          False 
    Restart Count:  0 
    Environment: 
      CONSUL_TEMPLATE_VERSION:  0.19.4 
      POD:                      sand 
      SERVICE:                  session-service-app 
      ENV:                      dev1 
    Mounts: 
      /var/lib/app from shared-volume-sidecar (rw) 
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bthkv (ro) 
Containers: 
  session-service-app: 
    Container ID: 
    Image:          shr-u-nexus-01.myops.de:8082/sand-images/sessionservice-init:sitv12 
    Image ID: 
    Port:           8080/TCP 
    State:          Waiting 
      Reason:       PodInitializing 
    Ready:          False 
    Restart Count:  0 
    Environment:    <none> 
    Mounts: 
      /etc/appenv from shared-volume-sidecar (rw) 
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bthkv (ro) 
Conditions: 
  Type           Status 
  Initialized    False 
  Ready          False 
  PodScheduled   True 
Volumes: 
  shared-volume-sidecar: 
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime) 
    Medium: 
  default-token-bthkv: 
    Type:        Secret (a volume populated by a Secret) 
    SecretName:  default-token-bthkv 
    Optional:    false 
QoS Class:       BestEffort 
Node-Selectors:  <none> 
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s 
                 node.kubernetes.io/unreachable:NoExecute for 300s 
Events:          <none> 

Container status on the node:

sudo docker ps|grep -i session 
c658d5999563        shr-u-nexus-01.myops.de:8082/utils/app-init@sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd                                       "/usr/bin/consul-t..."   10 hours ago        Up 10 hours                             k8s_initpullconsulconfig_session-service-app-75c9c8b5d9-dsmhp_dev1_c2075f2a-4a18-11e8-88e7-02929cc89ab6_0 

da120abd3dbb        gcr.io/google_containers/pause-amd64:3.0                                                                                                                      "/pause"                 10 hours ago        Up 10 hours                             k8s_POD_session-service-app-75c9c8b5d9-dsmhp_dev1_c2075f2a-4a18-11e8-88e7-02929cc89ab6_0 

f53d48c7d6ec        shr-u-nexus-01.myops.de:8082/utils/app-init@sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd                                       "/usr/bin/consul-t..."   10 hours ago        Up 10 hours                             k8s_initpullconsulconfig_session-service-app-75c9c8b5d9-vq98k_dev1_42837d12-4a12-11e8-88e7-02929cc89ab6_0 

c26415458d39        gcr.io/google_containers/pause-amd64:3.0                                                                                                                      "/pause"                 10 hours ago        Up 10 hours                             k8s_POD_session-service-app-75c9c8b5d9-vq98k_dev1_42837d12-4a12-11e8-88e7-02929cc89ab6_0 

On running docker exec (same result with kubectl exec):

sudo docker exec -it c658d5999563 bash 
rpc error: code = 2 desc = containerd: container not found 
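
Since docker ps reports the container as up while docker exec cannot find it, one further check is whether the process Docker has on record still exists on the host. The following is a rough sketch under stated assumptions: docker inspect must still be able to return the container's State.Pid, and pid_alive is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper: succeed if the given host PID refers to a live process.
# Rejects empty, non-numeric, and non-positive values first; Docker reports
# a Pid of 0 for a container that it does not consider running.
pid_alive() {
    [ -n "$1" ] || return 1
    [ "$1" -gt 0 ] 2>/dev/null || return 1
    # /proc/<pid> exists on Linux for any live process; fall back to
    # kill -0, which sends no signal and only checks for existence.
    [ -d "/proc/$1" ] || kill -0 "$1" 2>/dev/null
}

# Usage on the node (container ID taken from the docker ps output above):
#   pid=$(sudo docker inspect --format '{{.State.Pid}}' c658d5999563)
#   if pid_alive "$pid"; then echo "process $pid exists"; else echo "process $pid is gone"; fi
```

If the PID is gone while docker ps still lists the container, Docker's view and the actual container state have diverged, which would be consistent with the exec error shown above.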
-- Vivek Kumar
docker
kubernetes

0 Answers