Elasticsearch Kubernetes pods failing with CrashLoopBackOff - Back-off restarting failed container

6/16/2019

I am following these steps to install Elasticsearch using Helm charts: https://vocon-it.com/2019/03/04/kubernetes-9-installing-elasticsearch-using-helm-charts/. The chart is the stable/elasticsearch Helm chart: https://github.com/helm/charts/tree/master/stable/elasticsearch
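For reference, with Helm 2 (current at the time) the install would look roughly like the sketch below; the release name and namespace are inferred from the pod names in the listing that follows, and the actual deployment may wrap the chart in a custom parent chart:

helm repo update
helm install stable/elasticsearch --name augmented-data-explorer --namespace zen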

ade-db2-6c56bc6dfd-jfnw4                                       0/1     ImagePullBackOff        0          30m
augmented-data-explorer-759b9bd96-jds24                        1/1     Running                 0          30m
augmented-data-explorer-elasticsearch-client-9f8c7984b-gz4kd   0/1     Init:CrashLoopBackOff   10         30m
augmented-data-explorer-elasticsearch-data-0                   0/1     Init:CrashLoopBackOff   10         30m
augmented-data-explorer-elasticsearch-master-0                 0/1     Init:CrashLoopBackOff   10         30m
[root@dv-demo4-master-1 ~]# kubectl -n zen describe pod augmented-data-explorer-elasticsearch-data
Name:               augmented-data-explorer-elasticsearch-data-0
Namespace:          zen
Priority:           0
PriorityClassName:  <none>
Node:               172.16.196.167/172.16.196.167
Start Time:         Sun, 16 Jun 2019 10:31:56 -0700
Labels:             app=elasticsearch
                    component=data
                    controller-revision-hash=augmented-data-explorer-elasticsearch-data-7fbd495c9f
                    release=augmented-data-explorer
                    role=data
                    statefulset.kubernetes.io/pod-name=augmented-data-explorer-elasticsearch-data-0
Annotations:        kubernetes.io/psp: augmented-data-explorer-elasticsearch
Status:             Pending
IP:                 10.1.213.210
Controlled By:      StatefulSet/augmented-data-explorer-elasticsearch-data
Init Containers:
  sysctl:
    Container ID:  docker://42be82c2aedb8971383b1d0ce9aa23f65b63294be1f4b1dd2addffa13f2173b3
    Image:         busybox:latest
    Image ID:      docker-pullable://busybox@sha256:7a4d4ed96e15d6a3fe8bfedb88e95b153b93e230a96906910d57fc4a13210160
    Port:          <none>
    Host Port:     <none>
    Command:
      sysctl
      -w
      vm.max_map_count=262144
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Sun, 16 Jun 2019 16:43:18 -0700
      Finished:     Sun, 16 Jun 2019 16:43:18 -0700
    Ready:          False
    Restart Count:  77
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from augmented-data-explorer-elasticsearch-data-token-mqtb5 (ro)
  chown:
    Container ID:  
    Image:         docker.elastic.co/elasticsearch/elasticsearch-oss:6.7.0
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      set -e; set -x; chown elasticsearch:elasticsearch /usr/share/elasticsearch/data; for datadir in $(find /usr/share/elasticsearch/data -mindepth 1 -maxdepth 1 -not -name ".snapshot"); do
        chown -R elasticsearch:elasticsearch $datadir;
      done; chown elasticsearch:elasticsearch /usr/share/elasticsearch/logs; for logfile in $(find /usr/share/elasticsearch/logs -mindepth 1 -maxdepth 1 -not -name ".snapshot"); do
        chown -R elasticsearch:elasticsearch $logfile;
      done

    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /usr/share/elasticsearch/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from augmented-data-explorer-elasticsearch-data-token-mqtb5 (ro)
Containers:
  elasticsearch:
    Container ID:   
    Image:          docker.elastic.co/elasticsearch/elasticsearch-oss:6.7.0
    Image ID:       
    Port:           9300/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:  1
    Requests:
      cpu:      25m
      memory:   1536Mi
    Readiness:  http-get http://:9200/_cluster/health%3Flocal=true delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      DISCOVERY_SERVICE:           augmented-data-explorer-elasticsearch-discovery
      NODE_MASTER:                 false
      PROCESSORS:                  1 (limits.cpu)
      ES_JAVA_OPTS:                -Djava.net.preferIPv4Stack=true -Xms1536m -Xmx1536m  
      EXPECTED_MASTER_NODES:       1
      MINIMUM_MASTER_NODES:        1
      RECOVER_AFTER_MASTER_NODES:  1
      bootstrap.memory_lock:       true
    Mounts:
      /post-start-hook.sh from config (rw)
      /pre-stop-hook.sh from config (rw)
      /usr/share/elasticsearch/config/elasticsearch.yml from config (rw)
      /usr/share/elasticsearch/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from augmented-data-explorer-elasticsearch-data-token-mqtb5 (ro)
Conditions:
  Type              Status
  Initialized       False 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-augmented-data-explorer-elasticsearch-data-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      augmented-data-explorer-elasticsearch
    Optional:  false
  augmented-data-explorer-elasticsearch-data-token-mqtb5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  augmented-data-explorer-elasticsearch-data-token-mqtb5
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason   Age                      From                     Message
  ----     ------   ----                     ----                     -------
  Normal   Pulled   42m (x70 over 6h12m)     kubelet, 172.16.196.167  Successfully pulled image "busybox:latest"
  Normal   Pulling  37m (x71 over 6h12m)     kubelet, 172.16.196.167  pulling image "busybox:latest"
  Warning  BackOff  117s (x1713 over 6h12m)  kubelet, 172.16.196.167  Back-off restarting failed container

The PVCs are bound (kubectl -n zen get pvc):

data-augmented-data-explorer-elasticsearch-data-0     Bound    pvc-97fab8fa-905c-11e9-9850-00163e01c3eb   30Gi       RWO            oketi-gluster   176m
data-augmented-data-explorer-elasticsearch-master-0   Bound    pvc-97fbb612-905c-11e9-9850-00163e01c3eb   4Gi        RWO            oketi-gluster   176m

This is my PVC template for elasticsearch-data:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app: {{ template "elasticsearch.name" . }}
    component: data
    release: {{ .Release.Name }}
  name: {{ .Values.adeElasticSearchDataPVC.name }}
  namespace: {{ toYaml .Values.namespace }}
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: oketi-gluster
  resources:
    requests:
      storage: 30Gi
  volumeMode: Filesystem

And this is the PVC template for elasticsearch-master:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app: {{ template "elasticsearch.name" . }}
    component: master
    release: {{ .Release.Name }}
    role: master
  name: {{ .Values.adeElasticSearchMasterPVC.name }}
  namespace: {{ toYaml .Values.namespace }}
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: oketi-gluster
  resources:
    requests:
      storage: 4Gi
  volumeMode: Filesystem

I do not find anything useful in the logs:

kubectl -n zen log augmented-data-explorer-elasticsearch-client-9f8c7984b-gz4kd
Error from server (BadRequest): container "elasticsearch" in pod "augmented-data-explorer-elasticsearch-client-9f8c7984b-gz4kd" is waiting to start: PodInitializing
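
The BadRequest error is expected: the main elasticsearch container never starts because the sysctl init container keeps crashing, so the useful logs are on the init container itself. A minimal debugging sketch, using the init container name from the describe output above:

kubectl -n zen logs augmented-data-explorer-elasticsearch-data-0 -c sysctl
kubectl -n zen logs augmented-data-explorer-elasticsearch-data-0 -c sysctl --previous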

Any idea on how to fix this issue or how I can debug it?

-- Nidhi
elasticsearch
kubernetes
kubernetes-helm
kubernetes-pod

1 Answer

6/18/2019

As @Nidhi mentioned in the comments, the issue with bootstrapping the Elasticsearch containers was solved by raising the vm.max_map_count virtual memory limit to at least 262144 (persisted via /etc/sysctl.conf on the nodes running the ES pods; see the Elasticsearch virtual memory documentation), and by using mlockall to prevent Elasticsearch's memory from being swapped out, which requires granting the elasticsearch user permission to lock memory.
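
On Kubernetes, this kernel setting is normally applied by the chart's sysctl init container, and writing vm.max_map_count from inside a container only works when that container runs privileged; an unprivileged sysctl -w typically fails with exit code 1, which matches the crash loop shown above. A minimal sketch of such an init container (a pod spec fragment, not the chart's exact template):

initContainers:
- name: sysctl
  image: busybox:latest
  # Requires privileged mode: vm.max_map_count is a host-wide kernel setting
  command: ["sysctl", "-w", "vm.max_map_count=262144"]
  securityContext:
    privileged: true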

The memlock permission can be granted by modifying /etc/security/limits.conf on the host, for example:

elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
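
Once the init container succeeds, the new value can be verified from any running container on that node, since vm.max_map_count is host-wide; for example, using the data pod from the question once it is Running:

kubectl -n zen exec augmented-data-explorer-elasticsearch-data-0 -c elasticsearch -- cat /proc/sys/vm/max_map_count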
-- mk_sta
Source: StackOverflow