elasticsearch StatefulSet pod failed to mount volume

7/31/2019

I have Elasticsearch cluster of 3 nodes (StatefulSet) running on EKS (Server Version: v1.13.7-eks-c57ff8) using Persistent Volumes.

I performed an EKS cluster upgrade from 1.12 to 1.13, which was successful. But afterwards one of the Elasticsearch nodes failed to start and is stuck in the init state:

NAME                                 READY   STATUS     RESTARTS   AGE
es-master-0                          0/1     Init:0/3   0          15h
es-master-1                          1/1     Running    0          44h
es-master-2                          1/1     Running    0          44h

I tried killing the pod es-master-0, but the replacement pod gets stuck in the same state.

When I describe the pod (kubectl describe pod es-master-0), I can see that it is not able to mount the volume:

  Events:
  Type     Reason                  Age    From                                                Message
  ----     ------                  ----   ----                                                -------
  Normal   Scheduled               2m13s  default-scheduler                                   Successfully assigned kube-logging/es-master-0 to ip-10-2-18-16.us-west-2.compute.internal
  Normal   SuccessfulAttachVolume  2m10s  attachdetach-controller                             AttachVolume.Attach succeeded for volume "pvc-f2e27430-af11-11e9-b10d-02a8eba067e2"
  Warning  FailedMount             10s    kubelet, ip-10-2-18-16.us-west-2.compute.internal  Unable to mount volumes for pod "es-master-0_kube-logging(bc27e29c-b539-11e9-9958-06eeabb0603e)": timeout expired waiting for volumes to attach or mount for pod "kube-logging"/"es-master-0". list of unmounted volumes=[data]. list of unattached volumes=[data default-token-bz6w9]
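Since AttachVolume.Attach reports success but the mount still times out, I suspect either a stale attachment left over from the upgrade or an availability-zone mismatch (EBS volumes are zonal, so a pod scheduled into a different AZ than its volume can never mount it). These are the checks I am running to dig further; the node name and PV name are the ones from the events above, and the aws CLI filter assumes the standard kubernetes.io/created-for tags that the in-tree EBS provisioner puts on the volume:

```shell
# Which node (and AZ) the pod landed on, and which AZ the PV belongs to:
kubectl get pod es-master-0 -n kube-logging -o wide
kubectl get pv pvc-f2e27430-af11-11e9-b10d-02a8eba067e2 \
  -o jsonpath='{.metadata.labels}'

# Attachment state as Kubernetes records it on the node object:
kubectl get node ip-10-2-18-16.us-west-2.compute.internal \
  -o jsonpath='{.status.volumesAttached}'

# Attachment state as AWS sees it, filtered by the created-for tag:
aws ec2 describe-volumes \
  --filters "Name=tag:kubernetes.io/created-for/pv/name,Values=pvc-f2e27430-af11-11e9-b10d-02a8eba067e2" \
  --query "Volumes[].Attachments[].[InstanceId,State,Device]"
```

If the two sides disagree (Kubernetes thinks the volume is attached to the new node while AWS shows it attached to another instance, or "busy"), that would explain the mount timeout.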

Output of kubectl get pv :

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                           STORAGECLASS   REASON   AGE
pvc-06cd5cfe-af12-11e9-b10d-02a8eba067e2   100Gi      RWO            Retain           Bound    kube-logging/data-es-master-1   aws-gp2                 7d19h
pvc-178b5aba-af12-11e9-b10d-02a8eba067e2   100Gi      RWO            Retain           Bound    kube-logging/data-es-master-2   aws-gp2                 7d19h
pvc-f2e27430-af11-11e9-b10d-02a8eba067e2   100Gi      RWO            Retain           Bound    kube-logging/data-es-master-0   aws-gp2                 7d19h

Output of kubectl get pvc:

NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-es-master-0   Bound    pvc-f2e27430-af11-11e9-b10d-02a8eba067e2   100Gi      RWO            aws-gp2        7d19h
data-es-master-1   Bound    pvc-06cd5cfe-af12-11e9-b10d-02a8eba067e2   100Gi      RWO            aws-gp2        7d19h
data-es-master-2   Bound    pvc-178b5aba-af12-11e9-b10d-02a8eba067e2   100Gi      RWO            aws-gp2        7d19h

I also tried rebooting the node on which this pod is scheduled, but that didn't help either.
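One thing I have not tried yet is manually detaching the EBS volume on the AWS side so the attach/detach controller can reattach it cleanly. A sketch of what I have in mind (the vol- ID in the last command is a placeholder; the real one comes from the PV spec):

```shell
# Read the real EBS volume ID out of the PV backing the stuck claim:
kubectl get pv pvc-f2e27430-af11-11e9-b10d-02a8eba067e2 \
  -o jsonpath='{.spec.awsElasticBlockStore.volumeID}'

# If the volume is still attached to the wrong/old instance, detach it.
# (--force is a last resort only, since it can corrupt an in-use filesystem.)
aws ec2 detach-volume --volume-id vol-0123456789abcdef0
```

Is this safe to do on a Retain-policy volume while the pod is stuck in Init, or is there a better way to clear a stale attachment after an upgrade?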

This is my manifest file:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-master
  namespace: kube-logging
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
        resources:
            limits:
              cpu: 1000m
              memory: 2.5G
            requests:
              cpu: 100m
        ports:
        - containerPort: 9200
          name: rest
          protocol: TCP
        - containerPort: 9300
          name: inter-node
          protocol: TCP
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
        env:
          - name: cluster.name
            value: prod-eks-logs
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: node.name
            value: "$(NODE_NAME).elasticsearch"
          - name: discovery.zen.ping.unicast.hosts
            value: "es-master-0.elasticsearch,es-master-1.elasticsearch,es-master-2.elasticsearch"
          - name: cluster.initial_master_nodes
            value: "es-master-0.elasticsearch,es-master-1.elasticsearch,es-master-2.elasticsearch"
          - name: discovery.zen.minimum_master_nodes
            value: "2"
          - name: ES_JAVA_OPTS
            value: "-Xms1g -Xmx1g"
      initContainers:
      - name: fix-permissions
        image: busybox
        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
      - name: increase-vm-max-map
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      - name: increase-fd-ulimit
        image: busybox
        command: ["sh", "-c", "ulimit -n 65536"]
        securityContext:
          privileged: true
  volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        app: elasticsearch
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: aws-gp2
      resources:
        requests:
          storage: 100Gi

Any help on how I can get this Elasticsearch node past this state?

-- roy
amazon-eks
aws-eks
elasticsearch
kubernetes
