Kubernetes - Daemonset Error Reason

6/21/2018

I have a k8s daemonset which is simply supposed to run sysctl -w vm.max_map_count=262144 on the host nodes where its pods are deployed. The daemonset works as expected when the resource is first applied, but if the k8s nodes the daemonset pods are running on are later restarted, the pods do not set the host's vm.max_map_count back to 262144. The ds pods go into a running state, but on describe they show:

State:          Running
  Started:      Thu, 21 Jun 2018 12:01:51 +0100
Last State:     Terminated
  Reason:       Error
  Exit Code:    143

I can't figure out the reason for the error, however, and I don't know where to look in order to resolve the issue.
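For reference, these are the sort of checks that show the symptom (pod name as in the describe output below):

# output from the previous, terminated container instance
kubectl logs ds-elk-5z5hs --previous

# current value on the rebooted host -- stays at the distro default rather than 262144
sysctl vm.max_map_count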

daemonset yaml:

kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  name: ds-elk
  labels:
    app: elk
spec:
  template:
    metadata:
      labels:
        app: elk
    spec:
      hostPID: true
      containers:
        - name: startup-script
          image: gcr.io/google-containers/startup-script:v1
          imagePullPolicy: Always
          securityContext:
            privileged: true
          env:
          - name: STARTUP_SCRIPT
            value: |
              #! /bin/bash
              sysctl -w vm.max_map_count=262144
              echo done

Hosts are Red Hat EL 7.4. Kubernetes Server version 1.8.6.

kubectl describe pod ds-elk-5z5hs output:

Name:           ds-elk-5z5hs
Namespace:      default
Node:           xxx-00-xxxx-01v.devxxx.xxxxxx.xx.xx/xx.xxx.xx.xx
Start Time:     Tue, 15 May 2018 14:03:14 +0100
Labels:         app=elk
                controller-revision-hash=2068481183
                pod-template-generation=1
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"default","name":"ds-elk","uid":"54372241-5840-11e8-aaaa-005056b97218","apiVersion"...
Status:         Running
IP:             xx.xxx.x.xxx
Controlled By:  DaemonSet/ds-elk
Containers:
  startup-script:
    Container ID:   docker://eff849b842ed7b28dcf07578301a12068c998cb42b59a88b2bf2e8243b72f419
    Image:          gcr.io/google-containers/startup-script:v1
    Image ID:       docker-pullable://gcr.io/google-containers/startup-script@sha256:be96df6845a2af0eb61b17817ed085ce41048e4044c541da7580570b61beff3e
    Port:           <none>
    State:          Running
      Started:      Thu, 21 Jun 2018 11:40:50 +0100
    Last State:     Terminated
      Reason:       Error
      Exit Code:    143
      Started:      Thu, 21 Jun 2018 07:24:56 +0100
      Finished:     Thu, 21 Jun 2018 11:39:22 +0100
    Ready:          True
    Restart Count:  2
    Environment:
      STARTUP_SCRIPT:  #! /bin/bash
sysctl -w vm.max_map_count=262144
echo done

    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-ld98j (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  default-token-ld98j:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-ld98j
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute
                 node.alpha.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
Events:          <none>
-- Going Bananas
elastic-stack
kubernetes

2 Answers

6/21/2018

The script uses the /tmp/startup-script.kubernetes.io file as a marker that the script has already run once. This file is placed in the node's /tmp directory, not the container's, so the next time the DaemonSet schedules a pod onto that node the script just sleeps.

Here is the script to look at: https://github.com/kubernetes/contrib/blob/master/startup-script/manage-startup-script.sh

Please note that the image you are referring to is not built from exactly this version of the code. In particular, it doesn't use the md5sum suffix that allows the script to run again after its contents change.
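Roughly, the logic looks like this (a simplified sketch of the checkpoint behaviour described above, not the exact contents of manage-startup-script.sh):

#!/bin/bash
CHECKPOINT=/tmp/startup-script.kubernetes.io

while :; do
  if [ ! -f "${CHECKPOINT}" ]; then
    # Run the user-supplied STARTUP_SCRIPT once (the real image executes it
    # in the host's namespaces, which is why hostPID/privileged are needed
    # and why the marker lands in the node's /tmp, not the container's).
    bash -c "${STARTUP_SCRIPT}"
    touch "${CHECKPOINT}"
  fi
  # Marker present: do nothing, just keep the pod Running.
  sleep 60
done

Since the node's /tmp survives a reboot while the vm.max_map_count setting does not, the marker is still there when the node comes back up and the sysctl is never re-applied.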

-- lexsys
Source: StackOverflow

6/25/2018

Ended up getting rid of the daemonset altogether and instead setting vm.max_map_count in the pod's initContainers spec:

  initContainers:
  - name: "sysctl"
    image: "busybox"
    imagePullPolicy: "Always"
    command: ["sysctl", "-w", "vm.max_map_count=262144"]
    securityContext:
      privileged: true
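For context, a minimal sketch of where that block sits in a full manifest (the surrounding Deployment, the names and the Elasticsearch image below are placeholders, not my actual spec):

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: elasticsearch
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: elk
    spec:
      initContainers:
      - name: "sysctl"
        image: "busybox"
        imagePullPolicy: "Always"
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:6.2.4
        ports:
        - containerPort: 9200

Because the init container runs every time the pod starts, the setting is re-applied after a node reboot as well (it needs privileged since vm.max_map_count is a node-wide, non-namespaced sysctl).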
-- Going Bananas
Source: StackOverflow