I have a k8s DaemonSet whose only job is to run sysctl -w vm.max_map_count=262144 on the host nodes where its pods are deployed. The DaemonSet works as expected when the resource is first applied but, if the nodes it runs on are later restarted, the DaemonSet pods do not set the host's vm.max_map_count back to 262144. The pods go into a Running state but, on describe, they show:
State:          Running
  Started:      Thu, 21 Jun 2018 12:01:51 +0100
Last State:     Terminated
  Reason:       Error
  Exit Code:    143
I can't figure out the reason for the error, though, and I don't know where to look to resolve the issue.
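For what it's worth, the regression is visible directly on a rebooted node (command assumed; 65530 is the usual kernel default, not a value taken from my logs):

# on a node that has been restarted
sysctl vm.max_map_count
# vm.max_map_count = 65530    <- back at the kernel default, not 262144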
The DaemonSet YAML:
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  name: ds-elk
  labels:
    app: elk
spec:
  template:
    metadata:
      labels:
        app: elk
    spec:
      hostPID: true
      containers:
      - name: startup-script
        image: gcr.io/google-containers/startup-script:v1
        imagePullPolicy: Always
        securityContext:
          privileged: true
        env:
        - name: STARTUP_SCRIPT
          value: |
            #! /bin/bash
            sysctl -w vm.max_map_count=262144
            echo done
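On first deploy this behaves as described; checking a pod's log confirms the script ran (commands assumed, the pod name will differ):

kubectl apply -f ds-elk.yaml
kubectl logs ds-elk-5z5hs
# vm.max_map_count = 262144
# done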
Hosts are Red Hat EL 7.4. Kubernetes server version is 1.8.6.
Output of kubectl describe pod ds-elk-5z5hs:
Name:           ds-elk-5z5hs
Namespace:      default
Node:           xxx-00-xxxx-01v.devxxx.xxxxxx.xx.xx/xx.xxx.xx.xx
Start Time:     Tue, 15 May 2018 14:03:14 +0100
Labels:         app=elk
                controller-revision-hash=2068481183
                pod-template-generation=1
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"default","name":"ds-elk","uid":"54372241-5840-11e8-aaaa-005056b97218","apiVersion"...
Status:         Running
IP:             xx.xxx.x.xxx
Controlled By:  DaemonSet/ds-elk
Containers:
  startup-script:
    Container ID:   docker://eff849b842ed7b28dcf07578301a12068c998cb42b59a88b2bf2e8243b72f419
    Image:          gcr.io/google-containers/startup-script:v1
    Image ID:       docker-pullable://gcr.io/google-containers/startup-script@sha256:be96df6845a2af0eb61b17817ed085ce41048e4044c541da7580570b61beff3e
    Port:           <none>
    State:          Running
      Started:      Thu, 21 Jun 2018 11:40:50 +0100
    Last State:     Terminated
      Reason:       Error
      Exit Code:    143
      Started:      Thu, 21 Jun 2018 07:24:56 +0100
      Finished:     Thu, 21 Jun 2018 11:39:22 +0100
    Ready:          True
    Restart Count:  2
    Environment:
      STARTUP_SCRIPT:  #! /bin/bash
                       sysctl -w vm.max_map_count=262144
                       echo done
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-ld98j (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  default-token-ld98j:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-ld98j
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute
                 node.alpha.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
Events:          <none>
The script uses the /tmp/startup-script.kubernetes.io file as a marker that it has already run once. That file lives in your node's /tmp directory, not the container's, so the next time the DaemonSet schedules a pod onto that node the script just sleeps. (The Exit Code: 143 is 128 + 15, i.e. SIGTERM: the previous container was killed when the node went down, which is not the same as the script itself failing.)
Here is the script to look at: https://github.com/kubernetes/contrib/blob/master/startup-script/manage-startup-script.sh
Please note that the image you are referring to is not built from exactly this version of the code. In particular, it doesn't use the md5sum suffix that allows the script to run again after its contents change.
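In outline, the checkpoint logic amounts to something like this (a paraphrase for illustration, not the exact code shipped in the v1 image):

#!/bin/bash
# Paraphrase of manage-startup-script.sh's checkpointing, not the real code.
CHECKPOINT_PATH="/tmp/startup-script.kubernetes.io"

if [ ! -f "${CHECKPOINT_PATH}" ]; then
  # First run on this node: execute the user-supplied script...
  bash -c "${STARTUP_SCRIPT}"
  # ...and leave a marker so later pods on the same node skip it.
  touch "${CHECKPOINT_PATH}"
fi

# Keep the container alive either way; the script never runs again.
while true; do sleep 3600; done

So one workaround, if you want to keep the DaemonSet, is to remove /tmp/startup-script.kubernetes.io on the node so the script runs again after a restart.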
I ended up getting rid of the DaemonSet altogether and instead set vm.max_map_count in the pod's initContainers:
spec:
  initContainers:
  - name: "sysctl"
    image: "busybox"
    imagePullPolicy: "Always"
    command: ["sysctl", "-w", "vm.max_map_count=262144"]
    securityContext:
      privileged: true
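For completeness, a minimal sketch of where that fragment sits in a full manifest (the elasticsearch container name and image below are placeholders, not from my actual spec):

apiVersion: v1
kind: Pod
metadata:
  name: elk
spec:
  initContainers:
  - name: sysctl
    image: busybox
    imagePullPolicy: Always
    command: ["sysctl", "-w", "vm.max_map_count=262144"]
    securityContext:
      privileged: true              # needed to write a host-level sysctl
  containers:
  - name: elasticsearch             # placeholder for the real workload
    image: docker.elastic.co/elasticsearch/elasticsearch:6.3.2

Because vm.max_map_count is not a namespaced sysctl, the privileged init container sets it on the host kernel, and since init containers run on every pod start the setting is reapplied after node reboots too, which is exactly what the DaemonSet approach failed to do.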