Kubernetes postStart lifecycle hook blocks CNI

3/22/2019

My workload needs network connectivity to start properly, and I want to use a postStart lifecycle hook that waits until the network is ready and then does something. However, lifecycle hooks seem to block CNI; the following workload will never be assigned an IP:

kubectl apply -f <(cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        lifecycle:
          postStart:
            exec:
              command:
              - "/bin/sh"
              - "-c"
              - |
                while true; do
                  sleep
                done
EOF
)
kubectl get pods -o wide

This means my workload never starts (hanging when trying to connect out) and my lifecycle hook loops forever. Is there a way to work around this?

EDIT: I used a sidecar instead of a lifecycle hook to achieve the same thing (sketched below). I'm still unsure why the lifecycle hook doesn't work, though; executing CNI is part of container creation IMO, so I'd expect lifecycle hooks to fire only after networking has been configured.
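
The sidecar variant looked roughly like this; the busybox image and the wget check against localhost are illustrative placeholders rather than my exact setup:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
      # the sidecar does the waiting instead of a postStart hook,
      # so container creation (and therefore CNI) is never blocked
      - name: wait-and-act
        image: busybox
        command:
        - "/bin/sh"
        - "-c"
        - |
          # placeholder readiness check; replace with whatever "ready" means for the workload
          until wget -q -O /dev/null http://127.0.0.1:80/; do
            sleep 1
          done
          echo "ready, doing the post-start work here"
          # keep the container running so the kubelet does not restart it
          while true; do sleep 3600; done

Since both containers share the Pod's network namespace, the sidecar can probe nginx over localhost; the same pattern works for waiting on an external dependency.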

-- dippynark
cni
kubernetes

1 Answer

3/22/2019

This is an interesting one :-) It's not much of an answer, but I did some investigation and thought I'd share it; perhaps it is of some use.

I started from the yaml posted in the question. Then I logged into the machine running this pod and located the container.

$ kubectl get pod -o wide
NAME                    READY   STATUS              RESTARTS   AGE   IP       NODE
nginx-8f59d655b-ds7x2   0/1     ContainerCreating   0          3m    <none>   node-x

$ ssh node-x
node-x$ docker ps | grep nginx-8f59d655b-ds7x2
2064320d1562        881bd08c0b08                                                                                                   "nginx -g 'daemon off"   3 minutes ago       Up 3 minutes                                              k8s_nginx_nginx-8f59d655b-ds7x2_default_14d1e071-4cd4-11e9-8104-42010af00004_0
2f09063ed20b        k8s.gcr.io/pause-amd64:3.1                                                                                     "/pause"                 3 minutes ago       Up 3 minutes                                              k8s_POD_nginx-8f59d655b-ds7x2_default_14d1e071-4cd4-11e9-8104-42010af00004_0

The second container, running /pause, is the infrastructure container. The other one is the Pod's nginx container. Note that normally this information would be available through kubectl get pod as well, but in this case it is not. Strange.
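
Normally a jsonpath query along these lines would surface the container ID without having to SSH to the node (presumably it returns nothing here while the hook keeps the Pod stuck in ContainerCreating; the Pod name is taken from the output above):

$ kubectl get pod nginx-8f59d655b-ds7x2 -o jsonpath='{.status.containerStatuses[0].containerID}'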

Inside the nginx container I'd expect networking to be set up and nginx to be running. Let's verify that:

node-x$ docker exec -it 2064320d1562 bash
root@nginx-8f59d655b-ds7x2:/# apt update && apt install -y iproute2 procps
...installs correctly...
root@nginx-8f59d655b-ds7x2:/# ip a s eth0
3: eth0@if2136: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1410 qdisc noqueue state UP group default
    link/ether 0a:58:0a:f4:00:a9 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.0.169/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::da:d3ff:feda:1cbe/64 scope link
       valid_lft forever preferred_lft forever

So networking is set up, routes are in place and the IP address on eth0 is actually on the overlay network as it is supposed to be. Looking at the process list now:

root@nginx-8f59d655b-ds7x2:/# ps auwx
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1  32652  4900 ?        Ss   18:56   0:00 nginx: master process nginx -g daemon off;
root         5  5.9  0.0   4276  1332 ?        Ss   18:56   0:46 /bin/sh -c while true; do   sleep done
nginx       94  0.0  0.0  33108  2520 ?        S    18:56   0:00 nginx: worker process
root     13154  0.0  0.0  36632  2824 ?        R+   19:09   0:00 ps auwx
root     24399  0.0  0.0  18176  3212 ?        Ss   19:02   0:00 bash

Hah, so nginx is running and so is the postStart command. Notice, however, the large PIDs. There is a typo in the deployment file: it is executing sleep with no arguments, which is an error.

root@nginx-8f59d655b-ds7x2:/# sleep
sleep: missing operand
Try 'sleep --help' for more information.

This is running in a loop, hence all the forking that leads to the large PIDs.

As another test, from a node I also try to curl the server:

node-x$ curl http://10.244.0.169
...
<p><em>Thank you for using nginx.</em></p>
...

Which is very much expected. Finally, I'd like to force the postStart command to finish, so from inside the container I kill its shell:

root@nginx-8f59d655b-ds7x2:/# kill -9 5
...container is terminated in a second, result of the postStart hook failure...

$ kubectl get pod
NAME                    READY     STATUS                                                                                                                          RESTARTS   AGE
nginx-8f59d655b-ds7x2   0/1       PostStartHookError: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (53423560 vs. 16777216)   0          21m

Hm, so I imagine the 50MB (!) worth of messages were the failures from the missing argument to sleep. What is even more spooky is that the Deployment is not recovering from this failure. This Pod keeps hanging around forever, instead of what you'd expect (spawning another Pod and retrying).

At this point I deleted the Deployment and recreated it with the sleep fixed in the postStart hook (sleep 1, see the fragment below). The results are much the same, and the Deployment won't spawn another Pod in that case either (so it was not just that it choked on the logs).
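
For reference, the fixed hook differed only in the sleep line; just the lifecycle fragment of the container spec is shown:

        lifecycle:
          postStart:
            exec:
              command:
              - "/bin/sh"
              - "-c"
              - |
                while true; do
                  sleep 1
                done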

Now I did say at the top that this is not really an answer. But perhaps some takeaway: the lifecycle hooks need some work before they can be considered useful and safe.

-- Janos Lenart
Source: StackOverflow