How can I ignore the failure of a container in a multi-container pod?

8/21/2019

I have a multi-container application: app + sidecar. Both containers are supposed to be alive all the time, but the sidecar is not really that important. The sidecar depends on an external resource; if this resource is not available, the sidecar crashes, and it takes the entire pod down with it. Kubernetes tries to recreate the pod and fails because the sidecar now won't start. But from my business logic perspective, a crash of the sidecar is absolutely normal. Having that sidecar is nice but not mandatory. I don't want the sidecar to take the main app with it when it crashes. What would be the best Kubernetes-native way to achieve that? Is it possible to tell Kubernetes to ignore the failure of the sidecar as a "false positive" event which is absolutely fine?

I can't find anything in the pod specification that controls that behaviour.

My yaml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: myapp
    spec:
      volumes:
      - name: logs-dir
        emptyDir: {}
      containers:
      - name: myapp
        image: ${IMAGE}
        ports:
        - containerPort: 9009
        volumeMounts:
        - name: logs-dir
          mountPath: /usr/src/app/logs
        resources:
          limits:
            cpu: "1"
            memory: "512Mi"
        readinessProbe:
          initialDelaySeconds: 60 
          failureThreshold: 8 
          timeoutSeconds: 1
          periodSeconds: 8 
          httpGet:
            scheme: HTTP
            path: /myapp/v1/admin-service/git-info
            port: 9009
      - name: graylog-sidecar
        image: digiapulssi/graylog-sidecar:latest
        volumeMounts:
        - name: logs-dir
          mountPath: /log
        env:
        - name: GS_TAGS
          value: "[\"myapp\"]"
        - name: GS_NODE_ID
          value: "nodeid"
        - name: GS_SERVER_URL
          value: "${GRAYLOG_URL}"
        - name: GS_LIST_LOG_FILES
          value: "[\"/ctwf\"]"
        - name: GS_UPDATE_INTERVAL
          value: "10"
        resources:
          limits:
            memory: "128Mi"
            cpu: "0.1"
-- Stqs
kubernetes
pod

2 Answers

8/21/2019

You can define a custom livenessProbe for your sidecar with a greater failureThreshold / periodSeconds to accommodate what is considered an acceptable failure rate in your environment, or, in effect, ignore all failures.

Docs:

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.15/#probe-v1-core

kubectl explain deployment.spec.template.spec.containers.livenessProbe
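
For illustration, here is a minimal sketch of such a lenient probe on the sidecar from the question (the pgrep check is an assumption; substitute whatever indicates that your particular sidecar is healthy):

      - name: graylog-sidecar
        image: digiapulssi/graylog-sidecar:latest
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - pgrep graylog-sidecar    # hypothetical check; requires pgrep in the image
          initialDelaySeconds: 60      # give the external resource time to come up
          periodSeconds: 60            # probe infrequently
          failureThreshold: 120        # tolerate up to ~2 hours of failed probes
          timeoutSeconds: 5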

-- Keilo
Source: StackOverflow

8/21/2019

A custom livenessProbe should help, but for your scenario I would use the liveness probe for your main app container, which is myapp.
Considering the fact that you don't care about the sidecar (as mentioned), I would set the pod's restartPolicy to Never and then define a custom livenessProbe for your main myapp container. This way the pod will never restart no matter which container fails, but when your myapp container's liveness probe fails the kubelet will restart the container! Reference from the pod lifecycle docs below:

Pod is running and has two Containers. Container 1 exits with failure.

Log failure event. If restartPolicy is:
- Always: Restart Container; Pod phase stays Running.
- OnFailure: Restart Container; Pod phase stays Running.
- Never: Do not restart Container; Pod phase stays Running.

So the updated (pseudo) YAML should look like this:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    ...
    spec:
      ...
      restartPolicy: Never
      containers:
      - name: myapp
        ...
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - {{ your custom liveness check command goes here }}
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        readinessProbe:
          ...
      - name: graylog-sidecar
        ...

Note: since I don't know your application, I cannot write the command for you, but here is what I use for my JBoss server (as an example):

livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - /opt/jboss/wildfly/bin/jboss-cli.sh --connect --commands="read-attribute server-state"
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
-- garlicFrancium
Source: StackOverflow