Why is a failed startupProbe not killing the Pod but allowing it to run?

2/5/2021

I created a startup probe and made it so it would always fail. That should result in the pod getting killed and restarted, but it doesn't. I see a single event of the startup probe failing (and no events after that), yet the pod shows as 1/1 Running. And when I run my Helm test, it passes!

I guaranteed the failure by setting an invalid username and password for the startup probe check.

Using K8s version: 1.19.4

When I check events, I get:

4m44s       Normal    SuccessfulCreate    replicaset/mysqlpod-5957645967   Created pod: mysqlpod-5957645967-fj95t
4m44s       Normal    ScalingReplicaSet   deployment/mysqlpod              Scaled up replica set mysqlpod-5957645967 to 1
4m44s       Normal    Scheduled           pod/mysqlpod-5957645967-fj95t    Successfully assigned data-layer/mysqlpod-5957645967-fj95t to minikube
4m43s       Normal    Created             pod/mysqlpod-5957645967-fj95t    Created container mysql
4m43s       Normal    Pulled              pod/mysqlpod-5957645967-fj95t    Container image "mysql:5.6" already present on machine
4m43s       Normal    Started             pod/mysqlpod-5957645967-fj95t    Started container mysql
4m41s       Warning   Unhealthy           pod/mysqlpod-5957645967-fj95t    Startup probe failed: Warning: Using a password on the command line interface can be insecure.
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!

Checking the pods (using --watch), I see:

NAME                            READY   STATUS    RESTARTS   AGE
mysql-db-app-5957645967-fj95t   0/1     Running   0          7m18s
mysql-db-app-5957645967-fj95t   1/1     Running   0          7m43s

Notice it has zero restarts.

My Deployment has:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "mysqlapp.name" . }}
  namespace: {{ quote .Values.metadata.namespace }}
spec:
  replicas: {{ .Values.deploymentSpecs.replicas }}
  selector:
    matchLabels:
      {{- include "mysqlapp.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "mysqlapp.selectorLabels" . | nindent 8 }}
    spec:
      containers:
        - image: "{{ .Values.image.name }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          name: {{ .Values.image.name }}
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom: 
                secretKeyRef:
                  name: db-credentials
                  key: db-password
          ports:
            - containerPort: {{ .Values.ports.containerPort }}
              name: {{ .Values.image.name }}
          startupProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - mysqladmin ping -u wrong -pwrong
            periodSeconds: {{ .Values.startupProbe.periodSeconds }}
            timeoutSeconds: {{ .Values.startupProbe.timeoutSeconds }}
            successThreshold: {{ .Values.startupProbe.successThreshold }}
            failureThreshold: {{ .Values.startupProbe.failureThreshold }}

Notice the deliberately wrong credentials in mysqladmin ping -u wrong -pwrong above.

Values.yaml:

metadata:
  namespace: data-layer
  myprop: value
deploymentSpecs:
  replicas: 1
labels:
  app: db-service
image:
  name: mysql
  pullPolicy: IfNotPresent
  tag: "5.6"
ports:
  containerPort: 3306
startupProbe:
  periodSeconds: 10
  timeoutSeconds: 2
  successThreshold: 1
  failureThreshold: 5
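
(If I read the probe settings right, the container should only be allowed to keep failing the startup probe for failureThreshold × periodSeconds = 5 × 10 = 50 seconds before the kubelet kills and restarts it.)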

Even waiting for 5 minutes, I'm still able to run the test (which uses a MySQL client to reach the DB) and it works! Why won't this fail?

-- Don Rhummy
kubernetes
kubernetes-helm
livenessprobe
mysql
yaml

1 Answer

2/5/2021

It wasn't failing because, it turns out, mysqladmin ping returns exit status 0 even if the username/password is wrong, as long as it can reach the server.

From the MySQL documentation for mysqladmin ping:

Check whether the server is available. The return status from mysqladmin is 0 if the server is running, 1 if it is not. This is 0 even in case of an error such as Access denied, because this means that the server is running but refused the connection, which is different from the server not running.
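
A quick way to confirm this from inside the running container (pod name taken from the events in the question; the actual name will differ) is to run the probe command by hand and print its exit status. Once mysqld is up it reports Access denied but still exits 0:

kubectl exec -n data-layer mysqlpod-5957645967-fj95t -- sh -c 'mysqladmin ping -u wrong -pwrong; echo "exit status: $?"'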

To force a failure and restarts, make the probe unable to connect at all, for example by pointing it at a nonexistent host:

mysqladmin ping -u root -p${MYSQL_ROOT_PASSWORD} --host fake
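
Because "fake" isn't a reachable host, mysqladmin can't connect at all and exits non-zero, so the exec probe is counted as failed; after failureThreshold consecutive failures the kubelet kills the container and restarts it. Dropped into the Deployment from the question, the probe would look roughly like this (a sketch reusing the existing values; ${MYSQL_ROOT_PASSWORD} expands because it's already defined in the container's env):

          startupProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - mysqladmin ping -u root -p${MYSQL_ROOT_PASSWORD} --host fake
            periodSeconds: {{ .Values.startupProbe.periodSeconds }}
            timeoutSeconds: {{ .Values.startupProbe.timeoutSeconds }}
            successThreshold: {{ .Values.startupProbe.successThreshold }}
            failureThreshold: {{ .Values.startupProbe.failureThreshold }}
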
-- Don Rhummy
Source: StackOverflow