Kubernetes cluster shuts down after some processing

9/17/2019

I have a cluster on GCP running a Node.js server. The server runs fine locally, but on the cluster it stops, without any error message, when I send a POST request to one of its routes. That route sends a push message to some of my users via FCM. My database is Cloud Firestore.
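For context, here is a minimal sketch of the kind of route involved. The app setup, route name, and firebase-admin calls below are assumptions for illustration, not the actual code:

import express from 'express';
import * as admin from 'firebase-admin';

admin.initializeApp(); // uses the default GCP credentials on the cluster

const app = express();
app.use(express.json());

// Health endpoint used by the liveness/readiness probes further down
app.get('/alive', (_req, res) => res.sendStatus(200));

// Hypothetical route: look up users in Firestore and push to each via FCM
app.post('/notify', async (req, res) => {
    const users = await admin.firestore().collection('users').get();
    for (const doc of users.docs) {
        const token = doc.data()['fcmToken'];
        if (token) {
            await admin.messaging().send({
                token,
                notification: { title: 'Update', body: req.body.message },
            });
        }
    }
    res.sendStatus(202);
});

app.listen(3000);

With this shape, one POST can fan out into many sequential Firestore and FCM calls, which is relevant to the probe behaviour discussed in the answer below.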

Pod logs:

Not sending to xxxxxxxxxxxxxxx
Not sending to xxxxxxxxxxxxxyx

app@1.0.0 prestart /opt/app
tsc


app@1.0.0 start /opt/app
node src/index.js

Dockerfile:

FROM node:11.15-alpine

# install deps
ADD package.json /tmp/package.json
RUN apk update && apk add yarn python g++ make && rm -rf /var/cache/apk/*
RUN cd /tmp && npm install

# Copy deps
RUN mkdir -p /opt/app && cp -a /tmp/node_modules /opt/app

# Setup workdir
WORKDIR /opt/app
COPY . /opt/app

# run
EXPOSE 3000
CMD ["npm", "start"] 

Kubernetes.yaml.tpl

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  labels:
    app: app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          env:
          - name: var1
            value: value1
          - name: var2
            value: value2
          - name: var3
            value: value3
          - name: var4
            value: value4
          - name: var5
            value: value5
          - name: var6
            value: value6
          image: gcr.io/${PROJECT_ID}/app:COMMIT_SHA
          ports:
            - containerPort: 3000
          livenessProbe:
            httpGet:
              path: /alive
              port: 3000
            initialDelaySeconds: 30
          readinessProbe:
            httpGet:
              path: /alive
              port: 3000
            initialDelaySeconds: 30
            timeoutSeconds: 1

---
apiVersion: networking.gke.io/v1beta1
kind: ManagedCertificate
metadata:
  name: app
spec:
  domains:
    - myDomain.com.br
---
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  type: NodePort
  selector:
    app: app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: app
  annotations:
    kubernetes.io/ingress.global-static-ip-name: "00.000.000.000"
    networking.gke.io/managed-certificates: app
spec:
  backend:
    serviceName: app
    servicePort: 80

My function that is being called:

const query = tokens; // tokens comes from earlier code not shown in the post
const getTokens = (
    doc: FirebaseFirestore.QueryDocumentSnapshot
) => {
    // Get user token and send push    
}

const canSend = (user: User): boolean => {
 // Apply business logic to check if the user will receive a push
}


let allUsers: FirebaseFirestore.QuerySnapshot = userdata;
let allGroups: FirebaseFirestore.QuerySnapshot = groups;
await this.asyncForEach(
    query.docs,
    async (doc: FirebaseFirestore.QueryDocumentSnapshot) => {
        let userDoc: User | undefined;
        // Find the user document whose userId matches this token document's id
        allUsers.docs
            .filter((candidate) => candidate.data()['userId'] === doc.data()['id'])
            .forEach((user: any) => {
                userDoc = new User(user);
            });
        if (userDoc) {
            if (canSend(userDoc)) {
                console.log(`Sending to: ${userDoc.id}`);
                await getTokens(doc);
            } else {
                console.log(`Not sending to: ${doc.data()['id']}`);
            }
        } else {
            console.log(`${doc.data()['id']} has no document`);
        }
    }
);
console.log('Finished');
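The asyncForEach helper isn't shown in the post. A common implementation of that pattern (an assumption here, since the actual helper may differ) awaits each item sequentially:

async function asyncForEach<T>(
    array: T[],
    callback: (item: T, index: number, array: T[]) => Promise<void>
): Promise<void> {
    for (let index = 0; index < array.length; index++) {
        // Await each callback before moving to the next item,
        // so the pushes above are sent one at a time
        await callback(array[index], index, array);
    }
}

Processed sequentially like this, a large user list turns a single POST into a long-running request, which lines up with EDIT 1 below.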

EDIT 1

I just noticed that this happens when my server sends a heavy request or a lot of small requests.

EDIT 2

kubectl get events returns "No resources found".
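For what it's worth, events are only retained for a limited time, so an empty event list does not rule out probe failures; the restart count and the previous container's logs are more durable signals (the pod name below is a placeholder):

kubectl get pods                          # the RESTARTS column shows whether the container is being killed
kubectl describe pod <app-pod-name>       # the Events section lists liveness probe failures, if any
kubectl logs <app-pod-name> --previous    # logs from the container instance before the last restart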

-- Lucas Szavara
firebase-cloud-messaging
google-cloud-platform
kubernetes
node.js
typescript

1 Answer

10/7/2019

As confirmed by the OP, the problem was that the livenessProbe failed due to a timeout, which led to the pod being terminated.

I'd also recommend not removing the probe completely, but instead increasing the timeout value (the number of seconds after which the probe times out; it defaults to 1 second) to, say, 3-5 seconds in your deployment YAML:

timeoutSeconds: 5
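Applied to the deployment above, the probe section would then look something like this (the value is illustrative; the readinessProbe can be adjusted the same way):

livenessProbe:
  httpGet:
    path: /alive
    port: 3000
  initialDelaySeconds: 30
  timeoutSeconds: 5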

More info: see the Kubernetes documentation on configuring liveness and readiness probes.

-- A_Suh
Source: StackOverflow