Compute Engine instance unhealthy and down 50% of the time

3/13/2019

I started using Google Cloud about 3 days ago, so I am completely new to it. I have 4 pods deployed to Google Kubernetes Engine (GKE):

  • Frontend: a React app,
  • Redis,
  • Backend: made up of 2 containers, a Node.js server and a cloudsql-proxy,
  • nginx-ingress-controller

I also have a Cloud SQL instance running my PostgreSQL database, hence the cloudsql-proxy container.

This setup works about 50% of the time, but every now and then all the pods crash and/or their containers are recreated.

I tried checking the logs, but I don't really know which ones are relevant. I did find one thing that correlates with the issue: I have 2 VM instances running, and one of them might be the faulty one:

[Screenshot: VM instances list]

When I hover over the loading spinner, it says "Instance is being verified", and the instance seems to be in this state 80% of the time. When it is not, there is a yellow warning next to the instance name saying "The resource is not ready".

[Screenshot: CPU usage of the instance]

Here is the CPU usage of the instance (the other hardware metrics follow the same trend). I checked the logs of my frontend and backend containers; here are the last log lines before a CPU drop:

2019-03-13 01:45:23.533 CET - Server ready
2019-03-13 01:45:33.477 CET - 2019/03/13 00:45:33 Client closed local connection on 127.0.0.1:5432
2019-03-13 01:54:07.270 CET - yarn run v1.10.1

[Screenshot: pods being recreated]

As you can see, all the pods are being recreated.
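One way to narrow down which containers are restarting, and why, is to look at each container's `restartCount` and `lastState.terminated.reason` in the pod status. A minimal Python sketch, assuming the output of `kubectl get pods -o json` has been saved to a file; the helper names are hypothetical:

```python
import json
from typing import Any


def load_pod_list(path: str) -> dict[str, Any]:
    """Read a file produced by `kubectl get pods -o json > <path>`."""
    with open(path) as f:
        return json.load(f)


def summarize_restarts(pod_list: dict[str, Any]) -> list[tuple[str, str, int, str]]:
    """Return (pod, container, restartCount, last termination reason)
    for every container that has restarted at least once."""
    rows = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for cs in pod.get("status", {}).get("containerStatuses", []):
            restarts = cs.get("restartCount", 0)
            if restarts == 0:
                continue  # this container has never restarted
            # lastState.terminated describes the previous run of the container
            terminated = cs.get("lastState", {}).get("terminated", {})
            rows.append((pod_name, cs["name"], restarts, terminated.get("reason", "unknown")))
    return rows
```

For example, `for row in summarize_restarts(load_pod_list("pods.json")): print(row)`. A reason such as `OOMKilled` would point at memory limits in one container; if every container restarts at the same moment, the node itself is the more likely culprit.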


I think the problem might be that the faulty instance is unhealthy:

Instance gke-*****-production-default-pool-0de6d459-qlxk is unhealthy for ...

...so the health check keeps recreating/restarting the instance again and again. Please tell me if I am wrong. How can I find out what is making this instance unhealthy?
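Kubernetes also reports node-level health through each node's `status.conditions` (`Ready`, `MemoryPressure`, `DiskPressure`, `PIDPressure`, `NetworkUnavailable`). A rough sketch, assuming the output of `kubectl get nodes -o json` is available as parsed JSON; the function name is hypothetical, and these conditions are what the kubelet reports, which is not necessarily the same signal as the Compute Engine health check in the message above:

```python
from typing import Any

# For "Ready", status "True" is healthy; for the pressure/network conditions
# (MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable) "True" is bad.
HEALTHY_WHEN_TRUE = {"Ready"}


def unhealthy_conditions(node_list: dict[str, Any]) -> dict[str, list[str]]:
    """Map node name -> list of conditions that look unhealthy.

    `node_list` is the parsed output of `kubectl get nodes -o json`."""
    problems: dict[str, list[str]] = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        for cond in node.get("status", {}).get("conditions", []):
            ctype, status = cond.get("type"), cond.get("status")
            if ctype in HEALTHY_WHEN_TRUE:
                healthy = status == "True"
            else:
                healthy = status == "False"
            if not healthy:
                reason = cond.get("reason", "n/a")
                problems.setdefault(name, []).append(f"{ctype}={status} (reason: {reason})")
    return problems
```

Running this while the instance shows "The resource is not ready", and comparing the result with `kubectl describe node <node-name>`, may show whether the node is running out of memory or disk before it gets recreated.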

-- Jerlam
google-cloud-platform
google-compute-engine
google-kubernetes-engine
kubernetes
