I started using Google Cloud about 3 days ago, so I am completely new to it. I have 4 pods deployed to Google Kubernetes Engine.
I also have a Cloud SQL instance running for my PostgreSQL database, hence the cloudsql-proxy container.
This setup works well about half of the time, but every now and then all the pods crash and/or the containers are recreated.
I tried to check the logs, but I really don't know which ones are actually relevant. There is one thing I found that correlates with my issue: I have 2 VM instances running, and one of them might be the faulty one:
When I hover over the loading spinner, it says "Instance is being verified", and it seems to be in this state 80% of the time. When it is not, there is a yellow warning beside the name of the instance saying "The resource is not ready".
Here is the CPU usage of the instance (the trend is the same for all the hardware). I checked the logs of my frontend and backend containers; here are the last log entries that correspond to a CPU drop:
2019-03-13 01:45:23.533 CET - Server ready
2019-03-13 01:45:33.477 CET - 2019/03/13 00:45:33 Client closed local connection on 127.0.0.1:5432
2019-03-13 01:54:07.270 CET - yarn run v1.10.1
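To put a number on the gap in these logs, here is a minimal Python sketch (my own illustration, not part of any GKE tooling) that parses the three lines above and computes the time between consecutive entries. The helper names `parse_timestamp` and `gaps_in_seconds` are hypothetical, and the sketch assumes every line starts with a 23-character `YYYY-MM-DD HH:MM:SS.mmm` timestamp in the same time zone.

```python
from datetime import datetime

# Log lines copied verbatim from the excerpt above (outer timestamps are CET).
LOG_LINES = [
    "2019-03-13 01:45:23.533 CET - Server ready",
    "2019-03-13 01:45:33.477 CET - 2019/03/13 00:45:33 Client closed local connection on 127.0.0.1:5432",
    "2019-03-13 01:54:07.270 CET - yarn run v1.10.1",
]


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm' prefix (assumed 23 chars)."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")


def gaps_in_seconds(lines):
    """Return the time in seconds between each pair of consecutive log lines."""
    stamps = [parse_timestamp(line) for line in lines]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


gaps = gaps_in_seconds(LOG_LINES)
for seconds in gaps:
    print(f"{seconds:.3f} s")
```

For the lines above, this gives roughly 10 seconds between the first two entries and then about 8.5 minutes of silence before `yarn run` shows up again, which I assume is the container starting over.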
As you can see, all the pods are being recreated...
I think that it might come from the fact that the faulty instance is unhealthy:
Instance gke-*****-production-default-pool-0de6d459-qlxk is unhealthy for ...
...so the health check keeps recreating/restarting the instance again and again. Please tell me if I am wrong. So, how can I find out what is making this instance unhealthy?