I'm using Terraform v0.15.0 to deploy a zonal GKE cluster with 3 separate node pools, which are created right after the cluster. The initial node pool is created with 1 node and is deleted once the cluster is ready. I then use the Helm provider to deploy some workloads, starting with the Nginx Ingress chart. These two steps work fine. Right after the Nginx Ingress chart, I deploy our application, which consists of 16 Deployments/StatefulSets and the related Services, PVCs, PVs, and so on. That step completes after 10 seconds and is marked as finished, and everything looks fine, but there are no workloads in the cluster except for the Nginx Ingress deployment. Instead, I see a REPAIR_CLUSTER operation in the running state:
% gcloud container operations list
NAME TYPE LOCATION TARGET STATUS_MESSAGE STATUS START_TIME END_TIME
operation-1621007628555-9d3ea30c REPAIR_CLUSTER europe-west1 <project_name_here> RUNNING 2021-05-14T15:53:48.555309083Z
The cluster status is RECONCILING:
% gcloud container clusters list
NAME LOCATION MASTER_VERSION MASTER_IP MACHINE_TYPE NODE_VERSION NUM_NODES STATUS
<cluster_name_here> europe-west1-b 1.17.17-gke.4900 <ip_here> e2-standard-2 1.16.15-gke.7800 6 RECONCILING
I already tried:
Once the REPAIR_CLUSTER operation finishes, the cluster is ready to handle my workload, but in the meantime it causes my pipeline to fail. So my question is: how can I avoid that operation, or how can I warm up Google's resources in advance?
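To illustrate the kind of workaround I have in mind if the operation cannot be avoided: a wait step between the Nginx Ingress release and the application release that polls the cluster status until it is RUNNING again. This is only a minimal sketch under assumptions. The function names, timeout, and polling interval are hypothetical, and it assumes `gcloud` is installed and authenticated in the pipeline.

```python
import subprocess
import time


def get_cluster_status(name, zone):
    """Return the cluster status (e.g. RUNNING, RECONCILING) as reported by gcloud."""
    result = subprocess.run(
        ["gcloud", "container", "clusters", "describe", name,
         "--zone", zone, "--format", "value(status)"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_until_running(get_status, timeout_s=1800, interval_s=30, sleep=time.sleep):
    """Poll get_status() until it returns RUNNING.

    Returns the number of seconds spent sleeping. Raises TimeoutError if the
    cluster is still not RUNNING after timeout_s seconds of waiting.
    The timeout and interval defaults are arbitrary assumptions.
    """
    waited = 0
    while True:
        status = get_status()
        if status == "RUNNING":
            return waited
        if waited >= timeout_s:
            raise TimeoutError(f"cluster still {status} after {waited}s")
        sleep(interval_s)
        waited += interval_s


# Hypothetical usage in the pipeline, before the application Helm release:
# wait_until_running(lambda: get_cluster_status("<cluster_name_here>", "europe-west1-b"))
```

This only works around the symptom by delaying the application deployment; it does not prevent the REPAIR_CLUSTER operation itself.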