gke removing instance groups from internal load balancers

2/15/2018

I have a sandbox GKE cluster with some services and some internal load balancers.

Services are mostly defined like this:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: my-app
  name: my-app
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
    cloud.google.com/load-balancer-type: "Internal"
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: my-app
  sessionAffinity: None
  type: LoadBalancer

But every so often (about twice a week) someone reports that an endpoint is no longer working. When I investigate, the load balancer has no instance groups attached anymore.

The only "weird" things we do are to scale all our app's pods down to 0 replicas when out of business hours and to use preemptible instances on the node pool... I thought it could be related to the first, but I forced scale down some services now and their load balancers still fine.

It may be related to the preemptible instances, though: it seems that if the pods are all on one instance (especially the kube-system pods), they all go down at once when that node is preempted, and the cluster doesn't seem to recover properly from that.
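An easy way to see whether everything landed on a single node (pod and node names will differ, of course):

# show which node each kube-system pod is scheduled on
kubectl get pods -n kube-system -o wide

# list the (preemptible) nodes themselves
kubectl get nodes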

Another weird thing I see happening is the k8s-ig--foobar instance group ending up with 0 instances.
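This is roughly how I check it from the GCP side; the zone, region and backend-service name below are placeholders for the real ones:

# instances currently attached to the instance group GKE manages
gcloud compute instance-groups list-instances k8s-ig--foobar --zone us-central1-a

# the backend service behind the internal load balancer
gcloud compute backend-services describe <backend-service-name> --region us-central1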

Has anyone experienced something like this? I couldn't find any docs about this.

-- caarlos0
google-kubernetes-engine
kubernetes

1 Answer

2/28/2018

I opened a bug and it was marked as "could not reproduce".

But changing from preemptible to "normal" instances does "fix" the problem.
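For anyone else hitting this, the switch boiled down to adding a non-preemptible pool and retiring the preemptible one, roughly like this (pool, cluster, zone and node names are placeholders):

# create a new pool without --preemptible (the default is regular instances)
gcloud container node-pools create regular-pool --cluster my-cluster --zone us-central1-a --num-nodes 3

# drain each node of the old pool so pods reschedule onto the new one
kubectl drain <preemptible-node-name> --ignore-daemonsets

# then remove the preemptible pool entirely
gcloud container node-pools delete preemptible-pool --cluster my-cluster --zone us-central1-a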

-- caarlos0
Source: StackOverflow