Kubernetes LoadBalancer service stopped responding after adding new nodes to cluster

10/30/2019

We are running a Kubernetes cluster in GKE (Google Kubernetes Engine, version 1.13.10). It is a regional cluster that started with two nodes per zone (six nodes in total). We have several services running on this cluster, including some web services and a Kerberos service.

Recently we changed the number of nodes per zone from two to three (so we now have nine nodes). When we did this, the Kerberos service became inaccessible.

Some detail: the Kerberos service runs on three pods in a StatefulSet behind two Services (one UDP, one TCP) with a static IP address. Both Services are of type LoadBalancer and use externalTrafficPolicy: Local so that we can more easily log the client's IP address.

When we added the extra nodes, the Kerberos Service logged the following events:

  Type    Reason               Age                From                Message
  ----    ------               ----               ----                -------
  Normal  UpdatedLoadBalancer  53m (x2 over 56m)  service-controller  Updated load balancer with new hosts

The pods kept running, but the Service's external endpoint was no longer accessible: telnetting to the endpoint showed nothing listening at the other end. Restarting the pods solved the problem.

Here is the definition for the TCP Service:

apiVersion: v1
kind: Service
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: kdc.example.org
  name: kdc-tcp
  namespace: kdc
spec:
  clusterIP: 10.8.18.71
  externalTrafficPolicy: Local
  healthCheckNodePort: 32447
  loadBalancerIP: 35.101.23.134
  ports:
  - name: kerberos-tcp
    nodePort: 32056
    port: 88
    protocol: TCP
    targetPort: 88
  selector:
    app: kdc
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 35.101.23.134
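
The UDP Service is essentially the same manifest with protocol UDP. Roughly, it looks like this (reconstructed rather than pasted, so the name kdc-udp, the nodePort value, and whether it shares the loadBalancerIP are illustrative):

apiVersion: v1
kind: Service
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: kdc.example.org
  name: kdc-udp                    # illustrative; mirrors kdc-tcp above
  namespace: kdc
spec:
  externalTrafficPolicy: Local     # same policy, so the client IP is preserved
  loadBalancerIP: 35.101.23.134    # assuming it shares the static IP with kdc-tcp
  ports:
  - name: kerberos-udp
    nodePort: 32057                # illustrative; the real value differs
    port: 88
    protocol: UDP
    targetPort: 88
  selector:
    app: kdc
  sessionAffinity: None
  type: LoadBalancer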

Why would adding some extra nodes cause this to happen? How can we avoid this problem in the future?

-- rlandster
kubernetes
load-balancing
