Kubernetes requests not balanced

11/17/2020

We've just had an increase in traffic to our Kubernetes cluster and I've noticed that, of our 6 application pods, 2 of them are seemingly not being used very much. kubectl top pods returns the following:

(screenshot of kubectl top pods output showing per-pod CPU usage)

You can see that of the 6 pods, 4 are using more than 50% of the CPU (these are 2 vCPU nodes), but the other two aren't really doing much at all.

Our cluster is set up on AWS, using the ALB ingress controller. The load balancer is configured to use Least outstanding requests rather than Round robin in an attempt to balance things out a bit more, but we're still seeing this imbalance.
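For context, the algorithm is switched on the ALB target group through the ingress annotations. A minimal sketch of that part of the ingress (the name and backend are placeholders, not our real manifest):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: app-web                        # placeholder name
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    # Overrides the target group's default round_robin algorithm
    alb.ingress.kubernetes.io/target-group-attributes: load_balancing.algorithm.type=least_outstanding_requests
spec:
  rules:
    - http:
        paths:
          - path: /*
            backend:
              serviceName: app-web     # placeholder service name
              servicePort: 80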

Is there any way of determining why this is happening, or whether it actually is a problem at all? I'm hoping it's normal behaviour rather than an issue, but I'd rather investigate it than ignore it.

App deployment config

This is the configuration of the main application pods. Nothing fancy, really:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "app.fullname" . }}
  labels:
    {{- include "app.labels" . | nindent 4 }}
    app.kubernetes.io/component: web
spec:
  replicas: {{ .Values.app.replicaCount }}
  selector:
    matchLabels:
      app: {{ include "app.fullname" . }}-web

  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/config_maps/app-env-vars.yaml") . | sha256sum }}
      {{- with .Values.podAnnotations }}
        {{- toYaml . | nindent 8 }}
      {{- end }}
      labels:
        {{- include "app.selectorLabels" . | nindent 8 }}
        app.kubernetes.io/component: web
        app: {{ include "app.fullname" . }}-web
    spec:
      serviceAccountName: {{ .Values.serviceAccount.web }}
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - {{ include "app.fullname" . }}-web
              topologyKey: failure-domain.beta.kubernetes.io/zone
      containers:
        - name: {{ .Values.image.name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          command:
            - bundle
          args: ["exec", "puma", "-p", "{{ .Values.app.containerPort }}"]
          ports:
            - name: http
              containerPort: {{ .Values.app.containerPort }}
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /healthcheck
              port: {{ .Values.app.containerPort }}
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 5
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          envFrom:
          - configMapRef:
              name: {{ include "app.fullname" . }}-cm-env-vars

          - secretRef:
              name: {{ include "app.fullname" . }}-secret-rails-master-key

          - secretRef:
              name: {{ include "app.fullname" . }}-secret-db-credentials

          - secretRef:
              name: {{ include "app.fullname" . }}-secret-db-url-app

          - secretRef:
              name: {{ include "app.fullname" . }}-secret-redis-credentials
-- PaReeOhNos
amazon-eks
kubernetes
kubernetes-ingress

1 Answer

11/18/2020

That's a known issue with Kubernetes.
In short, Kubernetes doesn't load balance long-lived TCP connections.
This excellent article covers it in detail.

The load distribution your service is showing matches that case exactly.
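One thing worth checking, assuming the AWS ALB ingress controller is in its default instance target mode: in that mode the ALB balances across nodes, and kube-proxy then pins each long-lived (keep-alive) connection to a single pod, so the per-request algorithm never reaches the pods. A sketch of a possible mitigation (the ingress name is a placeholder): switch the target type to ip so pods are registered in the target group directly and the ALB's per-request balancing applies to them.

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: app-web                  # placeholder; apply to the existing ingress
  annotations:
    kubernetes.io/ingress.class: alb
    # Register pod IPs directly in the target group instead of node ports,
    # so requests are balanced across pods rather than across nodes
    alb.ingress.kubernetes.io/target-type: ip

Note that ip target mode needs pods with VPC-routable IPs, which is the case with the default EKS VPC CNI.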

-- Olesya Bolobova
Source: StackOverflow