Only 1 pod handles all requests in Kubernetes cluster

10/27/2019

Here is a manifest file for a minikube Kubernetes cluster, defining a Deployment and a Service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-deployment
spec:
  selector:
    matchLabels:
      app: hello
  replicas: 3
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello
        image: hello_hello
        imagePullPolicy: Never
        ports:
        - containerPort: 4001
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  selector:
    app: hello
  ports:
  - port: 4001
    nodePort: 30036
    protocol: TCP
  type: NodePort

And a simple HTTP server written in Go:

package main

import (
    "log"
    "net/http"

    "github.com/gin-gonic/gin"
)

func main() {
    // Gin router with a single endpoint.
    r := gin.Default()
    r.GET("/ping", func(c *gin.Context) {
        c.JSON(http.StatusOK, gin.H{
            "message": "pong",
        })
    })

    server := &http.Server{
        Addr:    ":4001",
        Handler: r,
    }

    if err := server.ListenAndServe(); err != nil {
        log.Fatal(err)
    }
}

When I make several requests to IP:30036/ping and then check the pods' logs, I can see that only one of the three pods handles all the requests. How can I make the other pods respond to requests as well?

-- ligowsky
go
kubernetes
load-balancing

3 Answers

10/27/2019

You are exposing a Service using a NodePort, so there is no reverse proxy in place; you connect directly to your Pod(s). This is a good choice to start with. (Later you might want to use an Ingress.)

What you are seeing is that only one Pod handles your requests. You expect each request to be load balanced to a different pod. Your assumption is correct, but the load balancing does not happen at the HTTP request layer; it happens at the TCP layer.

So when you have a persistent TCP connection and reuse it, you will not experience the load balancing that you expect. Since establishing a TCP connection is rather expensive latency-wise, an optimization is usually in place to avoid repeatedly opening new TCP connections: HTTP keep-alive.

Keep-alive is enabled by default in most frameworks and clients; this is true for Go as well. Try server.SetKeepAlivesEnabled(false) and see if that fixes your issue. (Recommended only for testing!)
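
For reference, a minimal sketch of where that call would go in the server from the question; it disables keep-alive for every client, so treat it as a testing aid only:

package main

import (
    "log"
    "net/http"

    "github.com/gin-gonic/gin"
)

func main() {
    r := gin.Default()
    r.GET("/ping", func(c *gin.Context) {
        c.JSON(http.StatusOK, gin.H{"message": "pong"})
    })

    server := &http.Server{
        Addr:    ":4001",
        Handler: r,
    }
    // Disable HTTP keep-alive: every request now arrives on a fresh
    // TCP connection, so each one can be balanced to a different pod.
    server.SetKeepAlivesEnabled(false)

    log.Fatal(server.ListenAndServe())
}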

You can also use multiple different clients, e.g. from the command line with curl, or disable keep-alive in Postman.
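
Keep-alive can equally be disabled on the client side. A minimal Go sketch, assuming the NodePort from the question (the node IP is a placeholder):

package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
)

func main() {
    // A transport with keep-alive disabled: every request dials a new
    // TCP connection, so kube-proxy can pick a different backend pod.
    client := &http.Client{
        Transport: &http.Transport{DisableKeepAlives: true},
    }

    for i := 0; i < 5; i++ {
        resp, err := client.Get("http://192.168.99.100:30036/ping") // placeholder node IP
        if err != nil {
            log.Fatal(err)
        }
        body, _ := io.ReadAll(resp.Body)
        resp.Body.Close()
        fmt.Println(string(body))
    }
}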

-- Thomas
Source: StackOverflow

10/27/2019

In a Kubernetes cluster, requests sent to k8s Services are routed by kube-proxy.

The default kube-proxy mode has been iptables since Kubernetes v1.2, and it allows faster packet resolution between Services and backend Pods. The load balancing between backend Pods is done directly via iptables rules.
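
To illustrate the effect (a toy simulation in Go, not actual kube-proxy code; the pod names are made up): a backend is picked at random for each new connection, the way the iptables statistic rules do, while requests reusing an established connection stay pinned to one pod:

package main

import (
    "fmt"
    "math/rand"
)

func main() {
    pods := []string{"hello-pod-1", "hello-pod-2", "hello-pod-3"}

    // Fresh connection per request: the random pick runs every time,
    // so requests spread across the backends.
    fmt.Println("new connection per request:")
    for i := 0; i < 5; i++ {
        fmt.Println("  ->", pods[rand.Intn(len(pods))])
    }

    // Keep-alive: the backend is chosen once, when the TCP connection
    // is established, and every request on it hits the same pod.
    pinned := pods[rand.Intn(len(pods))]
    fmt.Println("reused keep-alive connection:")
    for i := 0; i < 5; i++ {
        fmt.Println("  ->", pinned)
    }
}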

Maybe you're not generating enough load: a single pod can handle all of it, and kube-proxy keeps routing you to the same one.

You can also see the answer to this question for implementing a custom iptables rule:

-- Kamol Hasan
Source: StackOverflow

5/2/2020

Thanks to @Thomas for the great insight! I tried playing with the request headers, and it solved the load-balancing issue where all requests were hitting only a single replica. For demos or testing, it is useful to be able to distribute requests across all replicas.

From the docs: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Connection

Connection: keep-alive
Connection: close

This request always hits the same pod:

curl -H "Connection: keep-alive" http://your-service:port/path

However, using close, the requests are balanced across all pods:

curl -H "Connection: close" http://your-service:port/path

-- salanfe
Source: StackOverflow