Here is a manifest file for a minikube Kubernetes cluster, with a Deployment and a Service:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-deployment
spec:
  selector:
    matchLabels:
      app: hello
  replicas: 3
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          image: hello_hello
          imagePullPolicy: Never
          ports:
            - containerPort: 4001
              protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  selector:
    app: hello
  ports:
    - port: 4001
      nodePort: 30036
      protocol: TCP
  type: NodePort
And a simple HTTP server written in Go:
package main

import (
    "net/http"

    "github.com/gin-gonic/gin"
)

func main() {
    r := gin.Default()
    r.GET("/ping", func(c *gin.Context) {
        c.JSON(200, gin.H{
            "message": "pong",
        })
    })

    server := &http.Server{
        Addr:    ":4001",
        Handler: r,
    }
    server.ListenAndServe()
}
When I make several requests to IP:30036/ping and then open the pods' logs, I can see that only one of the three pods handles all the requests. How can I make the other pods respond to requests as well?
You are exposing a Service using a NodePort, so there is no reverse proxy in place; you connect directly to your Pod(s). This is a good choice to start with. (Later you might want to use an Ingress.)
What you are seeing is that only one Pod handles your requests, while you expect each request to be load balanced to a different Pod. Your assumption is correct, but the load balancing does not happen at the HTTP request layer; it happens at the TCP layer.
So when you have a persistent TCP connection and reuse it, you will not see the load balancing that you expect. Since establishing a TCP connection is rather expensive latency-wise, an optimization is usually in place to avoid repeatedly opening new TCP connections: HTTP keep-alive.
Keep-alive is enabled by default in most frameworks and clients, and this is true for Go as well. Try server.SetKeepAlivesEnabled(false) and see if that fixes your issue. (Recommended for testing only!)
You can also use several different clients, e.g. curl from the command line, or disable keep-alive in Postman.
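As an illustration, here is a minimal sketch of the server from the question with keep-alives disabled via SetKeepAlivesEnabled (for testing only; it forces every request onto a new TCP connection):

package main

import (
    "net/http"

    "github.com/gin-gonic/gin"
)

func main() {
    r := gin.Default()
    r.GET("/ping", func(c *gin.Context) {
        c.JSON(200, gin.H{"message": "pong"})
    })

    server := &http.Server{
        Addr:    ":4001",
        Handler: r,
    }

    // Disable HTTP keep-alive: every request then arrives over a fresh TCP
    // connection, so kube-proxy can route it to a different Pod.
    // Recommended for testing only.
    server.SetKeepAlivesEnabled(false)

    server.ListenAndServe()
}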
In a Kubernetes cluster, requests sent to Kubernetes Services are routed by kube-proxy. The default kube-proxy mode has been iptables since Kubernetes v1.2; it allows faster packet resolution between Services and backend Pods. The load balancing between backend Pods is done directly via the iptables rules.
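One way to see this iptables-level distribution in action is a small Go client that disables keep-alive on its transport, so every request opens a fresh TCP connection. A rough sketch, assuming http://192.168.99.100:30036/ping is your NodePort URL (substitute the output of minikube ip):

package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    // DisableKeepAlives forces a new TCP connection per request, which lets
    // the iptables rules installed by kube-proxy pick a backend Pod for
    // every single call.
    client := &http.Client{
        Transport: &http.Transport{DisableKeepAlives: true},
    }

    // Assumed minikube NodePort URL; replace the IP with `minikube ip`.
    url := "http://192.168.99.100:30036/ping"

    for i := 0; i < 10; i++ {
        resp, err := client.Get(url)
        if err != nil {
            fmt.Println("request failed:", err)
            continue
        }
        body, _ := io.ReadAll(resp.Body)
        resp.Body.Close()
        fmt.Printf("request %d: %s\n", i, body)
    }
    // Check the logs of each replica afterwards to see the requests spread
    // across the Pods.
}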
Maybe you are simply not generating enough load to overwhelm a single Pod, which is why kube-proxy keeps routing you to the same one.
You can also see the answer to this question for implementing a custom iptables rule.
Thanks to @Thomas for the great insight! I tried playing with the request headers and it solved the load balancing issue where all requests were hitting only a single replica. For demos or testing, it is useful to be able to distribute requests to all replicas.
From the docs (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Connection), the Connection header can take the values:
Connection: keep-alive
Connection: close
This request always hits the same Pod:
curl -H "Connection: keep-alive" http://your-service:port/path
However, using close, the requests are balanced across all Pods:
curl -H "Connection: close" http://your-service:port/path