I have a Kubernetes cluster running on Google Kubernetes Engine.
I have a deployment that I manually scaled up from 100 replicas to 300 replicas (by editing the HPA object) to do some load testing. While I was load testing the deployment by sending HTTP requests to the service, it seemed that not all pods were receiving an equal share of the traffic: only around 100 pods showed signs of processing requests (judging by their CPU load and our custom metrics). So my suspicion was that the service was not load balancing the requests among all the pods equally.
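The scale-up itself was done by editing the HPA bounds, roughly like this (the HPA name here is just illustrative):

$ kubectl edit hpa my-app
# or, in one step:
$ kubectl patch hpa my-app --patch '{"spec":{"minReplicas":300,"maxReplicas":300}}'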
When I checked the deployment, I could see that all 300 replicas were ready.
$ k get deploy my-app --show-labels
NAME     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE   LABELS
my-app   300       300       300          300         21d   app=my-app
On the other hand, when I checked the service, I saw this:
$ k describe svc my-app
Name: my-app
Namespace: production
Labels: app=my-app
Selector: app=my-app
Type: ClusterIP
IP: 10.40.9.201
Port: http 80/TCP
TargetPort: http/TCP
Endpoints: 10.36.0.5:80,10.36.1.5:80,10.36.100.5:80 + 114 more...
Port: https 443/TCP
TargetPort: https/TCP
Endpoints: 10.36.0.5:443,10.36.1.5:443,10.36.100.5:443 + 114 more...
Session Affinity: None
Events: <none>
What was strange to me was this part:
Endpoints: 10.36.0.5:80,10.36.1.5:80,10.36.100.5:80 + 114 more...
I was expecting to see 300 endpoints there; is that assumption correct?
(I also found this post, which describes a similar issue, but there the author experienced only a few minutes of delay until the endpoints were updated, whereas for me nothing changed even after half an hour.)
How could I troubleshoot what was going wrong? I read that the endpoint list is maintained by the Endpoints controller, but I couldn't find any info about where to check its logs.
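One way to get an exact endpoint count, instead of relying on the truncated "+ 114 more..." output of describe, would be to pull the addresses straight from the Endpoints object and count them, e.g.:

$ kubectl get endpoints my-app -o jsonpath='{.subsets[*].addresses[*].ip}' | wc -w
$ kubectl get endpoints my-app -o jsonpath='{.subsets[*].notReadyAddresses[*].ip}' | wc -w
# for comparison: the number of running pods behind the selector
$ kubectl get pods -l app=my-app --field-selector=status.phase=Running -o name | wc -l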
Update: We managed to reproduce this a couple more times. Sometimes it was less severe, for example 381 endpoints instead of 445. One interesting thing we noticed was that when we retrieved the details of the endpoints:
$ k describe endpoints my-app
Name: my-app
Namespace: production
Labels: app=my-app
Annotations: <none>
Subsets:
Addresses: 10.36.0.5,10.36.1.5,10.36.10.5,...
NotReadyAddresses: 10.36.199.5,10.36.209.5,10.36.239.2,...
Then a bunch of IPs were "stuck" in the NotReadyAddresses state (though not the ones that were "missing" from the service: if I summed the number of IPs in Addresses and NotReadyAddresses, the total was still less than the number of ready pods). I don't know if this is related at all, but I couldn't find much info online about this NotReadyAddresses field.
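Each entry in notReadyAddresses carries a targetRef pointing at the pod it belongs to, so a sketch for mapping the stuck IPs back to pod names, and then checking a pod's Ready condition, could look like this (the pod name in the second command is a placeholder):

$ kubectl get endpoints my-app -o jsonpath='{range .subsets[*].notReadyAddresses[*]}{.ip}{"\t"}{.targetRef.name}{"\n"}{end}'
$ kubectl get pod <pod-name> -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'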
I am referring to your first attempt with 300 pods.
I would check the following:
- kubectl get po -l app=my-app, to see whether you actually get a 300-item list (a sketch of this and the other commands follows below). Your deployment reports 300 available pods, which makes your issue very interesting to analyze.
- whether your pod/deployment manifest defines resource requests and limits; this helps the scheduler place the pods.
- whether some of your nodes have taints that are incompatible with your pod/deployment manifest.
- whether your pod/deployment manifest has liveness and readiness probes (please post them; a manifest sketch follows below).
- whether you have defined a ResourceQuota object that limits the creation of pods/deployments.
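A minimal sketch of the command-line checks from the list above (these are one way to do it, not the only one):

# count the pods actually matched by the service selector
$ kubectl get po -l app=my-app --no-headers | wc -l
# look for taints on the nodes
$ kubectl describe nodes | grep -A1 Taints
# look for ResourceQuota objects that could cap pod creation
$ kubectl get resourcequota --all-namespaces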
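And a sketch of the pod-template fields the list refers to, with purely illustrative values (the port names match the service's targetPort):

containers:
- name: my-app
  image: my-app:latest              # illustrative image
  ports:
  - name: http
    containerPort: 80
  - name: https
    containerPort: 443
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi
  readinessProbe:
    httpGet:
      path: /healthz                # illustrative path
      port: http
    initialDelaySeconds: 5
    periodSeconds: 10
  livenessProbe:
    httpGet:
      path: /healthz
      port: http
    initialDelaySeconds: 15
    periodSeconds: 20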
It turned out that this was caused by using preemptible VMs in our node pools; it doesn't happen when the nodes are not preemptible.
We couldn't figure out more details of the root cause, but using preemptible VMs as nodes is not an officially supported scenario anyway, so we switched to regular VMs.
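For anyone hitting the same thing: GKE labels preemptible nodes with cloud.google.com/gke-preemptible=true, so checking which nodes are affected and adding a regular (non-preemptible) node pool looks roughly like this (the pool and cluster names are placeholders):

$ kubectl get nodes -l cloud.google.com/gke-preemptible=true
$ gcloud container node-pools create regular-pool --cluster my-cluster --num-nodes 3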