I am using Kubernetes v1.20.10 baremetal installation. It has one master node and 3 worker nodes. The application simply served HTTP requests.
I am scaling the deployment based on the (HPA) Horizontal Pod Autoscaler and I noticed that the load is not getting evenly across pods. Only the first pod is getting 95% of the load and the other Pod is getting very low load.
I tried the answer mentioned here but did not work : https://stackoverflow.com/questions/61586163/kubernetes-service-does-not-distribute-requests-between-pods
Based on the information provided I assume that you are using http-keepalive which is a persistent tcp connection. A kubernetes service distributes load for each (new) tcp connection. If you have persistent connections, only the additional connections will be distributed which is the effect that you observe.
Try: Disable http keepalive or set the maximum keepalive time to something like 15 seconds, maximum requests to 50.