How does Kubernetes HPA work?

6/8/2020

I am trying to add Horizontal Pod Autoscaling (HPA) to my Kubernetes Deployments. My application is composed of 5 microservices that communicate with each other. I have one NodePort service (Traefik) that handles incoming traffic from outside the cluster. Each call is an HTTPS POST that sends a CSV file, which the application then processes. It can be sent from the command line or through a web app (UI). The curl command looks like this:

curl https://our_app_name -X POST -F "file=@test.csv"

My questions about how HPA works are the following:

  1. How does HPA distribute the calls between the replicas? Is Kubernetes capable of parallel computing, i.e., dividing one call between the replicas, or does each call go to only one replica?

  2. Can a replica handle more than one call at the same time?

  3. In the HPA specifications, for example here:

  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

What is this 50%? Is it 50% of the CPU of the pod or of the cluster?

Thanks in advance for your help!

-- pquinta

1 Answer

6/8/2020
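  1. Each call goes to only one replica. HPA itself only changes the number of replicas; it does not route traffic. The Service (or the Traefik ingress in front of it) sends each request to one of the available pods, so a single request is never split across replicas, although different requests can land on different pods.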
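  2. You can send as many requests as you like through the ingress/Service, and each one will reach one of the replicas. Whether a single replica processes several requests at the same time depends on the application itself (for example, how many worker threads or processes its server runs), not on Kubernetes.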
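  3. With averageUtilization: 50, the HorizontalPodAutoscaler tries to keep the average CPU usage across all pods at roughly 50% of their requested CPU (resources.requests.cpu). It is not a percentage of the node's CPU or of the cluster's, which is also why the pods must declare CPU requests for this metric to work.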
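To make point 3 concrete, here is a minimal Python sketch of the scaling rule described in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), where utilization is measured against the pods' CPU requests. The function names, the millicore inputs and the min/max defaults are hypothetical, chosen only for illustration. The real controller also deals with missing metrics, pods that are not ready and scale-down stabilization, all of which this sketch ignores.

```python
import math
from typing import Sequence


def average_cpu_utilization(usages_m: Sequence[float], requests_m: Sequence[float]) -> float:
    """Average CPU utilization (%) across pods, relative to their CPU *requests*.

    Total usage is divided by total requests, so the result is never a
    percentage of node or cluster capacity.
    """
    if not usages_m or len(usages_m) != len(requests_m):
        raise ValueError("need at least one pod and one CPU request per pod")
    if any(r <= 0 for r in requests_m):
        # A Utilization target is meaningless without resources.requests.cpu.
        raise ValueError("every pod needs a CPU request for a Utilization target")
    return 100.0 * sum(usages_m) / sum(requests_m)


def desired_replicas(
    usages_m: Sequence[float],
    requests_m: Sequence[float],
    target_utilization: float = 50.0,  # averageUtilization from the HPA spec
    tolerance: float = 0.1,            # Kubernetes' default tolerance is 0.1
    min_replicas: int = 1,             # hypothetical minReplicas
    max_replicas: int = 10,            # hypothetical maxReplicas
) -> int:
    current_replicas = len(usages_m)
    ratio = average_cpu_utilization(usages_m, requests_m) / target_utilization
    # Close enough to the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 2 pods, each requesting 500m and using 400m -> 80% of requests -> scale up.
    print(desired_replicas([400, 400], [500, 500]))  # 4
    # 4 pods, each requesting 500m and using 100m -> 20% of requests -> scale down.
    print(desired_replicas([100] * 4, [500] * 4))  # 2
    # 52% is within the 10% tolerance band around 50% -> no change.
    print(desired_replicas([260, 260], [500, 500]))  # 2
```

Note that the same 400m of usage would count as 80% with a 500m request but only 40% with a 1000m request, which is why the target depends on what each pod requests rather than on the node's capacity.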
-- Arghya Sadhu
Source: StackOverflow