I am trying to add Horizontal Pod Autoscaling (HPA) to my Kubernetes Deployments. My application is composed of 5 microservices that communicate with each other. I have one NodePort (a Traefik service) which is in charge of controlling the traffic from the outside. The call is an HTTPS POST that sends a CSV file, which is processed by the application. It can be sent via the command line or from a web app (UI). The curl command looks like this:
curl https://our_app_name -X POST -F "file=@test.csv"
My questions about how HPA works are the following:
How does HPA distribute the calls between the replicas? Is Kubernetes capable of parallel computing, i.e., does it divide a call between the replicas, or does each call go to only one replica?
Can a replica handle more than one call at the same time?
In the HPA specifications, for example here:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
What is this 50%? Is it 50% of the CPU of the pod or of the cluster?
Thanks in advance for your help!
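With averageUtilization: 50, the HorizontalPodAutoscaler would attempt to ensure that each pod was consuming roughly 50% of its requested CPU, averaged across all pods of the scaling target. The percentage is relative to the pod's CPU request, not to the CPU of the node or of the cluster.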
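To make the 50% concrete, here is a rough sketch in Python of the scaling rule described in the Kubernetes documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The function names and the usage numbers are made up for illustration. The real controller also applies a tolerance, stabilization windows, and the minReplicas/maxReplicas limits, all of which are left out here.

```python
import math


def cpu_utilization_percent(usage_millicores, request_millicores):
    """Utilization of one pod as a percentage of its CPU *request*."""
    return 100.0 * usage_millicores / request_millicores


def desired_replicas(current_replicas, current_utilization, target_utilization):
    """Simplified HPA rule: scale in proportion to current/target utilization."""
    return math.ceil(current_replicas * current_utilization / target_utilization)


# Hypothetical numbers: 3 replicas, each requesting 500m of CPU.
request = 500
usages = [400, 450, 350]  # current usage per pod, in millicores

# Average utilization across the pods, relative to the request (not the node).
average = sum(cpu_utilization_percent(u, request) for u in usages) / len(usages)

print(average)                                    # 80.0
print(desired_replicas(len(usages), average, 50)) # 5
```

So with a target of 50 and pods averaging 80% of their request, the autoscaler would want about 5 replicas instead of 3. This only works if the containers actually declare a CPU request, since the percentage is computed against it.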