I'm scaling an ML prediction service based on CPU and memory utilization. I'm using an HPA for pod-level scaling, with both CPU and memory metrics specified. While creating the deployment I also specified resource requests and limits (both the HPA configuration and the pod template code are pasted below for reference).
I observed that, although requests and limits are specified, when I check the memory and CPU consumed by each pod, only one pod is consuming all the CPU and memory while the others consume very little. My understanding is that all pods should consume roughly equal resources; only then can we say the workload is scaled, otherwise it is like running the code on a single machine without Kubernetes.
Note: I'm using the Python Kubernetes client to create the deployment and services, not YAML.
I tried tweaking the requests and limits parameters and observed that, because this is an ML pipeline, memory and CPU consumption spike sharply at some point.
My HPA configuration:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  namespace: default
  name: #hpa_name
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: #deployment_name
  minReplicas: 1
  maxReplicas: 40
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 5Gi
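Since everything here is created through the Python Kubernetes client rather than YAML, the equivalent HPA could be created roughly as sketched below. This is a sketch, not my exact code: it assumes a client version that still ships the autoscaling/v2beta1 models, and "hpa-name" / "deployment-name" are placeholders.

from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside the cluster

# Placeholder names; substitute the real HPA and Deployment names.
hpa = client.V2beta1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="hpa-name", namespace="default"),
    spec=client.V2beta1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2beta1CrossVersionObjectReference(
            api_version="apps/v1beta1", kind="Deployment", name="deployment-name"
        ),
        min_replicas=1,
        max_replicas=40,
        metrics=[
            # 80% average CPU utilization of the requested CPU
            client.V2beta1MetricSpec(
                type="Resource",
                resource=client.V2beta1ResourceMetricSource(
                    name="cpu", target_average_utilization=80
                ),
            ),
            # 5Gi average memory value per pod
            client.V2beta1MetricSpec(
                type="Resource",
                resource=client.V2beta1ResourceMetricSource(
                    name="memory", target_average_value="5Gi"
                ),
            ),
        ],
    ),
)
client.AutoscalingV2beta1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)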
My pod template code:
container = client.V1Container(
    name="#container_name",
    image="#image_name",
    ports=[client.V1ContainerPort(container_port=8080)],
    env=[client.V1EnvVar(name="ABC", value="12345")],  # env values must be strings
    resources=client.V1ResourceRequirements(
        limits={"cpu": "2", "memory": "22Gi"},
        requests={"cpu": "1", "memory": "8Gi"},
    ),
)
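For completeness, a minimal sketch of how a container spec like this is typically wrapped into a Deployment with the Python client; the "app: ml-predict" label, the deployment name, and the replica count are illustrative placeholders, not taken from my actual setup.

template = client.V1PodTemplateSpec(
    metadata=client.V1ObjectMeta(labels={"app": "ml-predict"}),
    spec=client.V1PodSpec(containers=[container]),
)
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="deployment-name", namespace="default"),
    spec=client.V1DeploymentSpec(
        replicas=1,  # the HPA scales this up to maxReplicas
        selector=client.V1LabelSelector(match_labels={"app": "ml-predict"}),
        template=template,
    ),
)
client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)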
Output of kubectl top pods:
NAME CPU(cores) MEMORY(bytes)
deploy-24b32e5dc388456f8f2263be39ffb5f7-de19236511504877-77c6ds 1m 176Mi
deploy-24b32e5dc388456f8f2263be39ffb5f7-de19236511504877-7d5n4l 1m 176Mi
deploy-24b32e5dc388456f8f2263be39ffb5f7-de19236511504877-7dq6c9 14236m 16721Mi
deploy-24b32e5dc388456f8f2263be39ffb5f7-de19236511504877-7f6nmh 1m 176Mi
deploy-24b32e5dc388456f8f2263be39ffb5f7-de19236511504877-7fzdc4 1m 176Mi
deploy-24b32e5dc388456f8f2263be39ffb5f7-de19236511504877-7gvqtj 1m 176Mi
deploy-24b32e5dc388456f8f2263be39ffb5f7-de19236511504877-7h6ld7 1m 176Mi
deploy-24b32e5dc388456f8f2263be39ffb5f7-de19236511504877-7j7gv4 1m 176Mi
deploy-24b32e5dc388456f8f2263be39ffb5f7-de19236511504877-7kxlvh 1m 176Mi
deploy-24b32e5dc388456f8f2263be39ffb5f7-de19236511504877-7nnn8x 1m 176Mi
deploy-24b32e5dc388456f8f2263be39ffb5f7-de19236511504877-7pmtnj 1m 176Mi
deploy-24b32e5dc388456f8f2263be39ffb5f7-de19236511504877-7qflkh 1m 176Mi
deploy-24b32e5dc388456f8f2263be39ffb5f7-de19236511504877-7s26cj 1m 176Mi
deploy-24b32e5dc388456f8f2263be39ffb5f7-de19236511504877-7st5lt 1m 176Mi
From the above output it is clear that the third pod is utilizing most of the resources, while the others sit at a constant and very low memory and CPU consumption.
My expectation is that each pod should consume roughly equal resources, within the bounds of the requests and limits in the pod template: between 1 and 2 CPUs, and between 8Gi and 22Gi of memory (or less than the request, but never beyond the defined limits).
Thanks in advance for any pointers/help/hints.
As per the RCA (root cause analysis) of this issue, we ran ipvsadm -ln while processing a workload in our Kubernetes cluster and found that the payload makes only one TCP connection, which causes all of the workload to be concentrated in one pod even though other pods are available.
Our application is based on gRPC, and gRPC uses HTTP/2. HTTP/2 creates a single long-lived TCP connection and multiplexes requests over it to minimize TCP connection-management overhead. Because of this, there was one long-lived TCP connection attached to a single pod; the HPA sees the memory and CPU spike and scales out, but the load does not get distributed. We therefore needed a mechanism that goes one step beyond connection-level load balancing (the default in Kubernetes) to request-level load balancing.
Fortunately, we found the solution below; we followed it and it worked for us:
https://kubernetes.io/blog/2018/11/07/grpc-load-balancing-on-kubernetes-without-tears/
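One of the approaches described in that post is to make the Service headless so DNS returns the individual pod IPs, and let the gRPC client balance requests across them (the post also covers using a service mesh such as Linkerd). A rough sketch of that option, assuming the same Python client on the server side and the grpc Python package on the client side; the service name "grpc-svc", the label, and port 8080 are placeholders.

from kubernetes import client
import grpc

# Headless Service (clusterIP: None) so DNS resolves to the individual pod IPs
# instead of a single virtual IP.
svc = client.V1Service(
    metadata=client.V1ObjectMeta(name="grpc-svc", namespace="default"),
    spec=client.V1ServiceSpec(
        cluster_ip="None",
        selector={"app": "ml-predict"},
        ports=[client.V1ServicePort(port=8080, target_port=8080)],
    ),
)
client.CoreV1Api().create_namespaced_service(namespace="default", body=svc)

# On the gRPC client side, resolve the headless service via DNS and ask gRPC
# to round-robin RPCs across the resolved pod addresses instead of pinning
# everything to one long-lived connection.
channel = grpc.insecure_channel(
    "dns:///grpc-svc.default.svc.cluster.local:8080",
    options=[("grpc.lb_policy_name", "round_robin")],
)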