I have a deployment with a web service that I am load testing with K6. Horizontal pod autoscaling is enabled with a minimum of 1 and a maximum of 4 replicas. When I start the load test, the autoscaler scales up to 4 replicas, but only the pod that was already available before autoscaling receives traffic, even though I can observe that all 4 pods are up and running.
The ClusterIP service in front of the deployment has sessionAffinity set to None. The CPU request is 0.5 and the CPU limit is 1.3, and the target average utilization of the HPA is 50. During the tests, the CPU usage of the single pod serving traffic rises to or slightly above the limit, while the other pods show no CPU utilization.
However, when multiple pods are already available at test start, the load is distributed between them (although not evenly). Figure 1 shows two tests run in a row, with the number of requests served by each pod aggregated in 5 s intervals. In the first test, the autoscaler scaled up to 4 pods, but only one pod received requests. In the second test, all 4 pods were still present from the scale-up in test 1, and all of them received requests. (figure 1)
I also observed that when traffic is increased during the load test, some of the new pods receive some requests, as shown in figure 2, where two load peaks were added to the test. (figure 2)
Still, I want the requests to be distributed evenly across all pods as soon as new pods are created by the autoscaler. What could be wrong?
Here is the configuration:
deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: td-app
  namespace: crypto
spec:
  replicas: 1
  selector:
    matchLabels:
      app: td-app
  template:
    metadata:
      labels:
        app: td-app
    spec:
      containers:
      - env:
        - name: HTPASSWDFILE
        - name: KEYCATALOGUE
          value: /home/app/keys.json
        image: eu.gcr.io/td-cluster/td:2.3.0-2020457-a92b94
        imagePullPolicy: IfNotPresent
        name: td-app
        ports:
        - containerPort: 8080
          name: main
          protocol: TCP
        resources:
          limits:
            cpu: 1.3
            memory: 200Mi
          requests:
            cpu: 0.5
            memory: 50Mi
        securityContext:
          runAsNonRoot: true
          runAsUser: 1000
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /home/app/masterkeys
          name: masterkeys
      volumes:
      - name: masterkeys
        secret:
          defaultMode: 420
          secretName: masterkeys
service:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: td-app
  name: td-app-service
  namespace: crypto
spec:
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: td-app
  sessionAffinity: None
  type: ClusterIP
hpa:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: td-app
  namespace: crypto
  labels:
    app: td-app
spec:
  maxReplicas: 4
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: td-app
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50
K6.io is used for the tests. For 6 minutes, up to 12 virtual users iteratively send HTTP POST requests to the service. The test runs as a Job in the same cluster and uses the service name to address the requests.
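For reference, a minimal sketch of such a k6 script, assuming a JSON POST against the service root. The in-cluster service address and the 12-VU / 6-minute shape come from the setup described above; the endpoint path, payload, and exact ramping profile are placeholders.

import http from 'k6/http';
import { check, sleep } from 'k6';

// Ramp up to 12 virtual users and run for a total of 6 minutes.
export const options = {
  stages: [
    { duration: '1m', target: 12 },
    { duration: '4m', target: 12 },
    { duration: '1m', target: 0 },
  ],
};

// In-cluster service address from the config above; path and payload are hypothetical.
const URL = 'http://td-app-service.crypto.svc.cluster.local:8080/';
const PAYLOAD = JSON.stringify({ data: 'example' });
const PARAMS = { headers: { 'Content-Type': 'application/json' } };

export default function () {
  const res = http.post(URL, PAYLOAD, PARAMS);
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}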