I have a cluster in Google Kubernetes Engine and want one of its deployments to autoscale based on memory.
After deploying, I check the horizontal autoscaling with the following command:
kubectl describe hpa -n my-namespace
With this result:
Name:               myapi-api-deployment
Namespace:          my-namespace
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  Tue, 15 Feb 2022 12:21:44 +0100
Reference:          Deployment/myapi-api-deployment
Metrics:            ( current / target )
  resource memory on pods (as a percentage of request):  <unknown> / 50%
Min replicas:       1
Max replicas:       5
Deployment pods:    1 current / 1 desired
Conditions:
  Type            Status  Reason                   Message
  ----            ------  ------                   -------
  AbleToScale     True    ReadyForNewScale         recommended size matches current size
  ScalingActive   False   FailedGetResourceMetric  the HPA was unable to compute the replica count: failed to get memory utilization: missing request for memory
  ScalingLimited  False   DesiredWithinRange       the desired count is within the acceptable range
Events:
  Type     Reason                   Age                    From                       Message
  ----     ------                   ----                   ----                       -------
  Warning  FailedGetResourceMetric  2m22s (x314 over 88m)  horizontal-pod-autoscaler  failed to get memory utilization: missing request for memory
When I use the kubectl top command, I can see the memory and CPU usage. Here is my deployment, including the autoscaler:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api-deployment
  namespace: my-namespace
  annotations:
    reloader.stakater.com/auto: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-api
      version: v1
  template:
    metadata:
      labels:
        app: my-api
        version: v1
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: "true"
    spec:
      serviceAccountName: my-api-sa
      containers:
        - name: esp
          image: gcr.io/endpoints-release/endpoints-runtime:2
          imagePullPolicy: Always
          args: [
            "--listener_port=9000",
            "--backend=127.0.0.1:8080",
            "--service=myproject.company.ai"
          ]
          ports:
            - containerPort: 9000
        - name: my-api
          image: gcr.io/myproject/my-api:24
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: "/healthcheck"
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: "/healthcheck"
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          resources:
            limits:
              cpu: 500m
              memory: 2048Mi
            requests:
              cpu: 300m
              memory: 1024Mi
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-api-deployment
  namespace: my-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: "Utilization"
          averageUtilization: 50
---
I am using autoscaling/v2beta2, as recommended by the GKE documentation.
When using the HPA with memory or CPU, you need to set resource requests for whichever metric(s) your HPA is using. See "How does a HorizontalPodAutoscaler work", specifically:
For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. Then, if a target utilization value is set, the controller calculates the utilization value as a percentage of the equivalent resource request on the containers in each Pod. If a target raw value is set, the raw metric values are used directly.
Your HPA is set to match the my-api-deployment, which has two containers. You have resource requests set for my-api but not for esp. So you just need to add a memory resource request to esp.
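As a rough sketch of that fix, the esp container in the Deployment could gain a resources block like the one below. The 100m and 128Mi values are placeholders I made up for illustration, not recommendations for the ESP image. Only the memory request is needed for a memory-based HPA; the CPU request is optional here.

```yaml
# Fragment of spec.template.spec.containers in the Deployment from the question.
containers:
  - name: esp
    image: gcr.io/endpoints-release/endpoints-runtime:2
    # Existing imagePullPolicy, args and ports stay unchanged.
    resources:
      requests:
        # Placeholder values (assumption): size these from what
        # `kubectl top pod --containers -n my-namespace` reports for esp.
        cpu: 100m
        memory: 128Mi
  - name: my-api
    image: gcr.io/myproject/my-api:24
    # Existing ports, probes and resources stay unchanged.
```

With these placeholder numbers, each pod would request 1024Mi + 128Mi = 1152Mi in total. As far as I understand the HPA calculation, the 50% target would then correspond to an average usage of about 576Mi per pod. After re-applying the manifest, running kubectl describe hpa -n my-namespace again should show a percentage instead of <unknown> once metrics have been collected.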