Trying to understand what values to use for resources and limits of a multiple-container deployment

5/21/2020

I am trying to set up a HorizontalPodAutoscaler for my app, alongside DigitalOcean's automatic Cluster Autoscaling.

My deployment YAML is below; I have also deployed metrics-server as per the guide linked above. At the moment I am struggling to figure out what values to use for the cpu and memory requests and limits fields, mainly due to the variable replica count. That is, do I need to account for the maximum number of replicas, each using its own resources, or for the deployment in general? Do I plan on a per-pod basis or for each container individually?

For some context, I am running this on a cluster that can have up to two nodes, each with 1 vCPU and 2GB of memory (so the total can be 2 vCPUs and 4GB of memory).

As it is now, my cluster is running one node, and my kubectl top statistics for pods and nodes look as follows:

kubectl top pods

NAME                       CPU(cores)   MEMORY(bytes)   
graphql-85cc89c874-cml6j   5m           203Mi           
graphql-85cc89c874-swmzc   5m           176Mi 

kubectl top nodes

NAME                      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
skimitar-dev-pool-3cpbj   62m          6%     1151Mi          73%  

I have tried various combinations of cpu and memory values, but when I deploy my file, the deployment is either stuck in a Pending state or keeps restarting until it gets terminated. My HorizontalPodAutoscaler also reports targets as <unknown>/80%, but I believe that is because I removed the resources fields from my deployment, as it was not working.

Considering the deployment below, what should I look at / consider in order to determine the best values for the requests and limits of my resources?

The following YAML is cleaned up (env variables, services, and so on removed); it works as is, but results in the above-mentioned issues when the resources fields are uncommented.

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: graphql
spec:
  replicas: 2
  selector:
    matchLabels:
      app: graphql
  template:
    metadata:
      labels:
        app: graphql
    spec:
      containers:
        - name: graphql-hasura
          image: hasura/graphql-engine:v1.2.1
          ports:
            - containerPort: 8080
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
          # resources:
          #   requests:
          #     memory: "150Mi"
          #     cpu: "100m"
          #   limits:
          #     memory: "200Mi"
          #     cpu: "150m"
        - name: graphql-actions
          image: my/nodejs-app:1
          ports:
            - containerPort: 4040
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /healthz
              port: 4040
          readinessProbe:
            httpGet:
              path: /healthz
              port: 4040
          # resources:
          #   requests:
          #     memory: "150Mi"
          #     cpu: "100m"
          #   limits:
          #     memory: "200Mi"
          #     cpu: "150m"

# Disruption budget
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: graphql-disruption-budget
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: graphql

# Horizontal auto scaling
---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: graphql-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: graphql
  minReplicas: 2
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: 80
-- Ilja
Tags: autoscaling, cpu, digital-ocean, kubernetes, memory

1 Answer

5/21/2020

How do I determine what values to use for my cpu and memory requests and limits fields? Mainly due to the variable replica count, i.e. do I need to account for the maximum number of replicas, each using its own resources, or for the deployment in general? Do I plan on a per-pod basis or for each container individually?

Requests and limits are the mechanisms Kubernetes uses to control resources such as CPU and memory.

  • Requests are what the container is guaranteed to get. If a container requests a resource, Kubernetes will only schedule the pod on a node that can provide it. Note that requests and limits are set per container; the pod's effective request is the sum of its containers' requests.
  • Limits make sure a container never goes above a certain value. The container is only allowed to go up to the limit, and then it is restricted: it is throttled when it hits its CPU limit and OOM-killed when it exceeds its memory limit. A sizing sketch based on your kubectl top numbers follows this list.
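
As a starting point, you can size each container from the kubectl top output in the question. A minimal sketch, assuming the observed usage (roughly 5m CPU and 200Mi memory per pod, split across the two containers) is representative; the numbers are illustrative assumptions, not tuned recommendations:

          # Per-container sizing sketch; values are assumptions derived
          # from the kubectl top output above, with headroom added.
          resources:
            requests:
              memory: "200Mi"   # guaranteed; the scheduler reserves this on the node
              cpu: "50m"        # far above the observed ~5m, cheap insurance
            limits:
              memory: "400Mi"   # exceeding this gets the container OOM-killed
              cpu: "200m"       # exceeding this only throttles the container

With two containers per pod, the pod's effective request becomes 100m CPU / 400Mi memory, so the three replicas at your HPA maximum would reserve about 300m CPU and 1200Mi of memory across the cluster.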

The number of replicas will be determined by the HorizontalPodAutoscaler acting on the Deployment's underlying ReplicaSet.

when I deploy my file, the deployment is either stuck in a Pending state or keeps restarting until it gets terminated.

  • Pending state means that there are no nodes with enough allocatable resources to schedule the new pods.

  • Restarting may be triggered by other issues; I'd suggest debugging it after solving the scaling issues. Note that your observed pod usage (~200Mi) is already close to the 200Mi memory limit you had commented out, so an OOM-killed container is a plausible cause. The commands after this list show where to look.
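
A short debugging sketch using standard kubectl commands (<pod-name> is a placeholder):

# Why is a pod Pending? Look for FailedScheduling in the events section.
kubectl describe pod <pod-name>

# Why did a container restart? lastState reports reasons such as OOMKilled.
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[*].lastState}'

# Recent cluster events in chronological order (scheduler / autoscaler messages).
kubectl get events --sort-by=.metadata.creationTimestamp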

My HorizontalPodAutoscaler also reports targets as <unknown>/80%, but I believe that is because I removed the resources fields from my deployment, as it was not working.

  • You are correct: if you don't set resource requests, the desired % will remain <unknown> and the autoscaler won't be able to trigger scaling up or down.

  • Here you can see the algorithm responsible for that.

  • The Horizontal Pod Autoscaler triggers new pods based on usage as a percentage of the pods' requests. In this case, whenever the pods' average CPU usage reaches 80% of the requested value, it will trigger new pods, up to the maximum specified; the worked example after this list shows the arithmetic.
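
For reference, this is the scaling formula from the Kubernetes documentation:

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]

Worked example: with 2 replicas averaging 120% CPU utilization against the 80% target, the HPA computes ceil(2 * 120 / 80) = 3 and scales to 3 replicas, which is the maxReplicas in your manifest.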

For a good HPA example, check this link: Horizontal Pod Autoscale Walkthrough


But how does the Horizontal Pod Autoscaler work with the Cluster Autoscaler?

  • The Horizontal Pod Autoscaler changes the Deployment's or ReplicaSet's number of replicas based on the current CPU load. If the load increases, HPA will create new replicas, for which there may or may not be enough space in the cluster.

  • If there are not enough resources, CA will try to bring up new nodes, so that the HPA-created pods have a place to run. If the load decreases, HPA will remove some of the replicas; as a result, some nodes may become underutilized or completely empty, and CA will then terminate such unneeded nodes. The capacity math after this list shows how that plays out on your node size.
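
To check whether a scale-up can fit, compare the summed pod requests against the nodes' allocatable capacity. A rough sketch for your setup, assuming the per-container requests from the sizing sketch above, and assuming each 1 vCPU / 2GB node keeps roughly 900m CPU and 1.5GiB memory allocatable after system reservations (the real figures come from kubectl describe node):

Per-pod request:   2 containers x (50m CPU, 200Mi)  = 100m CPU, 400Mi
3 replicas (max):  3 x (100m CPU, 400Mi)            = 300m CPU, 1200Mi
One node:          ~900m CPU, ~1536Mi allocatable   -> tight on memory
Two nodes:         ~1800m CPU, ~3072Mi allocatable  -> comfortable fit

If the three replicas don't fit on the single node, the Cluster Autoscaler brings up the second one.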

NOTE: The key is to set the maximum replicas for HPA while thinking at the cluster level, according to the number of nodes (and budget) available for your app. You can start by setting a fairly high maximum number of replicas, monitor, and then adjust it according to usage metrics and your prediction of future load.

If you have any questions, let me know in the comments.

-- willrof
Source: StackOverflow