What is the difference between kubectl autoscale vs kubectl scale

8/8/2019

I am new to kubernetes and trying to understand when to use kubectl autoscale and kubectl scale commands

-- linuxkaran
kubectl
kubernetes

1 Answer

8/8/2019

The scale (replica count) of a Deployment tells Kubernetes how many pods should always be running to keep the application working properly. You have to specify it manually. In YAML you define it in spec.replicas, as in the example below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80

The second way to specify the scale (replicas) of a deployment is to use a command:

$ kubectl run nginx --image=nginx --replicas=3
deployment.apps/nginx created

$ kubectl get deployment
NAME    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx   3         3         3            3           11s

It means that the deployment will have 3 pods running, and Kubernetes will always try to maintain this number (if any of the pods crashes, K8s will recreate it). You can always change it in spec.replicas and run kubectl apply -f <deployment-file>, or use the command:

$ kubectl scale deployment nginx --replicas=10
deployment.extensions/nginx scaled

$ kubectl get deployment
NAME    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx   10        10        10           10          4m48s
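The declarative equivalent of kubectl scale is to change spec.replicas in the manifest and re-apply it. A minimal sketch, assuming the Deployment YAML from above is saved as nginx-deployment.yaml (the filename is illustrative):

```yaml
# nginx-deployment.yaml (excerpt) -- bump the replica count here,
# then run `kubectl apply -f nginx-deployment.yaml` to reconcile
# the cluster to the new desired state.
spec:
  replicas: 10
```

Both approaches set the same desired state; the declarative one has the advantage that the replica count stays recorded in version-controlled YAML.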

Please read the documentation about scaling and ReplicaSets.

Horizontal Pod Autoscaling (HPA) was invented to scale a deployment based on metrics produced by its pods. For example, if your application receives about 300 HTTP requests per minute and each pod can handle 100 requests per minute, 3 pods will be fine. However, if the traffic suddenly grows to ~1000 requests per minute, 3 pods will not be enough and about 70% of the requests will fail. With HPA, the deployment will automatically scale up to 10 pods to handle all the requests. After some time, when the request rate drops to 500/minute, it will scale down to 5 pods. Later it may go up or down again, depending on the traffic and on your HPA configuration.

The easiest way to apply autoscaling is:

$ kubectl autoscale deployment <your-deployment> --cpu-percent=70 --min=3 --max=10

It means that the autoscaler will automatically scale the deployment up to a maximum of 10 pods based on the observed metric (here, target CPU utilization of 70%), and later scale it down to a minimum of 3. A very good walkthrough with CPU usage is shown in the HPA documentation.
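The kubectl autoscale command above creates a HorizontalPodAutoscaler object behind the scenes; you can also write it as a manifest. A minimal sketch using the autoscaling/v1 API, assuming the nginx Deployment from earlier (the HPA name nginx-hpa is illustrative):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa            # illustrative name
spec:
  scaleTargetRef:            # which object the HPA scales
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 3             # never scale below 3 pods
  maxReplicas: 10            # never scale above 10 pods
  targetCPUUtilizationPercentage: 70
```

After applying it, kubectl get hpa shows the current vs. target metric and the current replica count.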

Please keep in mind that Kubernetes can use many types of metrics through its metrics APIs (HTTP requests per second, CPU/memory load, number of threads, etc.).

Hope this helps you understand the difference between scaling and autoscaling.

-- PjoterS
Source: StackOverflow