I am new to Kubernetes and trying to understand when to use the kubectl autoscale and kubectl scale commands.
Scale in a Deployment tells how many Pods should always be running to keep the application working properly. You have to specify it manually. In YAML you define it in spec.replicas, as in the example below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
The second way to specify the scale (replicas) of a Deployment is via a command:
$ kubectl run nginx --image=nginx --replicas=3
deployment.apps/nginx created
$ kubectl get deployment
NAME    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx   3         3         3            3           11s
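Note: in newer kubectl versions the --replicas flag has been removed from kubectl run (which now creates a single Pod instead of a Deployment). Assuming kubectl v1.19 or later, a rough equivalent is:
$ kubectl create deployment nginx --image=nginx --replicas=3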
Either way, the Deployment will have 3 Pods running and Kubernetes will always try to maintain this number (if any of the Pods crashes, K8s will recreate it). You can change it at any time, either by editing spec.replicas and re-applying the manifest with kubectl apply -f <name-of-deployment> (a short sketch of that follows below), or directly via a command:
$ kubectl scale deployment nginx --replicas=10
deployment.extensions/nginx scaled
$ kubectl get deployment
NAME    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx   10        10        10           10           4m48s
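For the declarative route mentioned above, a minimal sketch, assuming the Deployment manifest is saved as nginx-deployment.yaml (the filename is an assumption):
# edit spec.replicas in nginx-deployment.yaml (e.g. from 3 to 10), then re-apply it:
$ kubectl apply -f nginx-deployment.yaml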
Please read the documentation about scaling Deployments and about ReplicaSets.
Horizontal Pod Autoscaling (HPA) was invented to scale a Deployment based on metrics produced by the Pods. For example, if your application receives about 300 HTTP requests per minute and each of your Pods can handle about 100 HTTP requests per minute, 3 Pods are enough. However, if the traffic jumps to ~1000 requests per minute, 3 Pods will not cope and about 70% of the requests will fail. With HPA, the Deployment will autoscale to run 10 Pods and handle all the requests. After some time, when the request rate drops to 500/minute, it will scale down to 5 Pods. Later it may go up or down again, depending on the request rate and on your HPA configuration.
The easiest way to set up autoscaling is:
$ kubectl autoscale deployment <your-deployment> --cpu-percent=<target-CPU-utilization> --min=3 --max=10
(kubectl autoscale only accepts a CPU target via --cpu-percent; other metrics have to be configured through an HPA manifest.)
It means the autoscaler will automatically scale the Deployment up to a maximum of 10 Pods based on the metric, and scale it back down to a minimum of 3 when the load drops. A very good example with CPU usage is shown in the HPA documentation.
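For example, to autoscale the nginx Deployment from above on CPU usage (the 50% target here is just an assumed value for illustration; utilization-based scaling also requires the Pods to have CPU resource requests set and the metrics-server to be installed):
$ kubectl autoscale deployment nginx --cpu-percent=50 --min=3 --max=10
The same autoscaler can also be written declaratively; a minimal sketch, assuming your cluster serves the autoscaling/v2 API:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:          # which workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource         # built-in resource metric (CPU)
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50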
Please keep in mind that Kubernetes can scale on many types of metrics exposed through its metrics APIs (CPU/memory load via the resource metrics API, plus custom or external metrics such as HTTP request rate, queue length, number of threads, etc.).
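As an illustration of a non-CPU metric, here is a sketch of an HPA targeting a per-Pod custom metric; the metric name http_requests_per_second and the target value are assumptions, and this only works if a custom metrics adapter (for example the Prometheus Adapter) is installed and exposes that metric:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-requests
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Pods                         # custom metric averaged across Pods
    pods:
      metric:
        name: http_requests_per_second # assumed metric name from the adapter
      target:
        type: AverageValue
        averageValue: "100"            # scale so each Pod handles ~100 req/s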
Hope this helps you understand the difference between scaling and autoscaling.