Is there any way to drain CloudWatch Container Insights nodes with the autoscaler on EKS?

4/13/2021

Cluster Specification:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: mixedCluster
  region: ap-southeast-1

nodeGroups:
  - name: scale-spot
    desiredCapacity: 1
    maxSize: 10
    instancesDistribution:
      instanceTypes: ["t2.small", "t3.small"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
    availabilityZones: ["ap-southeast-1a", "ap-southeast-1b"]
    iam:
      withAddonPolicies:
        autoScaler: true
    labels:
      nodegroup-type: stateless-workload
      instance-type: spot
    ssh:
      publicKeyName: newkeypairbro

availabilityZones: ["ap-southeast-1a", "ap-southeast-1b"]

Problem:

CloudWatch pods are created automatically on every node when I scale up my apps (business pods). But when I scale my business pods down to zero, the cluster autoscaler does not drain or terminate the nodes that are left running only CloudWatch pods, so dummy nodes remain in my cluster.

In my screenshot, the last node is a dummy node with only CloudWatch pods running on it.

Expected result:

How can these nodes running only Amazon CloudWatch pods be drained gracefully (and automatically) after the business pods terminate, so that no dummy nodes are left behind?
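
A commonly suggested approach for this kind of problem (not something from the original post) is to mark the Container Insights pods as safe to evict, so the autoscaler is free to remove a node that runs nothing else. The sketch below assumes the default amazon-cloudwatch namespace and the cloudwatch-agent and fluentd-cloudwatch DaemonSet names from the Container Insights quick start; verify the names with "kubectl -n amazon-cloudwatch get daemonsets" before applying:

# Assumed DaemonSet names; pods created from these templates get the safe-to-evict annotation
kubectl -n amazon-cloudwatch patch daemonset cloudwatch-agent \
  -p '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"true"}}}}}'
kubectl -n amazon-cloudwatch patch daemonset fluentd-cloudwatch \
  -p '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"true"}}}}}'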


This is my autoscaler config:

Name:                   cluster-autoscaler
Namespace:              kube-system
CreationTimestamp:      Sun, 11 Apr 2021 20:44:28 +0700
Labels:                 app=cluster-autoscaler
Annotations:            cluster-autoscaler.kubernetes.io/safe-to-evict: false
                        deployment.kubernetes.io/revision: 2
Selector:               app=cluster-autoscaler
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=cluster-autoscaler
  Annotations:      prometheus.io/port: 8085
                    prometheus.io/scrape: true
  Service Account:  cluster-autoscaler
  Containers:
   cluster-autoscaler:
    Image:      k8s.gcr.io/autoscaling/cluster-autoscaler:v1.18.3
    Port:       <none>
    Host Port:  <none>
    Command:
      ./cluster-autoscaler
      --v=4
      --stderrthreshold=info
      --cloud-provider=aws
      --skip-nodes-with-local-storage=false
      --expander=least-waste
      --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/mixedCluster
    Limits:
      cpu:     100m
      memory:  300Mi
    Requests:
      cpu:        100m
      memory:     300Mi
    Environment:  <none>
    Mounts:
      /etc/ssl/certs/ca-certificates.crt from ssl-certs (ro)
  Volumes:
   ssl-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ssl/certs/ca-bundle.crt
    HostPathType:
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   cluster-autoscaler-54ccd944f6 (1/1 replicas created)
Events:          <none>
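
The command list above runs the autoscaler with its default scale-down behaviour. If nodes are being kept alive because they still look utilized, the scale-down parameters can be tuned. The flags below are standard cluster-autoscaler options; the values shown are illustrative, not taken from this deployment:

      --scale-down-utilization-threshold=0.7   # node becomes a removal candidate below 70% of allocatable requested (default 0.5)
      --scale-down-unneeded-time=5m            # how long a node must stay unneeded before removal (default 10m)
      --ignore-daemonsets-utilization=true     # exclude DaemonSet pods (e.g. the CloudWatch agents) from the utilization calculation (default false)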

My Attempts:

I have tried to scale it down manually with this command:

eksctl scale nodegroup --cluster=mixedCluster --nodes=1 --name=scale-spot

It doesn't work, and returns:

[ℹ]  scaling nodegroup stack "eksctl-mixedCluster-nodegroup-scale-spot" in cluster eksctl-mixedCluster-cluster
[ℹ]  no change for nodegroup "scale-spot" in cluster "eksctl-mixedCluster-cluster": nodes-min 1, desired 1, nodes-max 10
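
As a one-off manual workaround (not something the original post tried), the dummy node can also be drained and removed directly instead of resizing the nodegroup; <node-name> is a placeholder for the node in question:

kubectl drain <node-name> --ignore-daemonsets   # evict remaining pods; DaemonSet pods such as the CloudWatch agents are skipped
kubectl delete node <node-name>                 # remove the node object; the EC2 instance still has to be terminated (or the ASG desired capacity lowered) separately
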
-- Alfian Firmansyah
amazon-eks
aws-cloudwatch-log-insights
eksctl
kubernetes
kubernetes-pod

1 Answer

4/17/2021

Never mind, I have solved my own question. Since my cluster uses t2.small and t3.small instances, the node resources are too small for the autoscaler to be triggered to scale down the dummy nodes. I tried larger instance types, t3a.medium and t3.medium, and it worked well.
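
This is consistent with how the cluster autoscaler picks scale-down candidates: a node is only treated as unneeded when the sum of its pods' CPU and memory requests falls below the utilization threshold (50% of allocatable by default), and DaemonSet pods count toward that sum unless --ignore-daemonsets-utilization is set. On t2.small/t3.small nodes the CloudWatch agent, fluentd, kube-proxy and other system pods already request a large share of the small allocatable capacity, so the node never looks underutilized; on t3.medium-class nodes the same pods are a much smaller fraction. A quick way to check a node's request usage (generic kubectl; <node-name> is a placeholder):

kubectl describe node <node-name> | grep -A 8 "Allocated resources"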

-- Alfian Firmansyah
Source: StackOverflow