My nodes got deleted in EKS, how can I recover?

8/19/2019

I am working through the AWS EKS getting started demo on my machine. I created an EKS cluster and worker nodes, attached the nodes to the cluster, and deployed an nginx service on them. On the first attempt the demo was successful, and I was able to access the load balancer URL with the nginx service behind it. Then, while playing with the instances, both of my nodes (say node1 and node2) got deleted with the command below:

kubectl delete node <node-name>
node "ip-***-***-***-**.ap-south-1.compute.internal" deleted

While trying to recover, I found that the load balancer URL is still ACTIVE and the two corresponding EC2 instances (worker nodes) are running fine. However, the command below gives this result:

PS C:\k8s> kubectl get nodes
No resources found.
PS C:\k8s>

I tried to repeat step 3 from the getting started guide, but that only ended up recreating the same worker nodes.

When I try to create pods again on the same EC2 instances (worker nodes), the pods stay in Pending status:

PS C:\k8s> kubectl create -f .\aws-pod-nginx.yaml
deployment.apps/nginx created
PS C:\k8s> kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-76b782ee75-n6nwv   0/1     Pending   0          38s
nginx-76b78dee75-rcf6d   0/1     Pending   0          38s

When I describe a pod, the error is as below:

Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  52s (x5 over 4m11s)  default-scheduler  no nodes available to schedule pods

My two EC2 instances (worker nodes) are still running. I tried to register them with the ELB manually, but their status in the ELB is 'OutOfService'.

I would like the command below to list working nodes that can be reached through the ELB, but currently it returns 'No resources found':

kubectl get nodes
-- Jagdish0886
amazon-eks
amazon-elb
amazon-web-services
eks
kubernetes

1 Answer

9/13/2019

You say you deleted the nodes with the kubectl delete node <node-name> command. I don't think you wanted to do that. You deleted the nodes from Kubernetes, but the two EC2 instances are still running. Kubernetes is not able to schedule pods to run on EC2 instances that have been deleted from the cluster. It is quite difficult to re-attach those instances: you would need SSH or SSM Session Manager access to log into them and run the commands to rejoin the cluster.
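
If you do have that access, a rough sketch of what rejoining involves on an EKS-optimized AMI is below. The instance ID and cluster name are placeholders; restarting the kubelet is usually what makes the node re-register, and re-running the bootstrap script is only needed if its configuration was lost.

# open a shell on the instance (instance ID is a placeholder)
aws ssm start-session --target i-0123456789abcdef0

# on the instance: re-run the EKS bootstrap script if needed, then restart
# the kubelet so it re-registers with the API server (cluster name is a placeholder)
sudo /etc/eks/bootstrap.sh my-eks-cluster
sudo systemctl restart kubelet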

It would actually be far easier to just delete the old EC2 instances and create new ones. If you followed the AWS EKS documentation to create the cluster, an ASG (Auto Scaling Group, or Node Group) was created, and that ASG launched the EC2 instances. The ASG lets you scale the number of EC2 instances in the cluster up and down. To check whether your EC2 instances were created by an ASG, use the AWS Console: on the EC2 Instances page, select one of the instances that was in your cluster and open the Tags tab. If the instance was created by an ASG, you will see a tag named aws:autoscaling:groupName.
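
If you prefer the CLI over the console, a quick way to check is to list the instance's tags; the instance ID below is a placeholder:

# look for the aws:autoscaling:groupName tag on one of the worker instances
aws ec2 describe-tags --filters "Name=resource-id,Values=i-0123456789abcdef0" "Name=key,Values=aws:autoscaling:groupName"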

If the EC2 instance was created by an ASG, you can simply terminate the instance and the ASG will create a new one to replace it. When the new instance comes up, its UserData contains a cloud-init script that joins it to the Kubernetes cluster. Do this for every node you removed with the kubectl delete node command.
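
A minimal sketch of terminating one of the old instances from the CLI (the instance ID is a placeholder); the ASG will detect the missing capacity and launch a replacement:

# terminate an old worker node; its ASG launches a replacement that bootstraps into the cluster
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0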

When the new EC2 instances join the cluster, you will see them with the kubectl get nodes command. At that point, Kubernetes will be able to schedule pods to run on those instances.
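
To verify, checking the nodes and pods should eventually show the replacement nodes in Ready state and the pending nginx pods scheduled onto them:

kubectl get nodes
kubectl get pods -o wide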

-- dlaidlaw
Source: StackOverflow