Deleting EKS Cluster with eksctl not working properly, requires manual deletion of resources such as ManagedNodeGroups

8/6/2020

I'm running a cluster on EKS, deployed by following the official tutorial with this command:

    eksctl create cluster --name prod --version 1.17 --region eu-west-1 --nodegroup-name standard-workers --node-type t3.medium --nodes 3 --nodes-min 1 --nodes-max 4 --ssh-access --ssh-public-key public-key.pub --managed

Once I'm done with my tests (mainly installing and then uninstalling Helm charts) and I have a clean cluster with no jobs running, I try to delete it with eksctl delete cluster --name prod, which produces these errors:

[]  eksctl version 0.25.0
[]  using region eu-west-1
[]  deleting EKS cluster "test"
[]  deleted 0 Fargate profile(s)
[]  kubeconfig has been updated
[]  cleaning up AWS load balancers created by Kubernetes objects of Kind Service or Ingress
[]  2 sequential tasks: { delete nodegroup "standard-workers", delete cluster control plane "test" [async] }
[]  will delete stack "eksctl-test-nodegroup-standard-workers"
[]  waiting for stack "eksctl-test-nodegroup-standard-workers" to get deleted
[]  unexpected status "DELETE_FAILED" while waiting for CloudFormation stack "eksctl-test-nodegroup-standard-workers"
[]  fetching stack events in attempt to troubleshoot the root cause of the failure
[]  AWS::CloudFormation::Stack/eksctl-test-nodegroup-standard-workers: DELETE_FAILED – "The following resource(s) failed to delete: [ManagedNodeGroup]. "
[]  AWS::EKS::Nodegroup/ManagedNodeGroup: DELETE_FAILED – "Nodegroup standard-workers failed to stabilize: [{Code: Ec2SecurityGroupDeletionFailure,Message: DependencyViolation - resource has a dependent object,ResourceIds: [[REDACTED]]}]"
[]  1 error(s) occurred while deleting cluster with nodegroup(s)
[]  waiting for CloudFormation stack "eksctl-test-nodegroup-standard-workers": ResourceNotReady: failed waiting for successful resource state

To fix this I had to manually delete the AWS VPC resources and then the ManagedNodeGroups, and then run the deletion again.

I tried again with the steps above (creating and deleting with the commands provided in the official getting started documentation), but I get the same errors upon deleting.

It seems extremely weird that I have to manually delete resources when doing something like this. Is there a fix for this problem, am I doing something wrong, or is this standard procedure?

All commands are run through the official eksctl CLI, and I'm following the official eksctl deployment guide.

-- shaki
amazon-eks
amazon-web-services
eksctl
kubernetes

3 Answers

9/7/2020

If you are using Managed Node Groups and public subnets, be sure that you update your subnet settings to map public IPs on launch before April 22. You can follow the progress of the updates to managed node groups on the GitHub roadmap.
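If you need to change that setting, a minimal AWS CLI sketch looks like this (the subnet ID is a placeholder; replace it with your own):

    # Enable auto-assignment of public IPv4 addresses for instances launched
    # in this subnet (repeat for each public subnet used by the node group)
    aws ec2 modify-subnet-attribute \
        --subnet-id subnet-0123456789abcdef0 \
        --map-public-ip-on-launch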

If you want to learn more about networking configurations and IP assignment for EKS clusters, check the blog post on cluster networking for worker nodes.

Also you can try:

  1. Go to EC2 > Network Interfaces
  2. Sort by VPC and find the interfaces assigned to your VPC
  3. The interface to delete should be the only one in the "available" state, and it should also be the only one assigned to the problematic remote access SG. If more than one interface matches this description, delete them all (see the CLI sketch below).
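The same lookup can be scripted with the AWS CLI. A minimal sketch, where the VPC ID and network interface ID are placeholders for your own values:

    # List unattached ("available") network interfaces left in the cluster VPC
    # (vpc-0123456789abcdef0 is a placeholder)
    aws ec2 describe-network-interfaces \
        --filters Name=vpc-id,Values=vpc-0123456789abcdef0 \
                  Name=status,Values=available \
        --query 'NetworkInterfaces[].NetworkInterfaceId'

    # Delete each leftover interface (eni-0123456789abcdef0 is a placeholder)
    aws ec2 delete-network-interface --network-interface-id eni-0123456789abcdef0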

Take a look: eks-managed-node-groups, eksctl-node-group.

-- Malgorzata
Source: StackOverflow

9/11/2020

If we try to delete the security group that the node group's EC2 instances are attached to, we will find the root cause.

Most likely it will say that a network interface is still attached.

So the solution is to delete that linked network interface manually. Once that is done, the node group deletes without any error.
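A minimal CLI sketch of that check and cleanup, where the security group and network interface IDs are placeholders:

    # Trying to delete the node group's security group surfaces the blocker
    # (it fails with DependencyViolation while an interface still references it)
    aws ec2 delete-security-group --group-id sg-0123456789abcdef0

    # Find the network interface(s) still referencing that security group
    aws ec2 describe-network-interfaces \
        --filters Name=group-id,Values=sg-0123456789abcdef0 \
        --query 'NetworkInterfaces[].NetworkInterfaceId'

    # Delete the linked interface, then retry deleting the node group / cluster
    aws ec2 delete-network-interface --network-interface-id eni-0123456789abcdef0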

-- Karthikeyan S
Source: StackOverflow

1/18/2021

Have you tried running the eksctl delete cluster command with the --wait flag? Without that flag it outputs a message that the cluster is deleted while deletion activities are still going on in the background.
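For example, with the cluster name from the question:

    # Block until the CloudFormation stacks are actually gone, so any
    # DELETE_FAILED errors show up in the command output
    eksctl delete cluster --name prod --wait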

-- Matthew Thornington
Source: StackOverflow