NodeGroup is not joining EKS cluster when using CloudFormation

4/27/2020

I have been following this guide to create a Kubernetes cluster via CloudFormation, but the NodeGroup never joins the cluster, and I never get an error or explanation about why is not joining.

I can see the autoscaling group and the EC2 machines are created, but EKS reports that there is not node groups.

If I create a new node group manually through the web admin tool, it works, but it assigns different security groups. It has a launch template instead of a launch configuration.

Same AMI, same IAM role, same machine type...

I am very new in both CloudFormation and EKS, and I don't know how to proceed now to find out what the problem is.

Here is the template:

Description: >
    Kubernetes cluster

Parameters:

  EnvironmentName:
    Description: An environment name that will be prefixed to resource names
    Type: String

  KeyName:
    Description: The EC2 Key Pair to allow SSH access to the instances
    Type: AWS::EC2::KeyPair::KeyName

  VpcBlock:
    Type: String
    Default: 192.168.0.0/16
    Description: The CIDR range for the VPC. This should be a valid private (RFC 1918) CIDR range.

  Subnet01Block:
    Type: String
    Default: 192.168.64.0/18
    Description: CidrBlock for subnet 01 within the VPC

  Subnet02Block:
    Type: String
    Default: 192.168.128.0/18
    Description: CidrBlock for subnet 02 within the VPC

  Subnet03Block:
    Type: String
    Default: 192.168.192.0/18
    Description: CidrBlock for subnet 03 within the VPC. This is used only if the region has more than 2 AZs.

  NodeInstanceType:
    Description: EC2 instance type for the node instances
    Type: String

  NodeImageId:
    Type: AWS::EC2::Image::Id
    Description: AMI id for the node instances.

  NodeAutoScalingGroupMinSize:
    Type: Number
    Description: Minimum size of Node Group ASG.
    Default: 1

  NodeAutoScalingGroupMaxSize:
    Type: Number
    Description: Maximum size of Node Group ASG. Set to at least 1 greater than NodeAutoScalingGroupDesiredCapacity.
    Default: 3

  NodeAutoScalingGroupDesiredCapacity:
    Type: Number
    Description: Desired capacity of Node Group ASG.
    Default: 3

  BootstrapArguments:
    Description: Arguments to pass to the bootstrap script. See files/bootstrap.sh in https://github.com/awslabs/amazon-eks-ami
    Default: ""
    Type: String

Resources:

  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock:  !Ref VpcBlock
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Environment 
          Value: !Ref EnvironmentName

  InternetGateway:
    Type: "AWS::EC2::InternetGateway"
    Properties:
      Tags:
        - Key: Environment 
          Value: !Ref EnvironmentName

  VPCGatewayAttachment:
    Type: "AWS::EC2::VPCGatewayAttachment"
    Properties:
      InternetGatewayId: !Ref InternetGateway
      VpcId: !Ref VPC

  RouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Environment 
          Value: !Ref EnvironmentName

  Route:
    DependsOn: VPCGatewayAttachment
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref RouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway

  Subnet01:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone: !Select [ 0, !GetAZs '' ]
      CidrBlock: !Ref Subnet01Block
      VpcId: !Ref VPC
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Environment 
          Value: !Ref EnvironmentName

  Subnet02:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 02
    Properties:
      AvailabilityZone: !Select [ 1, !GetAZs '' ]
      CidrBlock: !Ref Subnet02Block
      VpcId: !Ref VPC
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Environment 
          Value: !Ref EnvironmentName

  Subnet03:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 03
    Properties:
      AvailabilityZone: !Select [ 2, !GetAZs '' ]
      CidrBlock: !Ref Subnet03Block
      VpcId: !Ref VPC
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Environment 
          Value: !Ref EnvironmentName

  Subnet01RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref Subnet01
      RouteTableId: !Ref RouteTable

  Subnet02RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref Subnet02
      RouteTableId: !Ref RouteTable

  Subnet03RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref Subnet03
      RouteTableId: !Ref RouteTable

  ControlPlaneSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Cluster communication with worker nodes
      VpcId: !Ref VPC

  ClusterRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub ${EnvironmentName}KubernetesClusterRole
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: eks.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSServicePolicy
        - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
      Tags:
        - Key: Environment 
          Value: !Ref EnvironmentName

  Cluster:
    Type: AWS::EKS::Cluster
    Properties:
      Name: !Sub ${EnvironmentName}KubernetesCluster
      RoleArn: !GetAtt ClusterRole.Arn
      ResourcesVpcConfig:
        SecurityGroupIds:
          - !Ref ControlPlaneSecurityGroup
        SubnetIds:
          - !Ref Subnet01
          - !Ref Subnet02
          - !Ref Subnet03

  NodeRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub ${EnvironmentName}KubernetesNodeRole
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess
      Path: /
      Tags:
        - Key: Environment 
          Value: !Ref EnvironmentName

  NodeInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: "/"
      Roles:
      - !Ref NodeRole

  NodeSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for all nodes in the cluster
      VpcId: !Ref VPC
      Tags:
      - Key: !Sub "kubernetes.io/cluster/${EnvironmentName}KubernetesCluster"
        Value: 'owned'
      - Key: Environment 
        Value: !Ref EnvironmentName

  NodeSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow node to communicate with each other
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: '-1'
      FromPort: 0
      ToPort: 65535

  NodeSecurityGroupFromControlPlaneIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow worker Kubelets and pods to receive communication from the cluster control plane
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId: !Ref ControlPlaneSecurityGroup
      IpProtocol: tcp
      FromPort: 1025
      ToPort: 65535

  ControlPlaneEgressToNodeSecurityGroup:
    Type: AWS::EC2::SecurityGroupEgress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow the cluster control plane to communicate with worker Kubelet and pods
      GroupId: !Ref ControlPlaneSecurityGroup
      DestinationSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      FromPort: 1025
      ToPort: 65535

  NodeSecurityGroupFromControlPlaneOn443Ingress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow pods running extension API servers on port 443 to receive communication from cluster control plane
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId: !Ref ControlPlaneSecurityGroup
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443

  ControlPlaneEgressToNodeSecurityGroupOn443:
    Type: AWS::EC2::SecurityGroupEgress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow the cluster control plane to communicate with pods running extension API servers on port 443
      GroupId: !Ref ControlPlaneSecurityGroup
      DestinationSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443

  ClusterControlPlaneSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow pods to communicate with the cluster API Server
      GroupId: !Ref ControlPlaneSecurityGroup
      SourceSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      ToPort: 443
      FromPort: 443

  NodeGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      DesiredCapacity: !Ref NodeAutoScalingGroupDesiredCapacity
      LaunchConfigurationName: !Ref NodeLaunchConfig
      MinSize: !Ref NodeAutoScalingGroupMinSize
      MaxSize: !Ref NodeAutoScalingGroupMaxSize
      VPCZoneIdentifier:
        - !Ref Subnet01
        - !Ref Subnet02
        - !Ref Subnet03
      Tags:
      - Key: Name
        Value: !Sub "${EnvironmentName}KubernetesCluster-Node"
        PropagateAtLaunch: 'true'
      - Key: !Sub 'kubernetes.io/cluster/${EnvironmentName}KubernetesCluster'
        Value: 'owned'
        PropagateAtLaunch: 'true'
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MaxBatchSize: '1'
        MinInstancesInService: !Ref NodeAutoScalingGroupDesiredCapacity
        PauseTime: 'PT5M'

  NodeLaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      AssociatePublicIpAddress: 'true'
      IamInstanceProfile: !Ref NodeInstanceProfile
      ImageId: !Ref NodeImageId
      InstanceType: !Ref NodeInstanceType
      KeyName: !Ref KeyName
      SecurityGroups:
      - !Ref NodeSecurityGroup
      BlockDeviceMappings:
        - DeviceName: /dev/xvda
          Ebs:
            VolumeSize: 20
            VolumeType: gp2
            DeleteOnTermination: true
      UserData:
        Fn::Base64:
          !Sub |
            #!/bin/bash
            set -o xtrace
            /etc/eks/bootstrap.sh ${EnvironmentName}KubernetesCluster ${BootstrapArguments}
            /opt/aws/bin/cfn-signal --exit-code $? \
                     --stack  ${AWS::StackName} \
                     --resource NodeGroup  \
                     --region ${AWS::Region}

Outputs:

    KubernetesClusterName:
      Description: Cluster name
      Value: !Ref Cluster
      Export:
        Name: KubernetesClusterName

    KubernetesClusterEndpoint:
      Description: Cluster endpoint
      Value: !GetAtt Cluster.Endpoint
      Export:
        Name: KubernetesClusterEndpoint

    KubernetesNodeInstanceProfile:
      Description: The name of the IAM profile for K8
      Value: !GetAtt NodeInstanceProfile.Arn
      Export:
        Name: KubernetesNodeInstanceProfileArn
-- Vlad
amazon-cloudformation
amazon-web-services
aws-eks
aws-security-group
kubernetes

1 Answer

4/29/2020

There are two ways of adding Worker nodes to your EKS cluster:

  1. Launch and register workers on your own (https://docs.aws.amazon.com/eks/latest/userguide/launch-workers.html)
  2. Use managed node groups (https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html)

As I can see from your template, you are using the first approach by now. Important when doing this is, that you need to wait until the EKS Cluster is ready and in state active, before launching the worker nodes. You can achieve this by using the DependsOn Attribute. If this does not resolve your issues, have a look at the cloud init logs (/var/log/cloud-init-output.log) to check what is happening while joining the cluster.

If you would like to use Managed Node Groups, just remove the AutoScaling Group and LaunchConfiguration and use this type instead: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-eks-nodegroup.html The benefit is, that AWS takes care of creating the required resources (AutoScaling Group and LaunchTemplate) in your account for you and you can see the Node Group in the AWS Console.

-- Bastian Klein
Source: StackOverflow