I have been following this guide to create a Kubernetes cluster via CloudFormation, but the NodeGroup never joins the cluster, and I never get an error or any explanation about why it is not joining. I can see that the autoscaling group and the EC2 machines are created, but EKS reports that there are no node groups.

If I create a new node group manually through the web admin tool, it works, but it assigns different security groups and uses a launch template instead of a launch configuration. Same AMI, same IAM role, same machine type...

I am very new to both CloudFormation and EKS, and I don't know how to proceed to find out what the problem is.

Here is the template:
Description: >
  Kubernetes cluster

Parameters:
  EnvironmentName:
    Description: An environment name that will be prefixed to resource names
    Type: String

  KeyName:
    Description: The EC2 Key Pair to allow SSH access to the instances
    Type: AWS::EC2::KeyPair::KeyName

  VpcBlock:
    Type: String
    Default: 192.168.0.0/16
    Description: The CIDR range for the VPC. This should be a valid private (RFC 1918) CIDR range.

  Subnet01Block:
    Type: String
    Default: 192.168.64.0/18
    Description: CidrBlock for subnet 01 within the VPC

  Subnet02Block:
    Type: String
    Default: 192.168.128.0/18
    Description: CidrBlock for subnet 02 within the VPC

  Subnet03Block:
    Type: String
    Default: 192.168.192.0/18
    Description: CidrBlock for subnet 03 within the VPC. This is used only if the region has more than 2 AZs.

  NodeInstanceType:
    Description: EC2 instance type for the node instances
    Type: String

  NodeImageId:
    Type: AWS::EC2::Image::Id
    Description: AMI id for the node instances.

  NodeAutoScalingGroupMinSize:
    Type: Number
    Description: Minimum size of Node Group ASG.
    Default: 1

  NodeAutoScalingGroupMaxSize:
    Type: Number
    Description: Maximum size of Node Group ASG. Set to at least 1 greater than NodeAutoScalingGroupDesiredCapacity.
    Default: 3

  NodeAutoScalingGroupDesiredCapacity:
    Type: Number
    Description: Desired capacity of Node Group ASG.
    Default: 3

  BootstrapArguments:
    Description: Arguments to pass to the bootstrap script. See files/bootstrap.sh in https://github.com/awslabs/amazon-eks-ami
    Default: ""
    Type: String

Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref VpcBlock
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Environment
          Value: !Ref EnvironmentName

  InternetGateway:
    Type: "AWS::EC2::InternetGateway"
    Properties:
      Tags:
        - Key: Environment
          Value: !Ref EnvironmentName

  VPCGatewayAttachment:
    Type: "AWS::EC2::VPCGatewayAttachment"
    Properties:
      InternetGatewayId: !Ref InternetGateway
      VpcId: !Ref VPC

  RouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Environment
          Value: !Ref EnvironmentName

  Route:
    DependsOn: VPCGatewayAttachment
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref RouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway

  Subnet01:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone: !Select [ 0, !GetAZs '' ]
      CidrBlock: !Ref Subnet01Block
      VpcId: !Ref VPC
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Environment
          Value: !Ref EnvironmentName

  Subnet02:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 02
    Properties:
      AvailabilityZone: !Select [ 1, !GetAZs '' ]
      CidrBlock: !Ref Subnet02Block
      VpcId: !Ref VPC
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Environment
          Value: !Ref EnvironmentName

  Subnet03:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 03
    Properties:
      AvailabilityZone: !Select [ 2, !GetAZs '' ]
      CidrBlock: !Ref Subnet03Block
      VpcId: !Ref VPC
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Environment
          Value: !Ref EnvironmentName

  Subnet01RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref Subnet01
      RouteTableId: !Ref RouteTable

  Subnet02RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref Subnet02
      RouteTableId: !Ref RouteTable

  Subnet03RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref Subnet03
      RouteTableId: !Ref RouteTable

  ControlPlaneSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Cluster communication with worker nodes
      VpcId: !Ref VPC

  ClusterRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub ${EnvironmentName}KubernetesClusterRole
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: eks.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSServicePolicy
        - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
      Tags:
        - Key: Environment
          Value: !Ref EnvironmentName

  Cluster:
    Type: AWS::EKS::Cluster
    Properties:
      Name: !Sub ${EnvironmentName}KubernetesCluster
      RoleArn: !GetAtt ClusterRole.Arn
      ResourcesVpcConfig:
        SecurityGroupIds:
          - !Ref ControlPlaneSecurityGroup
        SubnetIds:
          - !Ref Subnet01
          - !Ref Subnet02
          - !Ref Subnet03

  NodeRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub ${EnvironmentName}KubernetesNodeRole
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess
      Path: /
      Tags:
        - Key: Environment
          Value: !Ref EnvironmentName

  NodeInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: "/"
      Roles:
        - !Ref NodeRole

  NodeSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for all nodes in the cluster
      VpcId: !Ref VPC
      Tags:
        - Key: !Sub "kubernetes.io/cluster/${EnvironmentName}KubernetesCluster"
          Value: 'owned'
        - Key: Environment
          Value: !Ref EnvironmentName

  NodeSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow node to communicate with each other
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: '-1'
      FromPort: 0
      ToPort: 65535

  NodeSecurityGroupFromControlPlaneIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow worker Kubelets and pods to receive communication from the cluster control plane
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId: !Ref ControlPlaneSecurityGroup
      IpProtocol: tcp
      FromPort: 1025
      ToPort: 65535

  ControlPlaneEgressToNodeSecurityGroup:
    Type: AWS::EC2::SecurityGroupEgress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow the cluster control plane to communicate with worker Kubelet and pods
      GroupId: !Ref ControlPlaneSecurityGroup
      DestinationSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      FromPort: 1025
      ToPort: 65535

  NodeSecurityGroupFromControlPlaneOn443Ingress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow pods running extension API servers on port 443 to receive communication from cluster control plane
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId: !Ref ControlPlaneSecurityGroup
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443

  ControlPlaneEgressToNodeSecurityGroupOn443:
    Type: AWS::EC2::SecurityGroupEgress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow the cluster control plane to communicate with pods running extension API servers on port 443
      GroupId: !Ref ControlPlaneSecurityGroup
      DestinationSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443

  ClusterControlPlaneSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow pods to communicate with the cluster API Server
      GroupId: !Ref ControlPlaneSecurityGroup
      SourceSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      ToPort: 443
      FromPort: 443

  NodeGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      DesiredCapacity: !Ref NodeAutoScalingGroupDesiredCapacity
      LaunchConfigurationName: !Ref NodeLaunchConfig
      MinSize: !Ref NodeAutoScalingGroupMinSize
      MaxSize: !Ref NodeAutoScalingGroupMaxSize
      VPCZoneIdentifier:
        - !Ref Subnet01
        - !Ref Subnet02
        - !Ref Subnet03
      Tags:
        - Key: Name
          Value: !Sub "${EnvironmentName}KubernetesCluster-Node"
          PropagateAtLaunch: 'true'
        - Key: !Sub 'kubernetes.io/cluster/${EnvironmentName}KubernetesCluster'
          Value: 'owned'
          PropagateAtLaunch: 'true'
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MaxBatchSize: '1'
        MinInstancesInService: !Ref NodeAutoScalingGroupDesiredCapacity
        PauseTime: 'PT5M'

  NodeLaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      AssociatePublicIpAddress: 'true'
      IamInstanceProfile: !Ref NodeInstanceProfile
      ImageId: !Ref NodeImageId
      InstanceType: !Ref NodeInstanceType
      KeyName: !Ref KeyName
      SecurityGroups:
        - !Ref NodeSecurityGroup
      BlockDeviceMappings:
        - DeviceName: /dev/xvda
          Ebs:
            VolumeSize: 20
            VolumeType: gp2
            DeleteOnTermination: true
      UserData:
        Fn::Base64:
          !Sub |
            #!/bin/bash
            set -o xtrace
            /etc/eks/bootstrap.sh ${EnvironmentName}KubernetesCluster ${BootstrapArguments}
            /opt/aws/bin/cfn-signal --exit-code $? \
              --stack ${AWS::StackName} \
              --resource NodeGroup \
              --region ${AWS::Region}

Outputs:
  KubernetesClusterName:
    Description: Cluster name
    Value: !Ref Cluster
    Export:
      Name: KubernetesClusterName

  KubernetesClusterEndpoint:
    Description: Cluster endpoint
    Value: !GetAtt Cluster.Endpoint
    Export:
      Name: KubernetesClusterEndpoint

  KubernetesNodeInstanceProfile:
    Description: The name of the IAM profile for K8
    Value: !GetAtt NodeInstanceProfile.Arn
    Export:
      Name: KubernetesNodeInstanceProfileArn
There are two ways of adding worker nodes to your EKS cluster:

1. Self-managed nodes, which you launch yourself, e.g. with an AutoScaling Group and a LaunchConfiguration or LaunchTemplate.
2. Managed Node Groups, where AWS creates and manages the underlying resources for you.
As I can see from your template, you are currently using the first approach. When doing this, it is important to wait until the EKS cluster is ready and in state ACTIVE before launching the worker nodes; you can achieve this with the DependsOn attribute, as shown in the sketch below. If that does not resolve your issue, have a look at the cloud-init log (/var/log/cloud-init-output.log) on one of the worker nodes to check what is happening while they join the cluster.
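For illustration, a minimal sketch of that dependency, reusing the resource names from your template (only the changed lines are shown; everything else in the NodeGroup resource stays as it is):

NodeGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  DependsOn: Cluster  # wait for the EKS cluster to reach state ACTIVE before launching nodes
  Properties:
    DesiredCapacity: !Ref NodeAutoScalingGroupDesiredCapacity
    # ... remaining properties unchanged ...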
If you would like to use Managed Node Groups, remove the AutoScaling Group and LaunchConfiguration and use this resource type instead: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-eks-nodegroup.html The benefit is that AWS takes care of creating the required resources (AutoScaling Group and LaunchTemplate) in your account for you, and you can see the node group in the AWS console.
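A minimal sketch of what that could look like, reusing the parameters and resources from your template (the resource name ManagedNodeGroup is just an example; see the linked documentation for the full list of properties):

ManagedNodeGroup:
  Type: AWS::EKS::Nodegroup
  Properties:
    ClusterName: !Ref Cluster  # Ref on AWS::EKS::Cluster returns the cluster name
    NodeRole: !GetAtt NodeRole.Arn
    Subnets:
      - !Ref Subnet01
      - !Ref Subnet02
      - !Ref Subnet03
    InstanceTypes:
      - !Ref NodeInstanceType
    ScalingConfig:
      MinSize: !Ref NodeAutoScalingGroupMinSize
      DesiredSize: !Ref NodeAutoScalingGroupDesiredCapacity
      MaxSize: !Ref NodeAutoScalingGroupMaxSize

Because ClusterName references the Cluster resource, CloudFormation infers the dependency on the cluster automatically, so no explicit DependsOn is needed here.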