I am working on a Terraform project whose end goal is an EKS cluster with the following properties:
To accomplish this, I've modified the Terraform EKS example slightly (code at the bottom of the question). The problem I am encountering is that after SSH-ing into the bastion, I cannot ping the cluster, and commands like kubectl get pods time out after about 60 seconds.
Here are the facts/things I know to be true:
1. I have (for the time being) switched the cluster to a public cluster for testing purposes. Previously, when I had cluster_endpoint_public_access set to false, the terraform apply command would not even complete because it could not reach the /healthz endpoint on the cluster (see the sketch just after this list).
2. The bastion configuration works in the sense that the user data runs successfully and installs kubectl and the kubeconfig file.
3. I am able to SSH into the bastion via my static IP (that's var.company_vpn_ips in the code).
4. It's entirely possible this is fully a networking problem and not an EKS/Terraform problem as my understanding of how the VPC and its security groups fit into this picture is not entirely mature.
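For point 1, the workaround I have been considering (rather than disabling public access entirely) is to keep the public endpoint but restrict it to our known IPs, so that terraform apply can still reach the API server from outside the VPC. This is only a sketch, and it assumes the EKS module version I am using exposes cluster_endpoint_public_access_cidrs:
module "eks_cluster" {
  source = "terraform-aws-modules/eks/aws"
  # ... other arguments unchanged ...
  cluster_endpoint_public_access  = true
  cluster_endpoint_private_access = true
  # Restrict the public endpoint to our VPN/static IPs instead of 0.0.0.0/0
  cluster_endpoint_public_access_cidrs = var.company_vpn_ips
}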
Here is the VPC configuration:
locals {
vpc_name = "my-vpc"
vpc_cidr = "10.0.0.0/16"
public_subnet_cidr = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]
private_subnet_cidr = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}
# The definition of the VPC to create
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "3.2.0"
name = local.vpc_name
cidr = local.vpc_cidr
azs = data.aws_availability_zones.available.names
private_subnets = local.private_subnet_cidr
public_subnets = local.public_subnet_cidr
enable_nat_gateway = true
single_nat_gateway = true
enable_dns_hostnames = true
tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
}
public_subnet_tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
"kubernetes.io/role/elb" = "1"
}
private_subnet_tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
"kubernetes.io/role/internal-elb" = "1"
}
}
data "aws_availability_zones" "available" {}
Then the security groups I create for the cluster:
resource "aws_security_group" "ssh_sg" {
name_prefix = "ssh-sg"
vpc_id = module.vpc.vpc_id
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = [
"10.0.0.0/8",
]
}
}
resource "aws_security_group" "all_worker_mgmt" {
name_prefix = "all_worker_management"
vpc_id = module.vpc.vpc_id
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = [
"10.0.0.0/8",
"172.16.0.0/12",
"192.168.0.0/16",
]
}
}
Here's the cluster configuration:
locals {
cluster_version = "1.21"
}
# Create the EKS resource that will setup the EKS cluster
module "eks_cluster" {
source = "terraform-aws-modules/eks/aws"
# The name of the cluster to create
cluster_name = var.cluster_name
# Allow public access to the cluster API endpoint (switched on temporarily for testing)
cluster_endpoint_public_access = true
# Enable private access to the cluster API endpoint
cluster_endpoint_private_access = true
# The version of the cluster to create
cluster_version = local.cluster_version
# The VPC ID to create the cluster in
vpc_id = var.vpc_id
# The subnets to add the cluster to
subnets = var.private_subnets
# Default information on the workers
workers_group_defaults = {
root_volume_type = "gp2"
}
worker_additional_security_group_ids = [var.all_worker_mgmt_id]
# Specify the worker groups
worker_groups = [
{
# The name of this worker group
name = "default-workers"
# The instance type for this worker group
instance_type = var.eks_worker_instance_type
# The number of instances to raise up
asg_desired_capacity = var.eks_num_workers
asg_max_size = var.eks_num_workers
asg_min_size = var.eks_num_workers
# The security group IDs for these instances
additional_security_group_ids = [var.ssh_sg_id]
}
]
}
data "aws_eks_cluster" "cluster" {
name = module.eks_cluster.cluster_id
}
data "aws_eks_cluster_auth" "cluster" {
name = module.eks_cluster.cluster_id
}
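For context, these two data sources feed the kubernetes provider that the rest of the Terraform code uses. Roughly, the wiring looks like this (a sketch; the exact arguments depend on the provider version):
provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}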
output "worker_iam_role_name" {
value = module.eks_cluster.worker_iam_role_name
}
And the finally the bastion:
locals {
ami = "ami-0f19d220602031aed" # Amazon Linux 2 AMI (us-east-2)
instance_type = "t3.small"
key_name = "bastion-kp"
}
resource "aws_iam_instance_profile" "bastion" {
name = "bastion"
role = var.role_name
}
resource "aws_instance" "bastion" {
ami = local.ami
instance_type = local.instance_type
key_name = local.key_name
associate_public_ip_address = true
subnet_id = var.public_subnet
iam_instance_profile = aws_iam_instance_profile.bastion.name
# Attach security groups by ID, since the instance launches into a non-default VPC subnet
vpc_security_group_ids = [aws_security_group.bastion-sg.id]
tags = {
Name = "K8s Bastion"
}
lifecycle {
ignore_changes = all
}
user_data = <<EOF
#! /bin/bash
# Install Kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client
# Install Helm
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
helm version
# Install AWS
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
./aws/install
aws --version
# Install aws-iam-authenticator
curl -o aws-iam-authenticator https://amazon-eks.s3.us-west-2.amazonaws.com/1.21.2/2021-07-05/bin/linux/amd64/aws-iam-authenticator
chmod +x ./aws-iam-authenticator
mkdir -p $HOME/bin && cp ./aws-iam-authenticator $HOME/bin/aws-iam-authenticator && export PATH=$PATH:$HOME/bin
echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc
aws-iam-authenticator help
# Add the kube config file
mkdir ~/.kube
echo "${var.kubectl_config}" >> ~/.kube/config
EOF
}
resource "aws_security_group" "bastion-sg" {
name = "bastion-sg"
vpc_id = var.vpc_id
}
resource "aws_security_group_rule" "sg-rule-ssh" {
security_group_id = aws_security_group.bastion-sg.id
from_port = 22
protocol = "tcp"
to_port = 22
type = "ingress"
cidr_blocks = var.company_vpn_ips
depends_on = [aws_security_group.bastion-sg]
}
resource "aws_security_group_rule" "sg-rule-egress" {
security_group_id = aws_security_group.bastion-sg.id
type = "egress"
from_port = 0
protocol = "all"
to_port = 0
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
depends_on = [aws_security_group.bastion-sg]
}
The most pressing issue for me is finding a way to interact with the cluster from the bastion so that the rest of the Terraform code can run (the resources to spin up inside the cluster itself). I am also hoping to understand how to set up a private cluster when it ends up being inaccessible to the terraform apply command. Thank you in advance for any help you can provide!
Look at how your node group communicates with the control plane: you need to attach the same cluster security group to your bastion host so that it can reach the control plane. You can find the SG ID on the EKS console, under the Networking tab.
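In Terraform that could look roughly like this (a sketch; the exact output name depends on the EKS module version, e.g. cluster_primary_security_group_id in recent releases), attaching the cluster security group to the bastion alongside its own:
resource "aws_instance" "bastion" {
  # ... other arguments unchanged ...
  # vpc_security_group_ids attaches security groups by ID for instances in a VPC
  vpc_security_group_ids = [
    aws_security_group.bastion-sg.id,
    module.eks_cluster.cluster_primary_security_group_id, # the cluster SG shown on the EKS console
  ]
}
Alternatively, instead of attaching the cluster SG to the bastion, you can open the control plane to the bastion SG on port 443:
resource "aws_security_group_rule" "bastion_to_cluster_api" {
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = module.eks_cluster.cluster_security_group_id
  source_security_group_id = aws_security_group.bastion-sg.id
}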