I'm trying to deploy EKS following the official docs: https://learn.hashicorp.com/terraform/kubernetes/provision-aks-cluster
The deployment succeeded and I added the helm/redis chart to it. Now, when I run terraform apply,
it gets stuck while refreshing state:
module.eks.aws_iam_instance_profile.workers[0]: Refreshing state... [id=cluster1234]
module.vpc.aws_route.private_nat_gateway[0]: Refreshing state... [id=r-rtb-1234]
module.eks.aws_security_group_rule.workers_ingress_cluster_https[0]: Refreshing state... [id=sgrule-1234]
module.eks.aws_security_group_rule.workers_ingress_cluster[0]: Refreshing state... [id=sgrule-1234]
module.eks.aws_security_group_rule.workers_egress_internet[0]: Refreshing state... [id=sgrule-1234]
module.eks.aws_security_group_rule.cluster_https_worker_ingress[0]: Refreshing state... [id=sgrule-1234]
module.eks.aws_security_group_rule.workers_ingress_self[0]: Refreshing state... [id=sgrule-1234]
module.eks.aws_launch_configuration.workers[0]: Refreshing state... [id=cluster-worker-group-1234]
module.eks.kubernetes_config_map.aws_auth[0]: Refreshing state... [id=kube-system/aws-auth]
module.eks.data.null_data_source.node_groups[0]: Refreshing state...
module.eks.random_pet.workers[0]: Refreshing state... [id=diverse-vervet]
module.eks.aws_autoscaling_group.workers[0]: Refreshing state... [id=cluster-worker-group-1234]
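For reference, the redis chart was wired in roughly like this (a simplified sketch, not the exact config; the chart source and values are illustrative):

# Sketch only: repository and values are illustrative
resource "helm_release" "redis" {
  name       = "redis"
  namespace  = "infra"
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "redis"

  # With wait = true (the provider default), Terraform blocks until the
  # release's pods become ready, so a chart that never gets healthy can
  # make apply look hung right after the refresh lines above.
  wait    = true
  timeout = 600
}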
I already tried leaving it for a few hours, then even longer, and I also tried deleting everything and re-deploying, but it seems like a bug or something?
Event logs during terraform apply:
> kubectl -n infra get events --sort-by='{.lastTimestamp}'
LAST SEEN TYPE REASON OBJECT MESSAGE
58m Normal Pulled pod/redis-master-0 Container image "docker.io/oliver006/redis_exporter:v1.0.3" already present on machine
28m Warning Unhealthy pod/redis-slave-0 Readiness probe failed:
Could not connect to Redis at redis-master-0.redis-headless.infra.svc.cluster.local:6379: Name or service not known
13m Warning Unhealthy pod/redis-slave-0 Readiness probe failed:
Could not connect to Redis at redis-master-0.redis-headless.infra.svc.cluster.local:6379: Name or service not known
3m31s Warning BackOff pod/redis-slave-0 Back-off restarting failed container
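The probe failure is a DNS one, so a few standard kubectl checks help narrow it down (pod and service names taken from the events above):

# Look at probe config and recent restarts on the failing pod
kubectl -n infra describe pod redis-slave-0
# Make sure the headless service the probe resolves through exists
kubectl -n infra get svc redis-headless
# Try the lookup from inside the pod (assuming the image ships getent)
kubectl -n infra exec redis-slave-0 -- getent hosts redis-master-0.redis-headless.infra.svc.cluster.local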
After doing:
export TF_LOG=TRACE
and running terraform apply again, I found this:
2020/05/18 01:10:43 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "helm_release.prom-operator"
2020/05/18 01:10:46 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/18 01:10:48 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "helm_release.prom-operator"
2020/05/18 01:10:51 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/18 01:10:53 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "helm_release.prom-operator"
2020/05/18 01:10:56 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/18 01:10:58 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "helm_release.prom-operator"
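The trace says the helm provider can't finish closing because helm_release.prom-operator never completes. A couple of things that make this easier to inspect (helm 3 syntax; the release label is what the prometheus-operator chart typically sets on its pods):

# Send the trace to a file instead of flooding the terminal
export TF_LOG=TRACE TF_LOG_PATH=./terraform-trace.log
# Check the release Terraform is blocked on, and its pods
helm status prom-operator -n infra
kubectl -n infra get pods -l release=prom-operator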
I am having trouble figuring out what's wrong with Prometheus now and how it all relates.
Why does apply get stuck at module.eks.aws_autoscaling_group.workers[0]: Refreshing state?
I'm still trying to deploy the cluster properly with tf, but so far the issue above is gone. After running terraform apply with export TF_LOG=TRACE, I found the charts that got stuck and helm delete'd them, which fixed the issue. Good luck with debugging!
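In case it helps, the cleanup looked roughly like this (helm 3 syntax; release names will differ):

# List all releases, including ones stuck in pending-install/pending-upgrade
helm ls --all-namespaces --all
# Delete the stuck releases, then re-run the apply
helm delete prom-operator -n infra
terraform apply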