AWS Load Balancer Failed to Deploy

11/7/2020

I'm trying to create an AWS ALB Ingress through EKS, following the steps in the document https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html

I was successful up to step 7, creating the controller:

[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl apply -f v2_0_0_full.yaml 
customresourcedefinition.apiextensions.k8s.io/targetgroupbindings.elbv2.k8s.aws created 
mutatingwebhookconfiguration.admissionregistration.k8s.io/aws-load-balancer-webhook created 
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply 
serviceaccount/aws-load-balancer-controller configured 
role.rbac.authorization.k8s.io/aws-load-balancer-controller-leader-election-role created 
clusterrole.rbac.authorization.k8s.io/aws-load-balancer-controller-role created 
rolebinding.rbac.authorization.k8s.io/aws-load-balancer-controller-leader-election-rolebinding created 
clusterrolebinding.rbac.authorization.k8s.io/aws-load-balancer-controller-rolebinding created 
service/aws-load-balancer-webhook-service created 
deployment.apps/aws-load-balancer-controller created 
certificate.cert-manager.io/aws-load-balancer-serving-cert created 
issuer.cert-manager.io/aws-load-balancer-selfsigned-issuer created 
validatingwebhookconfiguration.admissionregistration.k8s.io/aws-load-balancer-webhook created

However, the controller does NOT get to "Ready" status:

[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl get deployment -n kube-system aws-load-balancer-controller
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
aws-load-balancer-controller   0/1     1            0           29m

I can also list the pod associated with the controller, which likewise shows NOT READY:

[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl get pods -n kube-system
NAME                                            READY   STATUS    RESTARTS   AGE
aws-load-balancer-controller-XXXXXXXXXX-p4l7f   0/1     Pending   0          30m

I also can't seem to get its logs to try to debug the issue:

[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl -n kube-system logs aws-load-balancer-controller-XXXXXXXXXX-p4l7f
[ec2-user@ip-X-X-X-X eks-cluster]$

Furthermore, the /var/log directory does not contain any related logs either.

Please help me understand why it is not reaching the READY state. Also let me know how to enable logging to debug these kinds of issues.

-- Vishwas M.R
amazon-eks
amazon-web-services
aws-application-load-balancer
eksctl
kubernetes

2 Answers

11/26/2020

I found the answer here. A Fargate deployment requires the region and vpc-id.

helm upgrade -i aws-load-balancer-controller eks/aws-load-balancer-controller \
    --set clusterName=<cluster-name> \
    --set serviceAccount.create=false \
    --set region=<region-code> \
    --set vpcId=<vpc-xxxxxxxx> \
    --set serviceAccount.name=aws-load-balancer-controller \
    -n kube-system
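
After rerunning the upgrade with these values, you can confirm the fix with the same command used in the question; READY should eventually show 1/1:

kubectl get deployment -n kube-system aws-load-balancer-controller
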
-- user1842409
Source: StackOverflow

3/12/2021

From the current LB controller manifest I found out that the LB controller Pod specification doesn't have a Readiness probe, only a Liveness probe. That means the Pod becomes Ready as soon as it passes the Liveness probe:

      livenessProbe:
        failureThreshold: 2
        httpGet:
          path: /healthz
          port: 61779
          scheme: HTTP
        initialDelaySeconds: 30
        timeoutSeconds: 10
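
If you want to verify this against your own cluster, you can extract the probe configuration straight from the running Deployment (assuming the default install in kube-system, as in the question):

kubectl get deployment aws-load-balancer-controller -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}'

An empty result for the analogous readinessProbe path confirms that no Readiness probe is defined.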

But as we can see in the following output, the LB controller's Pod is in the Pending state:

[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl get pods -n kube-system
NAME                                            READY   STATUS    RESTARTS   AGE
aws-load-balancer-controller-XXXXXXXXXX-p4l7f   0/1     Pending   0          30m

If a Pod stays in the Pending state, it means that kube-scheduler is unable to bind the Pod to a cluster node for whatever reason.

Kube-scheduler is the part of the Kubernetes control plane that is responsible for assigning Pods to Nodes.

No Pod logs exist at this phase, because the Pod's containers have not started yet.

The most convenient way to check the reason is to use the kubectl describe command:

kubectl describe pod/podname -n namespacename
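
Applied to the controller Pod from the question, that would be:

kubectl describe pod aws-load-balancer-controller-XXXXXXXXXX-p4l7f -n kube-system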

At the bottom of the output there is a list of events related to the Pod's life cycle. Here is an example for a generic Ubuntu Pod:

Events:
  Type    Reason     Age                From               Message
  ----    ------     ----               ----               -------
  Normal  Scheduled  37s                default-scheduler  Successfully assigned default/ubuntu to k8s-w1
  Normal  Pulling    25s (x2 over 35s)  kubelet, k8s-w1    Pulling image "ubuntu"
  Normal  Pulled     23s (x2 over 30s)  kubelet, k8s-w1    Successfully pulled image "ubuntu"
  Normal  Created    23s (x2 over 30s)  kubelet, k8s-w1    Created container ubuntu
  Normal  Started    23s (x2 over 29s)  kubelet, k8s-w1    Started container ubuntu

The kubectl get events command can also show the problem. For example:

LAST SEEN   TYPE     REASON      OBJECT       MESSAGE
21s         Normal   Scheduled   pod/ubuntu   Successfully assigned default/ubuntu to k8s-w1
9s          Normal   Pulling     pod/ubuntu   Pulling image "ubuntu"
7s          Normal   Pulled      pod/ubuntu   Successfully pulled image "ubuntu"
7s          Normal   Created     pod/ubuntu   Created container ubuntu
7s          Normal   Started     pod/ubuntu   Started container ubuntu

or it can show the reason why the scheduler can't assign the Pod to a Node:

"No nodes are available that match all of the predicates: Insufficient cpu (2), Insufficient memory (2)". 

In some cases errors can be found in the kube-scheduler Pod logs in the kube-system namespace. Those logs can be listed using the following command:

kubectl logs $(kubectl get pods -l component=kube-scheduler,tier=control-plane -n kube-system -o name) -n kube-system 
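
Note that on a managed control plane such as EKS, the kube-scheduler Pods are not visible to kubectl, so the command above will return nothing. Instead, you can enable control-plane logging for the cluster and read the scheduler logs from CloudWatch Logs. A sketch, assuming the AWS CLI is configured and <cluster-name> is your cluster:

aws eks update-cluster-config \
    --name <cluster-name> \
    --logging '{"clusterLogging":[{"types":["scheduler"],"enabled":true}]}'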

The most common reasons why a Pod isn't scheduled are the following (two quick checks follow this list):

  • The Nodes lack the CPU or memory resources requested by the Pod.
  • The Pod cannot tolerate Taints on the Nodes.
  • The Pod has an Affinity/AntiAffinity configuration that prevents it from being scheduled.
  • Storage or another specific resource (like a GPU) required by the Pod spec cannot be satisfied.
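
For the first two causes, a couple of quick checks, as a rough sketch:

# Show Taints on every node
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'

# Show requested vs. allocatable resources per node
kubectl describe nodes | grep -A 7 "Allocated resources"
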
-- VAS
Source: StackOverflow