I am trying to deploy a service in Kubernetes available through a network load balancer. I am aware this is an alpha feature at the moment, but I am running some tests. I have a deployment definition that is working fine as is. My service definition without the nlb annotation looks something like this and is working fine:
kind: Service
apiVersion: v1
metadata:
name: service1
annotations:
service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
spec:
type: LoadBalancer
selector:
app: some-app
ports:
- port: 80
protocol: TCP
However, when I switch to NLB, even when the load balancer is created and configured "correctly", the target in the AWS target group always appears unhealthy and I cannot access the service via HTTP. This is the service definition:
kind: Service
apiVersion: v1
metadata:
name: service1
annotations:
service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
type: LoadBalancer
selector:
app: some-app
ports:
- port: 80
protocol: TCP
externalTrafficPolicy: Local
I don't think NLB is the problem.
externalTrafficPolicy: Local
is not supported by kops on AWS, and there are issues with some other K8s distros that run on AWS, due to some AWS limitation.
Try changing it to
externalTrafficPolicy: Cluster
There's an issue with the source IP being that of the load balancer instead of the true external client that can be worked around by using proxy protocol annotation on the service + adding some configuration to the ingress controller.
However, there is a 2nd issue that while you can technically hack your way around it and force it to work, it's usually not worth bothering.
externalTrafficPolicy: Local
Creates a NodePort /healthz endpoint so the LB sends traffic to a subset of nodes with service endpoints instead of all worker nodes. It's broken on initial provisioning and the reconciliation loop is broken as well.
https://github.com/kubernetes/kubernetes/issues/80579
^describes the problem in more depth.
https://github.com/kubernetes/kubernetes/issues/61486
^describes a workaround to force it to work using a kops hook
but honestly, you should just stick to externalTrafficPolicy: Cluster as it's always more stable.
There was a bug in the NLB security groups implementation. It's fixed in 1.11.7, 1.12.5, and probably the next 1.13 patch.
It seems there was a rule missing in the k8s nodes security group, since the NLB forwards the client IP.