I'm having trouble getting a health check to succeed for a .NET app running in an IIS container when using Container Native Load Balancing (CNLB).
I have a Network Endpoint Group (NEG) created by an Ingress resource definition in GKE with a VPC-native cluster.
When I circumvent CNLB by either exposing the NodePort or making a service of type LoadBalancer, the site resolves without issue.
All the pod conditions from a describe look good: pod readiness
The network endpoints show up when running describe endpoints: ready addresses
This is the health check that is generated by the load balancer: GCP Health Check
When hitting these endpoints from other containers or VMs in the same VPC, /health.htm responds with a 200. Here's the response from a container in the same namespace, though I have also reproduced this from a Linux VM that is outside the cluster but in the same VPC: endpoint responds
But in spite of it all, the health check is reporting the pods in my NEG unhealthy: Unhealthy Endpoints
The Stackdriver logs confirm the requests are timing out, but I'm not sure why the endpoints respond to other instances and not to the LB: Stackdriver Health Check Log
And I confirmed that GKE created what looks like the correct firewall rule that should allow traffic from the LB to the pods: firewall
Here is the YAML I'm working with:
Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: subdomain.domain.tld
  name: subdomain-domain-tld
  namespace: subdomain-domain-tld
spec:
  replicas: 3
  selector:
    matchLabels:
      app: subdomain.domain.tld
  template:
    metadata:
      labels:
        app: subdomain.domain.tld
    spec:
      containers:
      - image: gcr.io/ourrepo/ourimage
        name: subdomain-domain-tld
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:
            path: /health.htm
            port: 80
          initialDelaySeconds: 60
          periodSeconds: 60
          timeoutSeconds: 10
        volumeMounts:
        - mountPath: C:\some-secrets
          name: some-secrets
      nodeSelector:
        kubernetes.io/os: windows
      volumes:
      - name: some-secrets
        secret:
          secretName: some-secrets
Service:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: subdomain.domain.tld
  name: subdomain-domain-tld-service
  namespace: subdomain-domain-tld
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: subdomain.domain.tld
  type: NodePort
The Ingress is extremely basic, as we have no real need for multiple routes on this site. However, I suspect whatever issue we're having is here:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: gce
  labels:
    app: subdomain.domain.tld
  name: subdomain-domain-tld-ingress
  namespace: subdomain-domain-tld
spec:
  backend:
    serviceName: subdomain-domain-tld-service
    servicePort: 80
One last somewhat relevant detail: I tried the steps in this documentation and they worked, but that setup is not identical to my situation, as it uses neither Windows containers nor readiness probes: https://cloud.google.com/kubernetes-engine/docs/how-to/container-native-load-balancing#using-pod-readiness-feedback
Any suggestions would be greatly appreciated. I've spent two days stuck on this and I'm sure it's obvious but I just can't see the problem.
When you create an Ingress, the generated health check probes default to the same serving port and path as the app: in this case, port 80 and path /.
It seems your app reports its health on port 80 but at the /health.htm path.
You will need to add a custom health check via the BackendConfig CRD. Have a look at this link [1]. The same page explains how to associate the BackendConfig with the Ingress.
What version of GKE are you on? It seems like an old version, judging from the Ingress API you use.
[1] https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features#direct_health
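For illustration, a custom health check along these lines might look like the sketch below. The BackendConfig name `subdomain-domain-tld-backendconfig` is hypothetical, and the interval and timeout values are assumptions chosen only as examples; the namespace, labels, port, and path are taken from the question. The BackendConfig is attached through an annotation on the Service that the Ingress points to. Note that other answers in this thread report that Ingress with NEGs is not supported on Windows Server node pools, so this may not resolve the problem in that setup.

```yaml
# Hypothetical BackendConfig: probes /health.htm on the container port.
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: subdomain-domain-tld-backendconfig   # assumed name, not from the question
  namespace: subdomain-domain-tld
spec:
  healthCheck:
    type: HTTP
    requestPath: /health.htm
    port: 80
    checkIntervalSec: 15   # example value
    timeoutSec: 10         # example value
---
# The question's Service, with annotations that enable NEGs and
# reference the BackendConfig above.
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/backend-config: '{"default": "subdomain-domain-tld-backendconfig"}'
  labels:
    app: subdomain.domain.tld
  name: subdomain-domain-tld-service
  namespace: subdomain-domain-tld
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: subdomain.domain.tld
  type: NodePort
```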
You can refer to this GCP document. Note: This feature is not supported with Windows Server node pools.
Feature limitations
There are some Kubernetes features that are not yet supported for Windows Server containers. In addition, some features are Linux-specific and do not work for Windows. For the complete list of supported and unsupported Kubernetes features, see the Kubernetes documentation.
In addition to the unsupported Kubernetes features, there are some GKE features that are not supported.
For GKE clusters, the following features are not supported with Windows Server node pools:
- Cloud TPUs (--enable-tpu)
- Image streaming
- Ingress with Network Endpoint Groups
- Intranode visibility (--enable-intra-node-visibility)
- IP masquerade agent
- Kubernetes alpha cluster (--enable-kubernetes-alpha)
- Node Local DNS cache
- Private use of Class E IP addresses
- Private use of public IP addresses
- Network policy logging
- Kubernetes service.spec.sessionAffinity
- Spot VMs
- GPUs (--accelerator)
https://cloud.google.com/kubernetes-engine/docs/concepts/windows-server-gke
https://cloud.google.com/kubernetes-engine/docs/concepts/ingress#container-native_load_balancing
Apparently it's not documented but this functionality doesn't work with Windows containers at the time of writing. I was able to get in touch with a GCP Engineer and they provided the following:
After further investigation, I have found that Windows containers using LoadBalancer service works but, Windows containers using Ingress with NEGS is a limitation so, I have opened an internal case for updating the public documentation 1.
Since, Ingress + NEG will not work (per the limitation), I suggest you to use any option you mentioned either exposing the NodePort or making a service of type LoadBalancer.
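As a rough sketch of the suggested workaround, the Service from the question could be switched to type LoadBalancer so that traffic no longer goes through Ingress and NEGs. All names, labels, and ports below are copied from the question; only the `type` field changes, and whether an external or internal load balancer is appropriate is left as an open assumption.

```yaml
# The question's Service, exposed directly instead of through Ingress + NEG.
apiVersion: v1
kind: Service
metadata:
  labels:
    app: subdomain.domain.tld
  name: subdomain-domain-tld-service
  namespace: subdomain-domain-tld
spec:
  type: LoadBalancer   # was NodePort in the original manifest
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: subdomain.domain.tld
```

With this approach the Ingress resource is no longer needed for this site, which matches the question's note that there is only a single route.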