I'm having trouble getting a health check to succeed for a .NET app running in an IIS container when using container-native load balancing (CNLB).
I have a network endpoint group (NEG) created by an Ingress resource definition in GKE with a VPC-native cluster.
When I circumvent CNLB by either exposing the NodePort or creating a Service of type LoadBalancer, the site resolves without issue.
All the pod conditions from a describe look good: [image: pod readiness]
The network endpoints show up when running describe endpoints: [image: ready addresses]
This is the health check that is generated by the load balancer: [image: GCP Health Check]
When hitting these endpoints from other containers or VMs in the same VPC, /health.htm responds with a 200. Here is the response from a container in the same namespace, though I have also reproduced this from a Linux VM that is outside the cluster but in the same VPC: [image: endpoint responds]
But in spite of all that, the health check reports the pods in my NEG as unhealthy: [image: Unhealthy Endpoints]
The Stackdriver logs confirm the requests are timing out, but I'm not sure why, given that the endpoints respond to other instances but not to the LB: [image: Stackdriver Health Check Log]
I also confirmed that GKE created what looks like the correct firewall rule, which should allow traffic from the LB to the pods: [image: firewall]
Here is the YAML I'm working with:
Deployment:
apiVersion: apps/v1                                                  
kind: Deployment                                                     
metadata:                                                            
  labels:                                                            
    app: subdomain.domain.tld                                       
  name: subdomain-domain-tld                                       
  namespace: subdomain-domain-tld
spec:                                                                
  replicas: 3                                                        
  selector:                                                          
    matchLabels:                                                     
      app: subdomain.domain.tld                                     
  template:                                                          
    metadata:                                                        
      labels:                                                        
        app: subdomain.domain.tld
    spec:                                                            
      containers:                                                    
      - image: gcr.io/ourrepo/ourimage
        name: subdomain-domain-tld
        ports:                                                       
        - containerPort: 80                                          
        readinessProbe:                                              
          httpGet:                                                   
            path: /health.htm                                        
            port: 80                                                 
          initialDelaySeconds: 60                                    
          periodSeconds: 60                                          
          timeoutSeconds: 10                                         
        volumeMounts:                                                
        - mountPath: C:\some-secrets                                      
          name: some-secrets
      nodeSelector:                                                  
        kubernetes.io/os: windows                                    
      volumes:                                                       
      - name: some-secrets                                    
        secret:                                                      
          secretName: some-secrets
Service:
apiVersion: v1                                                       
kind: Service                                                        
metadata:                                                            
  labels:                                                            
    app: subdomain.domain.tld                                     
  name: subdomain-domain-tld-service
  namespace: subdomain-domain-tld
spec:                                                                
  ports:                                                             
  - port: 80                                                         
    targetPort: 80                                                   
  selector:                                                          
    app: subdomain.domain.tld                                       
  type: NodePort
The Ingress is extremely basic, as we have no real need for multiple routes on this site; however, I suspect whatever issue we're having is here:
apiVersion: extensions/v1beta1                                       
kind: Ingress                                                        
metadata:                                                            
  annotations:                                                       
    kubernetes.io/ingress.class: gce
  labels:                                                            
    app: subdomain.domain.tld                                       
  name: subdomain-domain-tld-ingress
  namespace: subdomain-domain-tld
spec:                                                                
  backend:                                                           
    serviceName: subdomain-domain-tld-service
    servicePort: 80
One last, somewhat relevant detail: I tried the steps in this documentation and they worked, but that setup is not identical to my situation, as it's not using Windows containers or readiness probes: https://cloud.google.com/kubernetes-engine/docs/how-to/container-native-load-balancing#using-pod-readiness-feedback
Any suggestions would be greatly appreciated. I've spent two days stuck on this and I'm sure it's obvious but I just can't see the problem.
When you create an Ingress, the generated health check defaults to probing the app's serving port on path /. In this case, that is port 80 on path /.
It seems your app reports its health on port 80, but on the /health.htm path.
You will need to add a custom health check via the BackendConfig CRD. Have a look at this link [1]. The same page explains how to associate the BackendConfig with the Service behind the Ingress.
What version of GKE are you on? It seems like an old version, judging from the Ingress API version you use.
[1] https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features#direct_health
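To make the suggestion above concrete, here is a minimal sketch of a BackendConfig that points the load balancer's health check at /health.htm, plus the Service annotation that references it. The BackendConfig name and the timing values are assumptions chosen for illustration, not values from the question. The `cloud.google.com/v1` API version is also an assumption about the cluster; older GKE versions used a beta API version and annotation instead.

```yaml
# Sketch only: the name and the timing values below are illustrative assumptions.
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: subdomain-domain-tld-backendconfig   # hypothetical name
  namespace: subdomain-domain-tld
spec:
  healthCheck:
    type: HTTP
    requestPath: /health.htm   # same path as the readinessProbe in the Deployment
    port: 80                   # the containerPort that serves the health page
    checkIntervalSec: 15
    timeoutSec: 10             # must not exceed checkIntervalSec
    healthyThreshold: 1
    unhealthyThreshold: 2
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: subdomain.domain.tld
  name: subdomain-domain-tld-service
  namespace: subdomain-domain-tld
  annotations:
    # Attaches the BackendConfig above to every port of this Service.
    cloud.google.com/backend-config: '{"default": "subdomain-domain-tld-backendconfig"}'
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: subdomain.domain.tld
  type: NodePort
```

This only addresses a mismatch between the health check path and the path the app actually serves. It does not change which node pool types support Ingress with NEGs.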
You can refer to this GCP document. Note: This feature is not supported with Windows Server node pools.
Feature limitations
There are some Kubernetes features that are not yet supported for Windows Server containers. In addition, some features are Linux-specific and do not work for Windows. For the complete list of supported and unsupported Kubernetes features, see the Kubernetes documentation.
In addition to the unsupported Kubernetes features, there are some GKE features that are not supported.
For GKE clusters, the following features are not supported with Windows Server node pools:
- Cloud TPUs (--enable-tpu)
- Image streaming
- Ingress with Network Endpoint Groups
- Intranode visibility (--enable-intra-node-visibility)
- IP masquerade agent
- Kubernetes alpha cluster (--enable-kubernetes-alpha)
- Node Local DNS cache
- Private use of Class E IP addresses
- Private use of public IP addresses
- Network policy logging
- Kubernetes service.spec.sessionAffinity
- Spot VMs
- GPUs (--accelerator)
https://cloud.google.com/kubernetes-engine/docs/concepts/windows-server-gke
https://cloud.google.com/kubernetes-engine/docs/concepts/ingress#container-native_load_balancing
Apparently it's not documented, but this functionality doesn't work with Windows containers at the time of writing. I was able to get in touch with a GCP engineer, who provided the following:
After further investigation, I have found that Windows containers using a LoadBalancer service work, but Windows containers using Ingress with NEGs is a limitation, so I have opened an internal case for updating the public documentation [1].
Since Ingress + NEG will not work (per the limitation), I suggest you use either of the options you mentioned: exposing the NodePort or making a service of type LoadBalancer.
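For reference, the LoadBalancer workaround could look roughly like the sketch below. It reuses the selector and ports from the Service in the question and changes only the type; the Service name is a hypothetical one chosen so it does not clash with the existing NodePort Service.

```yaml
# Sketch only: same selector and ports as the question's Service, exposed through
# a Service of type LoadBalancer instead of Ingress with NEGs.
apiVersion: v1
kind: Service
metadata:
  labels:
    app: subdomain.domain.tld
  name: subdomain-domain-tld-lb   # hypothetical name
  namespace: subdomain-domain-tld
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: subdomain.domain.tld
  type: LoadBalancer
```

Once an external IP is assigned, `kubectl get service -n subdomain-domain-tld` shows it in the EXTERNAL-IP column.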