502 errors for 5 to 60 seconds when updating a Service's target

6/19/2020

I want to be able to manually reroute traffic, when needed, from one web server (a Helm deployment on GKE) to another.

To do that I have three Helm deployments:

  • Application X
  • Application Y
  • Ingress on application X

Everything works fine, but if I run a Helm upgrade of the Ingress chart that changes only the selector of the Service I target, I get 502 errors :(

Source of the Service (together with its BackendConfig and Ingress):

apiVersion: v1
kind: Service
metadata:
  name: {{ .Values.Service.Name }}-https
  labels:
    app: {{ .Values.Service.Name }}
    type: svc
    name: {{ .Values.Service.Name }}
    environment: {{ .Values.Environment.Name }}
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    beta.cloud.google.com/backend-config: '{"ports": {"{{ .Values.Application.Port }}":"{{ .Values.Service.Name }}-https"}}'
spec:
  type: NodePort
  selector:
    name: {{ .Values.Application.Name }}
    environment: {{ .Values.Environment.Name }}
  ports:
    - protocol: TCP
      port: {{ .Values.Application.Port }}
      targetPort: {{ .Values.Application.Port }}
---
apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
  name: {{ .Values.Service.Name }}-https
spec:
  timeoutSec: 50
  connectionDraining:
    drainingTimeoutSec: 60
  sessionAffinity:
    affinityType: "GENERATED_COOKIE"
    affinityCookieTtlSec: 300
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: {{ .Values.Service.Name }}-https
  labels:
    app: {{ .Values.Service.Name }}
    type: ingress
    name: {{ .Values.Service.Name }}
    environment: {{ .Values.Environment.Name }}
  annotations:
    kubernetes.io/ingress.global-static-ip-name: {{ .Values.Service.PublicIpName }}
    networking.gke.io/managed-certificates: "{{ join "," .Values.Service.DomainNames }}"
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  backend:
    serviceName: {{ $.Values.Service.Name }}-https
    servicePort: 80
  rules:
  {{- range .Values.Service.DomainNames }}
    - host: {{ . | title | lower }}
      http:
        paths:
          - backend:
              serviceName: {{ $.Values.Service.Name }}-https
              servicePort: 80
  {{- end }}

The only thing that changes from one call to another is the value of "{{ .Values.Application.Name }}"; all other values are strictly the same.

The targeted pods are always up and running, and they all respond with 200 when tested through kubectl port forwarding.

Here is the status of all the objects in my namespace:
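
For illustration, the values passed to the chart in the two runs could look like the sketch below. The key layout mirrors the `.Values.*` references in the template above and the concrete values are taken from the dumps further down. The file name is hypothetical, and the label value `drupal-dummy-404-v1-pod` for the second deployment's pods is an assumption that the post does not confirm.

```yaml
# values-ingress.yaml (hypothetical file name)
Environment:
  Name: e1
Service:
  Name: drupal-dummy-v1-service
  PublicIpName: gkxe-k1312-e1-drupal-dummy-v1
  DomainNames:
    - dummydrupald8.cnes.fr
    - d8.syspod.fr
Application:
  Port: 80
  # The only value that changes between the two runs.
  # Run 1 targets the application pods:
  Name: drupal-dummy-v1-pod
  # Run 2 targets the "404" pods instead (label value assumed):
  # Name: drupal-dummy-404-v1-pod
```

Because the Service selector matches on `name` and `environment`, changing `Application.Name` replaces the whole set of endpoints behind the Service in one step.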

NAME                                           READY   STATUS    RESTARTS   AGE
pod/drupal-dummy-404-v1-pod-744454b7ff-m4hjk   1/1     Running   0          2m32s
pod/drupal-dummy-404-v1-pod-744454b7ff-z5l29   1/1     Running   0          2m32s
pod/drupal-dummy-v1-pod-77f5bf55c6-9dq8n       1/1     Running   0          3m58s
pod/drupal-dummy-v1-pod-77f5bf55c6-njfl9       1/1     Running   0          3m57s

NAME                                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/drupal-dummy-v1-service-https   NodePort    172.16.90.71    <none>        80:31391/TCP                 3m49s

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/drupal-dummy-404-v1-pod   2/2     2            2           2m32s
deployment.apps/drupal-dummy-v1-pod       2/2     2            2           3m58s

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/drupal-dummy-404-v1-pod-744454b7ff   2         2         2       2m32s
replicaset.apps/drupal-dummy-v1-pod-77f5bf55c6       2         2         2       3m58s

NAME                                                         AGE
managedcertificate.networking.gke.io/d8.syspod.fr            161m
managedcertificate.networking.gke.io/d8gfi.syspod.fr         128m
managedcertificate.networking.gke.io/dummydrupald8.cnes.fr   162m

NAME                                               HOSTS                                                ADDRESS          PORTS   AGE
ingress.extensions/drupal-dummy-v1-service-https   d8gfi.syspod.fr   34.120.106.136   80      3m50s

In another test I pre-created two Services, one for each deployment, and only updated the Ingress Helm deployment, this time changing "{{ $.Values.Service.Name }}". The problem is the same, and here the site is unavailable for 60 s to 300 s.

Here is the status of all the objects in my namespace (for this second test):

root@47475bc8c41f:/opt/bin# k get all,svc,ingress,managedcertificates
NAME                                           READY   STATUS    RESTARTS   AGE
pod/drupal-dummy-404-v1-pod-744454b7ff-8r5pm   1/1     Running   0          26m
pod/drupal-dummy-404-v1-pod-744454b7ff-9cplz   1/1     Running   0          26m
pod/drupal-dummy-v1-pod-77f5bf55c6-56dnr       1/1     Running   0          30m
pod/drupal-dummy-v1-pod-77f5bf55c6-mg95j       1/1     Running   0          30m

NAME                                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/drupal-dummy-404-v1-pod-https   NodePort    172.16.106.121   <none>        80:31030/TCP                 26m
service/drupal-dummy-v1-pod-https       NodePort    172.16.245.251   <none>        80:31759/TCP                 27m

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/drupal-dummy-404-v1-pod   2/2     2            2           26m
deployment.apps/drupal-dummy-v1-pod       2/2     2            2           30m

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/bastion-66bb77bfd5                   1         1         1       148m
replicaset.apps/drupal-dummy-404-v1-pod-744454b7ff   2         2         2       26m
replicaset.apps/drupal-dummy-v1-pod-77f5bf55c6       2         2         2       30m

NAME                                               HOSTS                                                ADDRESS          PORTS   AGE
ingress.extensions/drupal-dummy-v1-service-https   d8gfi.syspod.fr   34.120.106.136   80      14m

Does anybody have an explanation (and a solution)?

Added a dump of the deployment (I am sure something is missing, but I don't see what):

root@c55834fbdf1a:/# k get deployment.apps/drupal-dummy-v1-pod -o json
{
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {
        "annotations": {
            "deployment.kubernetes.io/revision": "2",
            "meta.helm.sh/release-name": "drupal-dummy-v1-pod",
            "meta.helm.sh/release-namespace": "e1"
        },
        "creationTimestamp": "2020-06-23T18:49:59Z",
        "generation": 2,
        "labels": {
            "app.kubernetes.io/managed-by": "Helm",
            "environment": "e1",
            "name": "drupal-dummy-v1-pod",
            "type": "dep"
        },
        "name": "drupal-dummy-v1-pod",
        "namespace": "e1",
        "resourceVersion": "3977170",
        "selfLink": "/apis/apps/v1/namespaces/e1/deployments/drupal-dummy-v1-pod",
        "uid": "56f74fb9-b582-11ea-9df2-42010a000006"
    },
    "spec": {
        "progressDeadlineSeconds": 600,
        "replicas": 2,
        "revisionHistoryLimit": 10,
        "selector": {
            "matchLabels": {
                "environment": "e1",
                "name": "drupal-dummy-v1-pod",
                "type": "dep"
            }
        },
        "strategy": {
            "rollingUpdate": {
                "maxSurge": "25%",
                "maxUnavailable": "25%"
            },
            "type": "RollingUpdate"
        },
        "template": {
            "metadata": {
                "creationTimestamp": null,
                "labels": {
                    "environment": "e1",
                    "name": "drupal-dummy-v1-pod",
                    "type": "dep"
                }
            },
            "spec": {
                "containers": [
                    {
                        "env": [
                            {
                                "name": "APPLICATION",
                                "value": "drupal-dummy-v1-pod"
                            },
                            {
                                "name": "DB_PASS",
                                "valueFrom": {
                                    "secretKeyRef": {
                                        "key": "password",
                                        "name": "dbpassword"
                                    }
                                }
                            },
                            {
                                "name": "DB_FQDN",
                                "valueFrom": {
                                    "configMapKeyRef": {
                                        "key": "dbip",
                                        "name": "gcpenv"
                                    }
                                }
                            },
                            {
                                "name": "DB_PORT",
                                "valueFrom": {
                                    "configMapKeyRef": {
                                        "key": "dbport",
                                        "name": "gcpenv"
                                    }
                                }
                            },
                            {
                                "name": "DB_NAME",
                                "valueFrom": {
                                    "configMapKeyRef": {
                                        "key": "dbdatabase",
                                        "name": "gcpenv"
                                    }
                                }
                            },
                            {
                                "name": "DB_USER",
                                "valueFrom": {
                                    "configMapKeyRef": {
                                        "key": "dbuser",
                                        "name": "gcpenv"
                                    }
                                }
                            }
                        ],
                        "image": "eu.gcr.io/gke-drupal-276313/drupal-dummy:1.0.0",
                        "imagePullPolicy": "Always",
                        "livenessProbe": {
                            "failureThreshold": 3,
                            "httpGet": {
                                "path": "/",
                                "port": 80,
                                "scheme": "HTTP"
                            },
                            "initialDelaySeconds": 60,
                            "periodSeconds": 10,
                            "successThreshold": 1,
                            "timeoutSeconds": 5
                        },
                        "name": "drupal-dummy-v1-pod",
                        "ports": [
                            {
                                "containerPort": 80,
                                "protocol": "TCP"
                            }
                        ],
                        "readinessProbe": {
                            "failureThreshold": 3,
                            "httpGet": {
                                "path": "/",
                                "port": 80,
                                "scheme": "HTTP"
                            },
                            "initialDelaySeconds": 60,
                            "periodSeconds": 10,
                            "successThreshold": 1,
                            "timeoutSeconds": 5
                        },
                        "resources": {},
                        "terminationMessagePath": "/dev/termination-log",
                        "terminationMessagePolicy": "File",
                        "volumeMounts": [
                            {
                                "mountPath": "/var/www/html/sites/default",
                                "name": "drupal-dummy-v1-pod"
                            }
                        ]
                    }
                ],
                "dnsPolicy": "ClusterFirst",
                "restartPolicy": "Always",
                "schedulerName": "default-scheduler",
                "securityContext": {},
                "terminationGracePeriodSeconds": 30,
                "volumes": [
                    {
                        "name": "drupal-dummy-v1-pod",
                        "persistentVolumeClaim": {
                            "claimName": "drupal-dummy-v1-pod"
                        }
                    }
                ]
            }
        }
    },
    "status": {
        "availableReplicas": 2,
        "conditions": [
            {
                "lastTransitionTime": "2020-06-23T18:56:05Z",
                "lastUpdateTime": "2020-06-23T18:56:05Z",
                "message": "Deployment has minimum availability.",
                "reason": "MinimumReplicasAvailable",
                "status": "True",
                "type": "Available"
            },
            {
                "lastTransitionTime": "2020-06-23T18:49:59Z",
                "lastUpdateTime": "2020-06-23T18:56:05Z",
                "message": "ReplicaSet \"drupal-dummy-v1-pod-6865d969cd\" has successfully progressed.",
                "reason": "NewReplicaSetAvailable",
                "status": "True",
                "type": "Progressing"
            }
        ],
        "observedGeneration": 2,
        "readyReplicas": 2,
        "replicas": 2,
        "updatedReplicas": 2
    }
}

Here is the Service dump too:

root@c55834fbdf1a:/# k get service/drupal-dummy-v1-service-https -o json
{
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {
        "annotations": {
            "beta.cloud.google.com/backend-config": "{\"ports\": {\"80\":\"drupal-dummy-v1-service-https\"}}",
            "cloud.google.com/neg": "{\"ingress\": true}",
            "cloud.google.com/neg-status": "{\"network_endpoint_groups\":{\"80\":\"k8s1-4846660e-e1-drupal-dummy-v1-service-https-80-36c11551\"},\"zones\":[\"europe-west3-a\",\"europe-west3-b\"]}",
            "meta.helm.sh/release-name": "drupal-dummy-v1-service",
            "meta.helm.sh/release-namespace": "e1"
        },
        "creationTimestamp": "2020-06-23T18:50:45Z",
        "labels": {
            "app": "drupal-dummy-v1-service",
            "app.kubernetes.io/managed-by": "Helm",
            "environment": "e1",
            "name": "drupal-dummy-v1-service",
            "type": "svc"
        },
        "name": "drupal-dummy-v1-service-https",
        "namespace": "e1",
        "resourceVersion": "3982781",
        "selfLink": "/api/v1/namespaces/e1/services/drupal-dummy-v1-service-https",
        "uid": "722d3a99-b582-11ea-9df2-42010a000006"
    },
    "spec": {
        "clusterIP": "172.16.103.181",
        "externalTrafficPolicy": "Cluster",
        "ports": [
            {
                "nodePort": 32396,
                "port": 80,
                "protocol": "TCP",
                "targetPort": 80
            }
        ],
        "selector": {
            "environment": "e1",
            "name": "drupal-dummy-v1-pod"
        },
        "sessionAffinity": "None",
        "type": "NodePort"
    },
    "status": {
        "loadBalancer": {}
    }
}

And the Ingress one:

root@c55834fbdf1a:/# k get ingress.extensions/drupal-dummy-v1-service-https -o json
{
    "apiVersion": "extensions/v1beta1",
    "kind": "Ingress",
    "metadata": {
        "annotations": {
            "ingress.gcp.kubernetes.io/pre-shared-cert": "mcrt-a15e339b-6c3f-4f23-8f6b-688dc98b33a6,mcrt-f3a385de-0541-4b9c-8047-6dcfcbd4d74f",
            "ingress.kubernetes.io/backends": "{\"k8s1-4846660e-e1-drupal-dummy-v1-service-https-80-36c11551\":\"HEALTHY\"}",
            "ingress.kubernetes.io/forwarding-rule": "k8s-fw-e1-drupal-dummy-v1-service-https--4846660e8b9bd880",
            "ingress.kubernetes.io/https-forwarding-rule": "k8s-fws-e1-drupal-dummy-v1-service-https--4846660e8b9bd880",
            "ingress.kubernetes.io/https-target-proxy": "k8s-tps-e1-drupal-dummy-v1-service-https--4846660e8b9bd880",
            "ingress.kubernetes.io/ssl-cert": "mcrt-a15e339b-6c3f-4f23-8f6b-688dc98b33a6,mcrt-f3a385de-0541-4b9c-8047-6dcfcbd4d74f",
            "ingress.kubernetes.io/target-proxy": "k8s-tp-e1-drupal-dummy-v1-service-https--4846660e8b9bd880",
            "ingress.kubernetes.io/url-map": "k8s-um-e1-drupal-dummy-v1-service-https--4846660e8b9bd880",
            "kubernetes.io/ingress.global-static-ip-name": "gkxe-k1312-e1-drupal-dummy-v1",
            "meta.helm.sh/release-name": "drupal-dummy-v1-service",
            "meta.helm.sh/release-namespace": "e1",
            "networking.gke.io/managed-certificates": "dummydrupald8.cnes.fr,d8.syspod.fr",
            "nginx.ingress.kubernetes.io/rewrite-target": "/"
        },
        "creationTimestamp": "2020-06-23T18:50:45Z",
        "generation": 1,
        "labels": {
            "app": "drupal-dummy-v1-service",
            "app.kubernetes.io/managed-by": "Helm",
            "environment": "e1",
            "name": "drupal-dummy-v1-service",
            "type": "ingress"
        },
        "name": "drupal-dummy-v1-service-https",
        "namespace": "e1",
        "resourceVersion": "3978178",
        "selfLink": "/apis/extensions/v1beta1/namespaces/e1/ingresses/drupal-dummy-v1-service-https",
        "uid": "7237fc51-b582-11ea-9df2-42010a000006"
    },
    "spec": {
        "backend": {
            "serviceName": "drupal-dummy-v1-service-https",
            "servicePort": 80
        },
        "rules": [
            {
                "host": "dummydrupald8.cnes.fr",
                "http": {
                    "paths": [
                        {
                            "backend": {
                                "serviceName": "drupal-dummy-v1-service-https",
                                "servicePort": 80
                            }
                        }
                    ]
                }
            },
            {
                "host": "d8.syspod.fr",
                "http": {
                    "paths": [
                        {
                            "backend": {
                                "serviceName": "drupal-dummy-v1-service-https",
                                "servicePort": 80
                            }
                        }
                    ]
                }
            }
        ]
    },
    "status": {
        "loadBalancer": {
            "ingress": [
                {
                    "ip": "34.98.97.102"
                }
            ]
        }
    }
}

I see the following in the Kubernetes events (only when I reconfigure my Service selector to target the first or the second deployment).

Switch to the unavailability page (502 for 3 seconds):

81s         Normal    Attach                    service/drupal-dummy-v1-service-https           Attach 1 network endpoint(s) (NEG "k8s1-4846660e-e1-drupal-dummy-v1-service-https-80-36c11551" in zone "europe-west3-b")
78s         Normal    Attach                    service/drupal-dummy-v1-service-https           Attach 1 network endpoint(s) (NEG "k8s1-4846660e-e1-drupal-dummy-v1-service-https-80-36c11551" in zone "europe-west3-a")

Switch back to the application (502 for 15 seconds; the duration is never the same):

7s          Normal    Attach                    service/drupal-dummy-v1-service-https           Attach 1 network endpoint(s) (NEG "k8s1-4846660e-e1-drupal-dummy-v1-service-https-80-36c11551" in zone "europe-west3-a")
7s          Normal    Attach                    service/drupal-dummy-v1-service-https           Attach 1 network endpoint(s) (NEG "k8s1-4846660e-e1-drupal-dummy-v1-service-https-80-36c11551" in zone "europe-west3-b")

I could verify that the NEG events appear just before the 502 errors end. I suspect that when the Service definition changes, a new NEG is set up, but not immediately, and that while we wait for it the old endpoints are already gone, so nothing is serving during that time :(

Is there no "rolling update" for Service definitions?

-- Philippe Cerou
google-kubernetes-engine
kubernetes
service

1 Answer

7/3/2020

No solution at all, even with two Services and only upgrading the Ingress. As confirmed with GCP support, GKE will always destroy and then recreate something, so it is not possible to do a rolling update of a Service or an Ingress without downtime. They suggest having two full silos and switching between them through DNS. We chose another solution: keeping only one deployment and doing a simple rolling update of it by changing the referenced Docker image. Not really what we were aiming for, but it works...
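
The workaround, a single Deployment whose image is changed, relies on the Deployment's own rolling update strategy. A minimal sketch of the relevant fragment, trimmed from the Deployment dump in the question (environment variables and volumes omitted); the `1.0.1` tag is a hypothetical new version, not one mentioned in the post:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: drupal-dummy-v1-pod
  namespace: e1
spec:
  replicas: 2
  selector:
    matchLabels:
      environment: e1
      name: drupal-dummy-v1-pod
      type: dep
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: "25%"        # rounds up to 1 extra pod with 2 replicas
      maxUnavailable: "25%"  # rounds down to 0 with 2 replicas
  template:
    metadata:
      labels:
        environment: e1
        name: drupal-dummy-v1-pod
        type: dep
    spec:
      containers:
        - name: drupal-dummy-v1-pod
          # Only this line changes between releases; tag 1.0.1 is hypothetical.
          image: eu.gcr.io/gke-drupal-276313/drupal-dummy:1.0.1
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 60
            periodSeconds: 10
```

Since the pod labels stay the same, the Service selector does not change; old pods should only be removed once their replacements pass the readiness probe.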

-- Philippe Cerou
Source: StackOverflow