I want to manually reroute (when I have needs) a web server (A Helm deployment on GKE) to another one.
To do that I have 3 Helm deployments :
All work fine, but if I launch a Helm update with Ingress chart changing uniquely the selector of the service I target I have 502 errors :(
Source of service :
apiVersion: v1
kind: Service
metadata:
name: {{ .Values.Service.Name }}-https
labels:
app: {{ .Values.Service.Name }}
type: svc
name: {{ .Values.Service.Name }}
environment: {{ .Values.Environment.Name }}
annotations:
cloud.google.com/neg: '{"ingress": true}'
beta.cloud.google.com/backend-config: '{"ports": {"{{ .Values.Application.Port }}":"{{ .Values.Service.Name }}-https"}}'
spec:
type: NodePort
selector:
name: {{ .Values.Application.Name }}
environment: {{ .Values.Environment.Name }}
ports:
- protocol: TCP
port: {{ .Values.Application.Port }}
targetPort: {{ .Values.Application.Port }}
---
apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
name: {{ .Values.Service.Name }}-https
spec:
timeoutSec: 50
connectionDraining:
drainingTimeoutSec: 60
sessionAffinity:
affinityType: "GENERATED_COOKIE"
affinityCookieTtlSec: 300
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
name: {{ .Values.Service.Name }}-https
labels:
app: {{ .Values.Service.Name }}
type: ingress
name: {{ .Values.Service.Name }}
environment: {{ .Values.Environment.Name }}
annotations:
kubernetes.io/ingress.global-static-ip-name: {{ .Values.Service.PublicIpName }}
networking.gke.io/managed-certificates: "{{ join "," .Values.Service.DomainNames }}"
nginx.ingress.kubernetes.io/rewrite-target: /
spec:
backend:
serviceName: {{ $.Values.Service.Name }}-https
servicePort: 80
rules:
{{- range .Values.Service.DomainNames }}
- host: {{ . | title | lower }}
http:
paths:
- backend:
serviceName: {{ $.Values.Service.Name }}-https
servicePort: 80
{{- end }}
The only thing which change from one call to another is the value of "{{ .Values.Application.Name }}", all other values are strictly the same.
Targeted PODS are always UP & RUNNING and all respond 200 using "kubectl" port forwarding test.
Here is the status of all my namespace objects :
NAME READY STATUS RESTARTS AGE
pod/drupal-dummy-404-v1-pod-744454b7ff-m4hjk 1/1 Running 0 2m32s
pod/drupal-dummy-404-v1-pod-744454b7ff-z5l29 1/1 Running 0 2m32s
pod/drupal-dummy-v1-pod-77f5bf55c6-9dq8n 1/1 Running 0 3m58s
pod/drupal-dummy-v1-pod-77f5bf55c6-njfl9 1/1 Running 0 3m57s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/drupal-dummy-v1-service-https NodePort 172.16.90.71 <none> 80:31391/TCP 3m49s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/drupal-dummy-404-v1-pod 2/2 2 2 2m32s
deployment.apps/drupal-dummy-v1-pod 2/2 2 2 3m58s
NAME DESIRED CURRENT READY AGE
replicaset.apps/drupal-dummy-404-v1-pod-744454b7ff 2 2 2 2m32s
replicaset.apps/drupal-dummy-v1-pod-77f5bf55c6 2 2 2 3m58s
NAME AGE
managedcertificate.networking.gke.io/d8.syspod.fr 161m
managedcertificate.networking.gke.io/d8gfi.syspod.fr 128m
managedcertificate.networking.gke.io/dummydrupald8.cnes.fr 162m
NAME HOSTS ADDRESS PORTS AGE
ingress.extensions/drupal-dummy-v1-service-https d8gfi.syspod.fr 34.120.106.136 80 3m50s
Another test has to pre-launch two services, one for each deployment and just update the Ingress Helm deployment changing this time "{{ $.Values.Service.Name }}", same problem, and the site indisponibility is here from 60s to 300s.
Here is the status of all my namespace objects (for this second test) :
root@47475bc8c41f:/opt/bin# k get all,svc,ingress,managedcertificates
NAME READY STATUS RESTARTS AGE
pod/drupal-dummy-404-v1-pod-744454b7ff-8r5pm 1/1 Running 0 26m
pod/drupal-dummy-404-v1-pod-744454b7ff-9cplz 1/1 Running 0 26m
pod/drupal-dummy-v1-pod-77f5bf55c6-56dnr 1/1 Running 0 30m
pod/drupal-dummy-v1-pod-77f5bf55c6-mg95j 1/1 Running 0 30m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/drupal-dummy-404-v1-pod-https NodePort 172.16.106.121 <none> 80:31030/TCP 26m
service/drupal-dummy-v1-pod-https NodePort 172.16.245.251 <none> 80:31759/TCP 27m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/drupal-dummy-404-v1-pod 2/2 2 2 26m
deployment.apps/drupal-dummy-v1-pod 2/2 2 2 30m
NAME DESIRED CURRENT READY AGE
replicaset.apps/bastion-66bb77bfd5 1 1 1 148m
replicaset.apps/drupal-dummy-404-v1-pod-744454b7ff 2 2 2 26m
replicaset.apps/drupal-dummy-v1-pod-77f5bf55c6 2 2 2 30m
NAME HOSTS ADDRESS PORTS AGE
ingress.extensions/drupal-dummy-v1-service-https d8gfi.syspod.fr 34.120.106.136 80 14m
Does anybody have any explanation (and solution) ?
Added deployment DUMP (sure something is missing but I don't see) :
root@c55834fbdf1a:/# k get deployment.apps/drupal-dummy-v1-pod -o json
{
"apiVersion": "apps/v1",
"kind": "Deployment",
"metadata": {
"annotations": {
"deployment.kubernetes.io/revision": "2",
"meta.helm.sh/release-name": "drupal-dummy-v1-pod",
"meta.helm.sh/release-namespace": "e1"
},
"creationTimestamp": "2020-06-23T18:49:59Z",
"generation": 2,
"labels": {
"app.kubernetes.io/managed-by": "Helm",
"environment": "e1",
"name": "drupal-dummy-v1-pod",
"type": "dep"
},
"name": "drupal-dummy-v1-pod",
"namespace": "e1",
"resourceVersion": "3977170",
"selfLink": "/apis/apps/v1/namespaces/e1/deployments/drupal-dummy-v1-pod",
"uid": "56f74fb9-b582-11ea-9df2-42010a000006"
},
"spec": {
"progressDeadlineSeconds": 600,
"replicas": 2,
"revisionHistoryLimit": 10,
"selector": {
"matchLabels": {
"environment": "e1",
"name": "drupal-dummy-v1-pod",
"type": "dep"
}
},
"strategy": {
"rollingUpdate": {
"maxSurge": "25%",
"maxUnavailable": "25%"
},
"type": "RollingUpdate"
},
"template": {
"metadata": {
"creationTimestamp": null,
"labels": {
"environment": "e1",
"name": "drupal-dummy-v1-pod",
"type": "dep"
}
},
"spec": {
"containers": [
{
"env": [
{
"name": "APPLICATION",
"value": "drupal-dummy-v1-pod"
},
{
"name": "DB_PASS",
"valueFrom": {
"secretKeyRef": {
"key": "password",
"name": "dbpassword"
}
}
},
{
"name": "DB_FQDN",
"valueFrom": {
"configMapKeyRef": {
"key": "dbip",
"name": "gcpenv"
}
}
},
{
"name": "DB_PORT",
"valueFrom": {
"configMapKeyRef": {
"key": "dbport",
"name": "gcpenv"
}
}
},
{
"name": "DB_NAME",
"valueFrom": {
"configMapKeyRef": {
"key": "dbdatabase",
"name": "gcpenv"
}
}
},
{
"name": "DB_USER",
"valueFrom": {
"configMapKeyRef": {
"key": "dbuser",
"name": "gcpenv"
}
}
}
],
"image": "eu.gcr.io/gke-drupal-276313/drupal-dummy:1.0.0",
"imagePullPolicy": "Always",
"livenessProbe": {
"failureThreshold": 3,
"httpGet": {
"path": "/",
"port": 80,
"scheme": "HTTP"
},
"initialDelaySeconds": 60,
"periodSeconds": 10,
"successThreshold": 1,
"timeoutSeconds": 5
},
"name": "drupal-dummy-v1-pod",
"ports": [
{
"containerPort": 80,
"protocol": "TCP"
}
],
"readinessProbe": {
"failureThreshold": 3,
"httpGet": {
"path": "/",
"port": 80,
"scheme": "HTTP"
},
"initialDelaySeconds": 60,
"periodSeconds": 10,
"successThreshold": 1,
"timeoutSeconds": 5
},
"resources": {},
"terminationMessagePath": "/dev/termination-log",
"terminationMessagePolicy": "File",
"volumeMounts": [
{
"mountPath": "/var/www/html/sites/default",
"name": "drupal-dummy-v1-pod"
}
]
}
],
"dnsPolicy": "ClusterFirst",
"restartPolicy": "Always",
"schedulerName": "default-scheduler",
"securityContext": {},
"terminationGracePeriodSeconds": 30,
"volumes": [
{
"name": "drupal-dummy-v1-pod",
"persistentVolumeClaim": {
"claimName": "drupal-dummy-v1-pod"
}
}
]
}
}
},
"status": {
"availableReplicas": 2,
"conditions": [
{
"lastTransitionTime": "2020-06-23T18:56:05Z",
"lastUpdateTime": "2020-06-23T18:56:05Z",
"message": "Deployment has minimum availability.",
"reason": "MinimumReplicasAvailable",
"status": "True",
"type": "Available"
},
{
"lastTransitionTime": "2020-06-23T18:49:59Z",
"lastUpdateTime": "2020-06-23T18:56:05Z",
"message": "ReplicaSet \"drupal-dummy-v1-pod-6865d969cd\" has successfully progressed.",
"reason": "NewReplicaSetAvailable",
"status": "True",
"type": "Progressing"
}
],
"observedGeneration": 2,
"readyReplicas": 2,
"replicas": 2,
"updatedReplicas": 2
}
}
Here service DUMP too :
root@c55834fbdf1a:/# k get service/drupal-dummy-v1-service-https -o json
{
"apiVersion": "v1",
"kind": "Service",
"metadata": {
"annotations": {
"beta.cloud.google.com/backend-config": "{\"ports\": {\"80\":\"drupal-dummy-v1-service-https\"}}",
"cloud.google.com/neg": "{\"ingress\": true}",
"cloud.google.com/neg-status": "{\"network_endpoint_groups\":{\"80\":\"k8s1-4846660e-e1-drupal-dummy-v1-service-https-80-36c11551\"},\"zones\":[\"europe-west3-a\",\"europe-west3-b\"]}",
"meta.helm.sh/release-name": "drupal-dummy-v1-service",
"meta.helm.sh/release-namespace": "e1"
},
"creationTimestamp": "2020-06-23T18:50:45Z",
"labels": {
"app": "drupal-dummy-v1-service",
"app.kubernetes.io/managed-by": "Helm",
"environment": "e1",
"name": "drupal-dummy-v1-service",
"type": "svc"
},
"name": "drupal-dummy-v1-service-https",
"namespace": "e1",
"resourceVersion": "3982781",
"selfLink": "/api/v1/namespaces/e1/services/drupal-dummy-v1-service-https",
"uid": "722d3a99-b582-11ea-9df2-42010a000006"
},
"spec": {
"clusterIP": "172.16.103.181",
"externalTrafficPolicy": "Cluster",
"ports": [
{
"nodePort": 32396,
"port": 80,
"protocol": "TCP",
"targetPort": 80
}
],
"selector": {
"environment": "e1",
"name": "drupal-dummy-v1-pod"
},
"sessionAffinity": "None",
"type": "NodePort"
},
"status": {
"loadBalancer": {}
}
}
And ingress one :
root@c55834fbdf1a:/# k get ingress.extensions/drupal-dummy-v1-service-https -o json
{
"apiVersion": "extensions/v1beta1",
"kind": "Ingress",
"metadata": {
"annotations": {
"ingress.gcp.kubernetes.io/pre-shared-cert": "mcrt-a15e339b-6c3f-4f23-8f6b-688dc98b33a6,mcrt-f3a385de-0541-4b9c-8047-6dcfcbd4d74f",
"ingress.kubernetes.io/backends": "{\"k8s1-4846660e-e1-drupal-dummy-v1-service-https-80-36c11551\":\"HEALTHY\"}",
"ingress.kubernetes.io/forwarding-rule": "k8s-fw-e1-drupal-dummy-v1-service-https--4846660e8b9bd880",
"ingress.kubernetes.io/https-forwarding-rule": "k8s-fws-e1-drupal-dummy-v1-service-https--4846660e8b9bd880",
"ingress.kubernetes.io/https-target-proxy": "k8s-tps-e1-drupal-dummy-v1-service-https--4846660e8b9bd880",
"ingress.kubernetes.io/ssl-cert": "mcrt-a15e339b-6c3f-4f23-8f6b-688dc98b33a6,mcrt-f3a385de-0541-4b9c-8047-6dcfcbd4d74f",
"ingress.kubernetes.io/target-proxy": "k8s-tp-e1-drupal-dummy-v1-service-https--4846660e8b9bd880",
"ingress.kubernetes.io/url-map": "k8s-um-e1-drupal-dummy-v1-service-https--4846660e8b9bd880",
"kubernetes.io/ingress.global-static-ip-name": "gkxe-k1312-e1-drupal-dummy-v1",
"meta.helm.sh/release-name": "drupal-dummy-v1-service",
"meta.helm.sh/release-namespace": "e1",
"networking.gke.io/managed-certificates": "dummydrupald8.cnes.fr,d8.syspod.fr",
"nginx.ingress.kubernetes.io/rewrite-target": "/"
},
"creationTimestamp": "2020-06-23T18:50:45Z",
"generation": 1,
"labels": {
"app": "drupal-dummy-v1-service",
"app.kubernetes.io/managed-by": "Helm",
"environment": "e1",
"name": "drupal-dummy-v1-service",
"type": "ingress"
},
"name": "drupal-dummy-v1-service-https",
"namespace": "e1",
"resourceVersion": "3978178",
"selfLink": "/apis/extensions/v1beta1/namespaces/e1/ingresses/drupal-dummy-v1-service-https",
"uid": "7237fc51-b582-11ea-9df2-42010a000006"
},
"spec": {
"backend": {
"serviceName": "drupal-dummy-v1-service-https",
"servicePort": 80
},
"rules": [
{
"host": "dummydrupald8.cnes.fr",
"http": {
"paths": [
{
"backend": {
"serviceName": "drupal-dummy-v1-service-https",
"servicePort": 80
}
}
]
}
},
{
"host": "d8.syspod.fr",
"http": {
"paths": [
{
"backend": {
"serviceName": "drupal-dummy-v1-service-https",
"servicePort": 80
}
}
]
}
}
]
},
"status": {
"loadBalancer": {
"ingress": [
{
"ip": "34.98.97.102"
}
]
}
}
}
I have seen that in kubernetes events (only when I reconfigure my service selector to target first or second deployment).
Switch to indisponibility page (3 seconds 502) :
81s Normal Attach service/drupal-dummy-v1-service-https Attach 1 network endpoint(s) (NEG "k8s1-4846660e-e1-drupal-dummy-v1-service-https-80-36c11551" in zone "europe-west3-b")
78s Normal Attach service/drupal-dummy-v1-service-https Attach 1 network endpoint(s) (NEG "k8s1-4846660e-e1-drupal-dummy-v1-service-https-80-36c11551" in zone "europe-west3-a")
Switch back to application (15 seconds 502 -> Never the same duration):
7s Normal Attach service/drupal-dummy-v1-service-https Attach 1 network endpoint(s) (NEG "k8s1-4846660e-e1-drupal-dummy-v1-service-https-80-36c11551" in zone "europe-west3-a")
7s Normal Attach service/drupal-dummy-v1-service-https Attach 1 network endpoint(s) (NEG "k8s1-4846660e-e1-drupal-dummy-v1-service-https-80-36c11551" in zone "europe-west3-b")
I could check that the NEG events appear just before 502 error ends, I suspect that when we change service definition a new NEG is implemented but the time is not immediate, and while we wait to have it we still not have the old service up and there is no service during this time :(
There is no "rolling update" of services definition ?
No solution at all, even with having two services and just upgrading the Ingress. Seen witn GCP support, GKE will always destroy then recretate something it is not possible to do any rolling update of service or ingress with no downtime. They suggest to have two full silos then play with DNS, we have chosen another solution, having only one deployment and just do a simple rolling update of deployment changing referenced docker image. Not really in the target, but it works...