I'm working on a Kubernetes cluster where I am directing service from GCloud Ingress to my Services. One of the services endpoints fails health check as HTTP but passes it as TCP.
When I change the health check options inside GCloud to be TCP, the health checks pass, and my endpoint works, but after a few minutes, the health check on GCloud resets for that port back to HTTP and health checks fail again, giving me a 502 response on my endpoint.
I don't know if it's a bug inside Google Cloud or something I'm doing wrong in Kubernetes. I have pasted my YAML configuration here:
namespace
apiVersion: v1
kind: Namespace
metadata:
name: parity
labels:
name: parity
storageclass
apiVersion: storage.k8s.io/v1
metadata:
name: classic-ssd
namespace: parity
provisioner: kubernetes.io/gce-pd
parameters:
type: pd-ssd
zones: us-central1-a
reclaimPolicy: Retain
secret
apiVersion: v1
kind: Secret
metadata:
name: tls-secret
namespace: ingress-nginx
data:
tls.crt: ./config/redacted.crt
tls.key: ./config/redacted.key
statefulset
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: parity
namespace: parity
labels:
app: parity
spec:
replicas: 3
selector:
matchLabels:
app: parity
serviceName: parity
template:
metadata:
name: parity
labels:
app: parity
spec:
containers:
- name: parity
image: "etccoop/parity:latest"
imagePullPolicy: Always
args:
- "--chain=classic"
- "--jsonrpc-port=8545"
- "--jsonrpc-interface=0.0.0.0"
- "--jsonrpc-apis=web3,eth,net"
- "--jsonrpc-hosts=all"
ports:
- containerPort: 8545
protocol: TCP
name: rpc-port
- containerPort: 443
protocol: TCP
name: https
readinessProbe:
tcpSocket:
port: 8545
initialDelaySeconds: 650
livenessProbe:
tcpSocket:
port: 8545
initialDelaySeconds: 650
volumeMounts:
- name: parity-config
mountPath: /parity-config
readOnly: true
- name: parity-data
mountPath: /parity-data
volumes:
- name: parity-config
secret:
secretName: parity-config
volumeClaimTemplates:
- metadata:
name: parity-data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: "classic-ssd"
resources:
requests:
storage: 50Gi
service
apiVersion: v1
kind: Service
metadata:
labels:
app: parity
name: parity
namespace: parity
annotations:
cloud.google.com/app-protocols: '{"my-https-port":"HTTPS","my-http-port":"HTTP"}'
spec:
selector:
app: parity
ports:
- name: default
protocol: TCP
port: 80
targetPort: 80
- name: rpc-endpoint
port: 8545
protocol: TCP
targetPort: 8545
- name: https
port: 443
protocol: TCP
targetPort: 443
type: LoadBalancer
ingress
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: ingress-parity
namespace: parity
annotations:
#nginx.ingress.kubernetes.io/rewrite-target: /
kubernetes.io/ingress.global-static-ip-name: cluster-1
spec:
tls:
secretName: tls-classic
hosts:
- www.redacted.com
rules:
- host: www.redacted.com
http:
paths:
- path: /
backend:
serviceName: web
servicePort: 8080
- path: /rpc
backend:
serviceName: parity
servicePort: 8545
Issue
I've redacted hostnames and such, but this is my basic configuration. I've also run a hello-app container from this documentation here for debugging: https://cloud.google.com/kubernetes-engine/docs/tutorials/hello-app
Which is what the endpoint for ingress on /
points to on port 8080 for the hello-app
service. That works fine and isn't the issue, but just mentioned here for clarification.
So, the issue here is that, after creating my cluster with GKE and my ingress LoadBalancer on Google Cloud (the cluster-1
global static ip name in the Ingress file), and then creating the Kubernetes configuration in the files above, the Health-Check fails for the /rpc
endpoint on Google Cloud when I go to Google Compute Engine -> Health Check -> Specific Health-Check for the /rpc
endpoint.
When I edit that Health-Check to not use HTTP Protocol and instead use TCP Protocol, health-checks pass for the /rpc
endpoint and I can curl it just fine after and it returns me the correct response.
The issue is that a few minutes after that, the same Health-Check goes back to HTTP protocol even though I edited it to be TCP, and then the health-checks fail and I get a 502 response when I curl it again.
I am not sure if there's a way to attach the Google Cloud Health Check configuration to my Kubernetes Ingress prior to creating the Ingress in kubernetes. Also not sure why it's being reset, can't tell if it's a bug on Google Cloud or something I'm doing wrong in Kubernetes. If you notice on my statefulset
deployment, I have specified livenessProbe
and readinessProbe
to use TCP to check the port 8545.
The delay of 650 seconds was due to this ticket issue here which was solved by increasing the delay to greater than 600 seconds (to avoid mentioned race conditions): https://github.com/kubernetes/ingress-gce/issues/34
I really am not sure why the Google Cloud health-check is resetting back to HTTP after I've specified it to be TCP. Any help would be appreciated.
I found a solution where I added a new container for health check on my stateful set on /healthz endpoint, and configured the health check of the ingress to check that endpoint on the 8080 port assigned by kubernetes as an HTTP type of health-check, which made it work.
It's not immediately obvious why the reset happens when it's TCP.