I am on GKE using Istio version 1.0.3 . I try to get my express.js with socket.io (and uws engine) backend working with websockets and had this backend running before on a 'non kubernetes server' with websockets without problems.
When I simply enter the external_gke_ip as url I get my backend html page, so http works. But when my client-app makes socketio authentication calls from my client-app I get 503 errors in the browser console:
WebSocket connection to 'ws://external_gke_ip/socket.io/?EIO=3&transport=websocket' failed: Error during WebSocket handshake: Unexpected response code: 503
And when I enter the external_gke_ip as url while socket calls are made I get: no healthy upstream
in the browser. And the pod gives: CrashLoopBackOff
.
I find somewhere: 'in node.js land, socket.io typically does a few non-websocket Handshakes to the Server before eventually upgrading to Websockets. If you don't have sticky-sessions, the upgrade never works.' So maybe I need sticky sessions? Or not... as I just have one replica of my app? It seems to be done by setting sessionAffinity: ClientIP
, but with istio I do not know how to do this and in the GUI I can edit some values of the loadbalancers, but Session affinity shows 'none' and I can not edit it.
Other settings that might be relevant and that I am not sure of (how to set using istio) are:
My manifest config file:
apiVersion: v1
kind: Service
metadata:
name: myapp
labels:
app: myapp
spec:
selector:
app: myapp
ports:
- port: 8089
targetPort: 8089
protocol: TCP
name: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
labels:
app: myapp
spec:
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: app
image: gcr.io/myproject/firstapp:v1
imagePullPolicy: Always
ports:
- containerPort: 8089
env:
- name: POSTGRES_DB_HOST
value: 127.0.0.1:5432
- name: POSTGRES_DB_USER
valueFrom:
secretKeyRef:
name: mysecret
key: username
- name: POSTGRES_DB_PASSWORD
valueFrom:
secretKeyRef:
name: mysecret
key: password
readinessProbe:
httpGet:
path: /healthz
scheme: HTTP
port: 8089
initialDelaySeconds: 10
timeoutSeconds: 5
- name: cloudsql-proxy
image: gcr.io/cloudsql-docker/gce-proxy:1.11
command: ["/cloud_sql_proxy",
"-instances=myproject:europe-west4:osm=tcp:5432",
"-credential_file=/secrets/cloudsql/credentials.json"]
securityContext:
runAsUser: 2
allowPrivilegeEscalation: false
volumeMounts:
- name: cloudsql-instance-credentials
mountPath: /secrets/cloudsql
readOnly: true
volumes:
- name: cloudsql-instance-credentials
secret:
secretName: cloudsql-instance-credentials
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
name: myapp-gateway
spec:
selector:
istio: ingressgateway
servers:
- port:
number: 80
name: http
protocol: HTTP
hosts:
- "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: myapp
spec:
hosts:
- "*"
gateways:
- myapp-gateway
http:
- match:
- uri:
prefix: /
route:
- destination:
host: myapp
weight: 100
websocketUpgrade: true
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
name: google-apis
spec:
hosts:
- "*.googleapis.com"
ports:
- number: 443
name: https
protocol: HTTPS
location: MESH_EXTERNAL
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
name: cloud-sql-instance
spec:
hosts:
- 35.204.XXX.XX # ip of cloudsql database
ports:
- name: tcp
number: 3307
protocol: TCP
location: MESH_EXTERNAL
Various output (while making socket calls, when I stop these the deployment restarts and READY returns to 3/3):
kubectl get pods
NAME READY STATUS RESTARTS AGE
myapp-8888 2/3 CrashLoopBackOff 11 1h
$ kubectl describe pod/myapp-8888
gives:
Name: myapp-8888
Namespace: default
Node: gke-standard-cluster-1-default-pool-888888-9vtk/10.164.0.36
Start Time: Sat, 19 Jan 2019 14:33:11 +0100
Labels: app=myapp
pod-template-hash=207157
Annotations:
kubernetes.io/limit-ranger:
LimitRanger plugin set: cpu request for container app; cpu request for container cloudsql-proxy
sidecar.istio.io/status:
{"version":"3c9617ff82c9962a58890e4fa987c69ca62487fda71c23f3a2aad1d7bb46c748","initContainers":["istio-init"],"containers":["istio-proxy"]...
Status: Running
IP: 10.44.0.5
Controlled By: ReplicaSet/myapp-64c59c94dc
Init Containers:
istio-init:
Container ID: docker://a417695f99509707d0f4bfa45d7d491501228031996b603c22aaf398551d1e45
Image: gcr.io/gke-release/istio/proxy_init:1.0.2-gke.0
Image ID: docker-pullable://gcr.io/gke-release/istio/proxy_init@sha256:e30d47d2f269347a973523d0c5d7540dbf7f87d24aca2737ebc09dbe5be53134
Port: <none>
Host Port: <none>
Args:
-p
15001
-u
1337
-m
REDIRECT
-i
*
-x
-b
8089,
-d
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 19 Jan 2019 14:33:19 +0100
Finished: Sat, 19 Jan 2019 14:33:19 +0100
Ready: True
Restart Count: 0
Environment: <none>
Mounts: <none>
Containers:
app:
Container ID: docker://888888888888888888888888
Image: gcr.io/myproject/firstapp:v1
Image ID: docker-pullable://gcr.io/myproject/firstapp@sha256:8888888888888888888888888
Port: 8089/TCP
Host Port: 0/TCP
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 19 Jan 2019 14:40:14 +0100
Finished: Sat, 19 Jan 2019 14:40:37 +0100
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 19 Jan 2019 14:39:28 +0100
Finished: Sat, 19 Jan 2019 14:39:46 +0100
Ready: False
Restart Count: 3
Requests:
cpu: 100m
Readiness: http-get http://:8089/healthz delay=10s timeout=5s period=10s #success=1 #failure=3
Environment:
POSTGRES_DB_HOST: 127.0.0.1:5432
POSTGRES_DB_USER: <set to the key 'username' in secret 'mysecret'> Optional: false
POSTGRES_DB_PASSWORD: <set to the key 'password' in secret 'mysecret'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-rclsf (ro)
cloudsql-proxy:
Container ID: docker://788888888888888888888888888
Image: gcr.io/cloudsql-docker/gce-proxy:1.11
Image ID: docker-pullable://gcr.io/cloudsql-docker/gce-proxy@sha256:5c690349ad8041e8b21eaa63cb078cf13188568e0bfac3b5a914da3483079e2b
Port: <none>
Host Port: <none>
Command:
/cloud_sql_proxy
-instances=myproject:europe-west4:osm=tcp:5432
-credential_file=/secrets/cloudsql/credentials.json
State: Running
Started: Sat, 19 Jan 2019 14:33:40 +0100
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/secrets/cloudsql from cloudsql-instance-credentials (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-rclsf (ro)
istio-proxy:
Container ID: docker://f3873d0f69afde23e85d6d6f85b1f
Image: gcr.io/gke-release/istio/proxyv2:1.0.2-gke.0
Image ID: docker-pullable://gcr.io/gke-release/istio/proxyv2@sha256:826ef4469e4f1d4cabd0dc846
Port: <none>
Host Port: <none>
Args:
proxy
sidecar
--configPath
/etc/istio/proxy
--binaryPath
/usr/local/bin/envoy
--serviceCluster
myapp
--drainDuration
45s
--parentShutdownDuration
1m0s
--discoveryAddress
istio-pilot.istio-system:15007
--discoveryRefreshDelay
1s
--zipkinAddress
zipkin.istio-system:9411
--connectTimeout
10s
--statsdUdpAddress
istio-statsd-prom-bridge.istio-system:9125
--proxyAdminPort
15000
--controlPlaneAuthPolicy
NONE
State: Running
Started: Sat, 19 Jan 2019 14:33:54 +0100
Ready: True
Restart Count: 0
Requests:
cpu: 10m
Environment:
POD_NAME: myapp-64c59c94dc-8888 (v1:metadata.name)
POD_NAMESPACE: default (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
ISTIO_META_POD_NAME: myapp-64c59c94dc-8888 (v1:metadata.name)
ISTIO_META_INTERCEPTION_MODE: REDIRECT
Mounts:
/etc/certs/ from istio-certs (ro)
/etc/istio/proxy from istio-envoy (rw)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
cloudsql-instance-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: cloudsql-instance-credentials
Optional: false
default-token-rclsf:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-rclsf
Optional: false
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
istio-certs:
Type: Secret (a volume populated by a Secret)
SecretName: istio.default
Optional: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m31s default-scheduler Successfully assigned myapp-64c59c94dc-tdb9c to gke-standard-cluster-1-default-pool-65b9e650-9vtk
Normal SuccessfulMountVolume 7m31s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk MountVolume.SetUp succeeded for volume "istio-envoy"
Normal SuccessfulMountVolume 7m31s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk MountVolume.SetUp succeeded for volume "cloudsql-instance-credentials"
Normal SuccessfulMountVolume 7m31s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk MountVolume.SetUp succeeded for volume "default-token-rclsf"
Normal SuccessfulMountVolume 7m31s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk MountVolume.SetUp succeeded for volume "istio-certs"
Normal Pulling 7m30s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk pulling image "gcr.io/gke-release/istio/proxy_init:1.0.2-gke.0"
Normal Pulled 7m25s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Successfully pulled image "gcr.io/gke-release/istio/proxy_init:1.0.2-gke.0"
Normal Created 7m24s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Created container
Normal Started 7m23s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Started container
Normal Pulling 7m4s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk pulling image "gcr.io/cloudsql-docker/gce-proxy:1.11"
Normal Pulled 7m3s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Successfully pulled image "gcr.io/cloudsql-docker/gce-proxy:1.11"
Normal Started 7m2s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Started container
Normal Pulling 7m2s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk pulling image "gcr.io/gke-release/istio/proxyv2:1.0.2-gke.0"
Normal Created 7m2s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Created container
Normal Pulled 6m54s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Successfully pulled image "gcr.io/gke-release/istio/proxyv2:1.0.2-gke.0"
Normal Created 6m51s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Created container
Normal Started 6m48s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Started container
Normal Pulling 111s (x2 over 7m22s) kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk pulling image "gcr.io/myproject/firstapp:v3"
Normal Created 110s (x2 over 7m4s) kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Created container
Normal Started 110s (x2 over 7m4s) kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Started container
Normal Pulled 110s (x2 over 7m7s) kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Successfully pulled image "gcr.io/myproject/firstapp:v3"
Warning Unhealthy 99s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Readiness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 85s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Back-off restarting failed container
And:
$ kubectl logs myapp-8888 myapp
> api_server@0.0.0 start /usr/src/app
> node src/
info: Feathers application started on http://localhost:8089
And the database logs (which looks ok, as some 'startup script entries' from app can be retrieved using psql):
$ kubectl logs myapp-8888 cloudsql-proxy
2019/01/19 13:33:40 using credential file for authentication; email=proxy-user@myproject.iam.gserviceaccount.com
2019/01/19 13:33:40 Listening on 127.0.0.1:5432 for myproject:europe-west4:osm
2019/01/19 13:33:40 Ready for new connections
2019/01/19 13:33:54 New connection for "myproject:europe-west4:osm"
2019/01/19 13:33:55 couldn't connect to "myproject:europe-west4:osm": Post https://www.googleapis.com/sql/v1beta4/projects/myproject/instances/osm/createEphemeral?alt=json: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: dial tcp 74.125.143.95:443: getsockopt: connection refused
2019/01/19 13:39:06 New connection for "myproject:europe-west4:osm"
2019/01/19 13:39:06 New connection for "myproject:europe-west4:osm"
2019/01/19 13:39:06 Client closed local connection on 127.0.0.1:5432
2019/01/19 13:39:13 New connection for "myproject:europe-west4:osm"
2019/01/19 13:39:14 New connection for "myproject:europe-west4:osm"
2019/01/19 13:39:14 New connection for "myproject:europe-west4:osm"
2019/01/19 13:39:14 New connection for "myproject:europe-west4:osm"
EDIT: Here is the serverside log of the 503 of websocket calls to my app:
{
insertId: "465nu9g3xcn5hf"
jsonPayload: {
apiClaims: ""
apiKey: ""
clientTraceId: ""
connection_security_policy: "unknown"
destinationApp: "myapp"
destinationIp: "10.44.XX.XX"
destinationName: "myapp-888888-88888"
destinationNamespace: "default"
destinationOwner: "kubernetes://apis/extensions/v1beta1/namespaces/default/deployments/myapp"
destinationPrincipal: ""
destinationServiceHost: "myapp.default.svc.cluster.local"
destinationWorkload: "myapp"
httpAuthority: "35.204.XXX.XXX"
instance: "accesslog.logentry.istio-system"
latency: "1.508885ms"
level: "info"
method: "GET"
protocol: "http"
receivedBytes: 787
referer: ""
reporter: "source"
requestId: "bb31d922-8f5d-946b-95c9-83e4c022d955"
requestSize: 0
requestedServerName: ""
responseCode: 503
responseSize: 57
responseTimestamp: "2019-01-18T20:53:03.966513Z"
sentBytes: 164
sourceApp: "istio-ingressgateway"
sourceIp: "10.44.X.X"
sourceName: "istio-ingressgateway-8888888-88888"
sourceNamespace: "istio-system"
sourceOwner: "kubernetes://apis/extensions/v1beta1/namespaces/istio-system/deployments/istio-ingressgateway"
sourcePrincipal: ""
sourceWorkload: "istio-ingressgateway"
url: "/socket.io/?EIO=3&transport=websocket"
userAgent: "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1"
xForwardedFor: "10.44.X.X"
}
logName: "projects/myproject/logs/stdout"
metadata: {
systemLabels: {
container_image: "gcr.io/gke-release/istio/mixer:1.0.2-gke.0"
container_image_id: "docker-pullable://gcr.io/gke-release/istio/mixer@sha256:888888888888888888888888888888"
name: "mixer"
node_name: "gke-standard-cluster-1-default-pool-88888888888-8887"
provider_instance_id: "888888888888"
provider_resource_type: "gce_instance"
provider_zone: "europe-west4-a"
service_name: [
0: "istio-telemetry"
]
top_level_controller_name: "istio-telemetry"
top_level_controller_type: "Deployment"
}
userLabels: {
app: "telemetry"
istio: "mixer"
istio-mixer-type: "telemetry"
pod-template-hash: "88888888888"
}
}
receiveTimestamp: "2019-01-18T20:53:08.135805255Z"
resource: {
labels: {
cluster_name: "standard-cluster-1"
container_name: "mixer"
location: "europe-west4-a"
namespace_name: "istio-system"
pod_name: "istio-telemetry-8888888-8888888"
project_id: "myproject"
}
type: "k8s_container"
}
severity: "INFO"
timestamp: "2019-01-18T20:53:03.965100Z"
}
In the browser at first it properly seems to switch protocol but then causes a repeated 503 response and subsequent health issues cause a repeating restart. The protocol switch websocket call:
General:
Request URL: ws://localhost:8080/sockjs-node/842/s4888/websocket
Request Method: GET
Status Code: 101 Switching Protocols [GREEN]
Response headers:
Connection: Upgrade
Sec-WebSocket-Accept: NS8888888888888888888
Upgrade: websocket
Request headers:
Accept-Encoding: gzip, deflate, br
Accept-Language: nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7
Cache-Control: no-cache
Connection: Upgrade
Cookie: _ga=GA1.1.1118102238.18888888; hblid=nSNQ2mS8888888888888; olfsk=ol8888888888
Host: localhost:8080
Origin: http://localhost:8080
Pragma: no-cache
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Sec-WebSocket-Key: b8zkVaXlEySHasCkD4aUiw==
Sec-WebSocket-Version: 13
Upgrade: websocket
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1
Following the above I get multiple of these:
Chrome output regarding websocket call:
general:
Request URL: ws://35.204.210.134/socket.io/?EIO=3&transport=websocket
Request Method: GET
Status Code: 503 Service Unavailable
response headers:
connection: close
content-length: 19
content-type: text/plain
date: Sat, 19 Jan 2019 14:06:39 GMT
server: envoy
request headers:
Accept-Encoding: gzip, deflate
Accept-Language: nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7
Cache-Control: no-cache
Connection: Upgrade
Host: 35.204.210.134
Origin: http://localhost:8080
Pragma: no-cache
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Sec-WebSocket-Key: VtKS5xKF+GZ4u3uGih2fig==
Sec-WebSocket-Version: 13
Upgrade: websocket
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1
The frames:
Data: (Opcode -1)
Length: 63
Time: 15:06:44.412
Using uws (uWebSockets) as websocket engine causes these errors. When I swap in my backend app this code:
app.configure(socketio({
wsEngine: 'uws',
timeout: 120000,
reconnect: true
}))
for this:
app.configure(socketio())
Everything works as expected.
EDIT: Now it also works with uws. I used alpine docker container which is based on node 10, which does not work with uws. After switching to container based on node 8 it works.