I am quite new to Google Kubernetes Engine, and to Kubernetes in general. I've created a "production-ish" Kubernetes cluster running my old Node.js app that uses socket.io. To get there I followed Google's "Deploying a containerized web application" how-to, after which I set up a Load Balancer with an Ingress that uses a Managed Certificate, loosely following another guide, Using Google-managed SSL certificates (the "Setting up the managed certificate" part). This left me with a cluster using one pool with three instance groups, each containing 1-2 nodes.
The backend was up and the frontends were able to connect to it correctly. The problem is with WebSockets: the frontend gets the error WebSocket connection to 'wss://mycooldomain.com/socket.io/?EIO=3&transport=websocket&sid=afskjaisfhf-afasfoiaofis' failed: Error during WebSocket handshake: Unexpected response code: 400, which I've been trying to figure out all day.
The latter of the two guides mentions creating a NodePort and a Managed Certificate, and then an Ingress to link the two together. To try to fix the problem, I decided to create a BackendConfig for the load balancer:
apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
  name: my-cool-backendconfig
  namespace: my-cool-namespace
spec:
  timeoutSec: 60
  connectionDraining:
    drainingTimeoutSec: 30
  sessionAffinity:
    affinityType: "CLIENT_IP"
The reason for creating this was to try different timeout values in order to keep the WebSocket connection alive. I've also tried values such as timeoutSec: 20000 or drainingTimeoutSec: 3000. The sessionAffinity part also came from many StackOverflow threads and GitHub issues.
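Some of those threads suggest cookie-based affinity rather than client-IP affinity. I haven't confirmed it makes a difference here, but for reference, that variant (with an arbitrary affinityCookieTtlSec value) would look like this:

apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
  name: my-cool-backendconfig
  namespace: my-cool-namespace
spec:
  timeoutSec: 60
  connectionDraining:
    drainingTimeoutSec: 30
  sessionAffinity:
    # Cookie-based affinity instead of CLIENT_IP; the TTL is an example value
    affinityType: "GENERATED_COOKIE"
    affinityCookieTtlSec: 60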
That config then had to be attached to my NodePort Service via an annotation:
apiVersion: v1
kind: Service
metadata:
  namespace: my-cool-namespace
  name: my-cool-nodeport
  labels:
    app: my-cool-app
  annotations:
    cloud.google.com/backend-config: '{"ports": {"80":"my-cool-backendconfig"}}'
spec:
  selector:
    app: my-cool-app
  type: NodePort
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
And on my Ingress, if I understood correctly:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  namespace: my-cool-namespace
  name: my-cool-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: my-cool-global-ip
    networking.gke.io/managed-certificates: my-cool-certificate
    cloud.google.com/backend-config: '{"default": "my-cool-backendconfig"}'
spec:
  backend:
    serviceName: my-cool-nodeport
    servicePort: 80
After trying different timeout values, I've noticed that the Error during WebSocket handshake: Unexpected response code: 400 error does not necessarily happen on every socket.emit(), and is rather variable, depending, I guess, on whether the load balancer has allowed the connection (?).
Even though Google's guides mention using larger timeout values, even the most obscene ones (timeoutSec: 20000, as described above) don't really help establish a stable WebSocket connection; the error still gets thrown occasionally.
Looking at the problem from the backend/frontend Node apps' standpoint, I've only gone as far as changing the socket.io config to try to establish the websocket connection first, before polling:
const http = require('http');
const server = http.createServer(app); // `app` is the Express app
const io = require('socket.io').listen(server);
// Ask socket.io to prefer the websocket transport over long-polling
io.set('transports', ['websocket', 'polling']);
Which didn't help either.
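One thing I haven't tried yet, but which comes up in the same threads: the client decides which transport it attempts first, so the change may belong on the frontend rather than the server. A websocket-only client config (which skips the initial long-polling requests, the ones that need sticky sessions) would presumably look like this:

// Frontend, using the io() global from the socket.io client script.
// Connecting with the websocket transport only skips the polling
// handshake entirely, so no two HTTP requests have to reach the
// same pod through the load balancer.
const socket = io('https://mycooldomain.com', {
  transports: ['websocket']
});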
How do I make it work without throwing the error every now and then?
Bonus question: I've noticed that a lot of users with the same or a similar problem use Nginx Ingress controllers. Is that necessary for proper load balancing at all, or is it only for real production environments?
If it's failing only in some of the cases, this might be because you're losing your session affinity. For starters, read through the session affinity docs. Based on what you're describing, you might be losing it because you have multiple nodes. Try scaling down to one node and see what happens. If the same issue persists, check how many replicas run on each node and try reducing that to one replica per node - perhaps you're losing session affinity at the replica level. You may well be able to scale up again as long as you keep one replica per node.
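As a minimal sketch of that scale-down test - assuming your app runs as a Deployment named my-cool-app, which isn't shown in your question - pinning it to a single replica would look something like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-cool-app          # hypothetical name, not shown in the question
  namespace: my-cool-namespace
spec:
  replicas: 1                # a single replica while testing affinity
  selector:
    matchLabels:
      app: my-cool-app       # must match the Service selector above
  template:
    metadata:
      labels:
        app: my-cool-app
    spec:
      containers:
        - name: my-cool-app
          image: gcr.io/my-project/my-cool-app:latest  # placeholder image
          ports:
            - containerPort: 8080   # the Service's targetPort

If the errors disappear with a single replica on a single node, that points at affinity being lost somewhere between the load balancer and the pods.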