I deployed a MongoDB replica set as a StatefulSet in Kubernetes. I'm running a bare-metal K8s cluster and therefore use MetalLB to expose Services of type LoadBalancer. For my MongoDB replica set setup, the exposed Services look like this:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
mongo-0 LoadBalancer 10.43.199.127 172.16.24.151 27017:31118/TCP 55m
mongo-1 LoadBalancer 10.43.180.131 172.16.24.152 27017:31809/TCP 55m
mongo-2 LoadBalancer 10.43.156.124 172.16.24.153 27017:30312/TCP 55m
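For reference, each Pod is exposed by its own Service, roughly of this shape (a minimal sketch of mongo-0; the per-pod selector label is the one StatefulSets attach automatically, the rest is assumed from my setup):

apiVersion: v1
kind: Service
metadata:
  name: mongo-0
spec:
  type: LoadBalancer   # MetalLB assigns the EXTERNAL-IP
  selector:
    # StatefulSet Pods carry this label automatically
    statefulset.kubernetes.io/pod-name: mongo-0
  ports:
    - name: mongo
      port: 27017
      targetPort: 27017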
This works as expected, but the problem appears when connecting to the replica set from an external client:
➜ ~ mongo "mongodb://172.16.24.151:27017,172.16.24.152:27017,172.16.24.153:27017/?replicaSet=rs0"
MongoDB shell version v4.0.10
connecting to: mongodb://172.16.24.151:27017,172.16.24.152:27017,172.16.24.153:27017/?gssapiServiceName=mongodb&replicaSet=rs0
2019-07-05T10:47:27.058+0200 I NETWORK [js] Starting new replica set monitor for rs0/172.16.24.151:27017,172.16.24.152:27017,172.16.24.153:27017
2019-07-05T10:47:27.106+0200 I NETWORK [js] Successfully connected to 172.16.24.153:27017 (1 connections now open to 172.16.24.153:27017 with a 5 second timeout)
2019-07-05T10:47:27.106+0200 I NETWORK [ReplicaSetMonitor-TaskExecutor] Successfully connected to 172.16.24.151:27017 (1 connections now open to 172.16.24.151:27017 with a 5 second timeout)
2019-07-05T10:47:27.136+0200 I NETWORK [ReplicaSetMonitor-TaskExecutor] changing hosts to rs0/10.42.2.155:27017,10.42.3.147:27017,10.42.4.108:27017 from rs0/172.16.24.151:27017,172.16.24.152:27017,172.16.24.153:27017
2019-07-05T10:47:52.654+0200 W NETWORK [js] Unable to reach primary for set rs0
2019-07-05T10:47:52.654+0200 I NETWORK [js] Cannot reach any nodes for set rs0. Please check network connectivity and the status of the set. This has happened for 1 checks in a row.
2019-07-05T10:47:52.654+0200 E QUERY [js] Error: connect failed to replica set rs0/172.16.24.151:27017,172.16.24.152:27017,172.16.24.153:27017 :
connect@src/mongo/shell/mongo.js:344:17
At some point it says "changing hosts to rs0/10.42.2.155:27017,10.42.3.147:27017,10.42.4.108:27017". Those are the cluster-internal Pod IPs stored in the replica set configuration: the driver discovers the set's members from that configuration, not from the seed list, and since the Pod IPs are unreachable from outside the cluster, the connection then fails.
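This matches what the replica set itself advertises; a quick way to verify from inside the cluster (a sketch, e.g. after kubectl exec into mongo-0 and opening a mongo shell):

// List the member hosts stored in the replica set configuration --
// these are exactly what the driver receives during discovery.
rs.conf().members.forEach(function (m) { print(m.host); })
// Here this prints the Pod IPs, e.g. 10.42.2.155:27017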
Any suggestions on what I could do?
You are using a LoadBalancer without sticky sessions. You can use the NGINX ingress controller instead, which can load-balance with sticky session affinity based on client IP or a cookie:
https://github.com/kubernetes/ingress-nginx
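Note that MongoDB speaks plain TCP, so with ingress-nginx you would expose the port through its TCP services ConfigMap rather than an HTTP Ingress rule. A minimal sketch (namespace and service names are assumptions; the controller must be started with --tcp-services-configmap pointing at this ConfigMap):

apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  # external port: "namespace/service:port" to forward to
  "27017": "default/mongo-0:27017"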
Another option is to use a separate container as a Mongo proxy. For example, you can run an HAProxy container configured to listen for TCP connections on port 27017 with the three Mongo services as backends. Keep in mind that you need to set up a health check in HAProxy so it can know how many of the backends are alive.
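A minimal haproxy.cfg sketch for that approach (backend IPs taken from the Services above; the timeout values are assumptions):

global
    daemon

defaults
    mode tcp
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend mongo_in
    bind *:27017
    default_backend mongo_rs

backend mongo_rs
    # tcp-check marks a member down when the TCP connect fails
    option tcp-check
    server mongo-0 172.16.24.151:27017 check
    server mongo-1 172.16.24.152:27017 check
    server mongo-2 172.16.24.153:27017 check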