I'm trying to build a simple MongoDB replica set cluster in Kubernetes.
I have a StatefulSet of mongod instances, with
livenessProbe:
  initialDelaySeconds: 60
  exec:
    command:
      - mongo
      - --eval
      - "db.adminCommand('ping')"
readinessProbe:
  initialDelaySeconds: 60
  exec:
    command:
      - /usr/bin/mongo --quiet --eval 'rs.status()' | grep ok | cut -d ':' -f 2 | tr -dc '0-9' | awk '{ if($0=="0"){ exit 127 }else{ exit 0 } }'
As you can see, my readinessProbe checks whether the mongo replica set is working correctly.
However, I get a circular dependency, with an existing cluster reporting:
"lastHeartbeatMessage" : "Error connecting to mongo-2.mongo:27017 :: caused by :: Could not find address for mongo-2.mongo:27017: SocketException: Host not found (authoritative)",
(where mongo-2 was undergoing a rolling update).
Looking further:
$ kubectl run --generator=run-pod/v1 tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash
bash-5.0# nslookup mongo-2.mongo
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find mongo-2.mongo: NXDOMAIN
bash-5.0# nslookup mongo-0.mongo
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: mongo-0.mongo.cryoem-logbook-dev.svc.cluster.local
Address: 10.27.137.6
So the question is whether there is a way to get Kubernetes to always keep the DNS entries for the mongo pods present. It appears that I have a chicken-and-egg situation: if a pod hasn't passed its readiness and liveness checks, no DNS entry is created for it, and hence the other mongod instances cannot reach it.
I believe you are misinterpreting the error.
"Could not find address for mongo-2.mongo:27017: SocketException: Host not found (authoritative)"
The pod is created with an IP attached, which is then registered in DNS:
Pod-0 has the IP 10.0.0.10 and its FQDN is now Pod-0.servicename.namespace.svc.cluster.local
Pod-1 has the IP 10.0.0.11 and its FQDN is now Pod-1.servicename.namespace.svc.cluster.local
Pod-2 has the IP 10.0.0.12 and its FQDN is now Pod-2.servicename.namespace.svc.cluster.local
But DNS is a live service, and IPs are dynamically assigned and can't be duplicated. So whenever it receives a request such as:
"Connect me with Pod-0.servicename.namespace.svc.cluster.local"
it tries to reach the registered IP. If the pod is offline due to a rolling update, it treats the pod as unavailable and returns "Could not find the address (IP) for Pod-0.servicename" until the pod is online again, or until the IP reservation expires, and only then is the DNS record recycled.
DNS is not discarding the registered name; it is only answering that the pod is currently offline.
You can either ignore the errors during the rolling update, or rethink your script and try using the shell's internal JS environment, as mentioned in the comments, for continuous monitoring of the mongo status.
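One way to read the "internal JS environment" suggestion, as a sketch only: let the mongo shell decide the exit code itself instead of piping its output through grep, cut, and awk. Note that an exec probe runs its command directly rather than through a shell, so a pipeline written as a single list item would need to be wrapped in `sh -c` to work at all. The snippet assumes the legacy `mongo` shell used in the question, where `rs.status()` returns a document with an `ok` field and `quit(code)` sets the exit status; newer shells may behave differently.

```yaml
readinessProbe:
  initialDelaySeconds: 60
  exec:
    command:
      - mongo
      - --quiet
      - --eval
      # Assumption: legacy mongo shell; exit 0 when rs.status() reports ok, 1 otherwise.
      - "quit(rs.status().ok ? 0 : 1)"
```

This still ties readiness to replica set health, so it does not by itself remove the dependency on the other members being resolvable.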
EDIT:
I ended up just putting in a ClusterIP Service for each of the StatefulSet instances, with a selector for the specific instance, i.e.:
apiVersion: v1
kind: Service
metadata:
  name: mongo-0
spec:
  clusterIP: 10.101.41.87
  ports:
    - port: 27017
      protocol: TCP
      targetPort: 27017
  selector:
    role: mongo
    statefulset.kubernetes.io/pod-name: mongo-0
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
and repeat for the other StatefulSet pods. The key here is the selector:
statefulset.kubernetes.io/pod-name: mongo-0
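A possible alternative to one Service per pod, offered as an untested sketch: Kubernetes Services have a `publishNotReadyAddresses` field which, when set on the StatefulSet's headless Service, asks DNS to publish the per-pod records regardless of readiness. The Service name `mongo` and the port are inferred from the `mongo-2.mongo:27017` hostnames in the question, and the `role: mongo` selector is borrowed from the per-pod Service above; adjust both if your manifests differ.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mongo                      # assumed: must match the StatefulSet's serviceName
spec:
  clusterIP: None                  # headless Service, gives each pod its own DNS name
  publishNotReadyAddresses: true   # keep mongo-N.mongo resolvable before readiness passes
  ports:
    - port: 27017
      targetPort: 27017
  selector:
    role: mongo                    # assumed: same pod label as in the per-pod Service
```

With this in place, a pod that is restarting during a rolling update should get its `mongo-N.mongo` name back as soon as it has an IP, rather than only after its readiness probe succeeds.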