I am using MicroK8s and the Bitnami Helm chart here.
I set up a replica set of 3 replicas: mongo-0 (by definition this is the primary), mongo-1, and mongo-2.
Bitnami configures the replica set to always use mongo-0 (if available) as the primary. However, the following can happen: suppose I find out I need to update the nodes, say to increase storage. To do this, I would need to: 1) Drain the node running mongo-0. This automatically triggers a new election, and let's say mongo-1 becomes the new primary. 2) Add a new node (with more capacity) to the cluster.
This causes the replica set to schedule a new mongo-0 pod on the new node. However, the new node is empty, so the persistent volume where I store the database (let's say /mnt/mongo) is empty as well.
I would expect the current primary to finish replicating the database to the new replica (mongo-0, and therefore its persistent volume), and ONLY once that is done, make mongo-0 the primary again.
However, I saw that mongo-0 becomes primary without any data having been copied to it from the previous primary, effectively deleting the whole database, since the new primary now states that the database is empty.
How is that possible? What am I missing here?
I am not familiar with your exact management tools, but the scaling process you described is wrong. You should not be removing one of three nodes from the replica set at any point, at least not in a production environment.
To replace a RS node:

1. Provision the new node and start a `mongod` on it with an empty data directory.
2. Add the new node to the replica set (`rs.add()`), temporarily running with 4 members.
3. Wait for the new node to complete its initial sync and report the SECONDARY state.
4. Remove the old node from the replica set (`rs.remove()`).
Expecting generic software to automatically figure out when #3 completes and only then move on to #4 is, I would say, rather optimistic. Maybe MongoDB Ops Manager would do that.
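The #3 → #4 gate can be sketched as a small check over the replica set status. Here is a minimal, hypothetical sketch in Python: the input mimics a parsed `rs.status()` document (the `members`, `health`, and `stateStr` fields follow MongoDB's replica set status format; in a real setup you would fetch it via the `replSetGetStatus` command), and removal is only approved once the new member reports SECONDARY:

```python
# Sketch: decide whether it is safe to remove the old member (#4),
# i.e. whether the new member has finished its initial sync (#3).
# `status` mimics a parsed rs.status() document.

def safe_to_remove_old_member(status, new_member_host):
    """Return True only when the new member is a healthy SECONDARY."""
    for member in status["members"]:
        if member["name"] == new_member_host:
            # stateStr becomes SECONDARY only after initial sync completes;
            # during the sync the member reports STARTUP2.
            return member["health"] == 1 and member["stateStr"] == "SECONDARY"
    return False  # new member is not visible in the replica set yet

status = {
    "set": "rs0",
    "members": [
        {"name": "mongo-1:27017", "health": 1, "stateStr": "PRIMARY"},
        {"name": "mongo-2:27017", "health": 1, "stateStr": "SECONDARY"},
        {"name": "mongo-3:27017", "health": 1, "stateStr": "STARTUP2"},
    ],
}
print(safe_to_remove_old_member(status, "mongo-3:27017"))  # -> False, still syncing
```

An automated drain pipeline would poll something like this in a loop and refuse to proceed to removal until it returns True.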
Your post contains a number of other incorrect statements about how MongoDB operates. For example, a node that has no data cannot become a primary in a replica set with data. Perhaps you have other configuration issues going on and you actually have independent nodes/multiple replica sets in what you think is a single deployment.
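Why a data-less node cannot win an election can be illustrated with the voting rule MongoDB's Raft-like protocol uses: a member refuses to vote for a candidate whose oplog is behind its own. A toy sketch (optimes simplified here to plain integers; real MongoDB optimes are term/timestamp pairs, and the function names are mine, not MongoDB's):

```python
# Toy model of the election rule: a member only grants its vote to a
# candidate whose oplog is at least as up to date as its own.

def grants_vote(voter_optime, candidate_optime):
    return candidate_optime >= voter_optime

def wins_election(candidate_optime, voter_optimes):
    # The candidate votes for itself and needs a strict majority
    # of all voting members.
    votes = 1 + sum(grants_vote(v, candidate_optime) for v in voter_optimes)
    return votes > (1 + len(voter_optimes)) / 2

# An empty node (optime 0) cannot out-vote members that hold data:
print(wins_election(0, [42, 42]))   # -> False
print(wins_election(42, [0, 42]))   # -> True
```

So if an empty mongo-0 really did become primary over members holding data, those members were most likely never in the same replica set as it in the first place.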