Performance issues while restoring mongodump. What can I do better?

7/17/2019

My company is in the process of migrating data from one Kubernetes cluster to another.

Part of the migration is moving the MongoDB data across.

The MongoDB installations came with some backup scripts, which I used as a starting point for my custom restore.

What I successfully did (at least as far as I can tell right now) is to run a mongodump on the old cluster and pipe it into a mongorestore in the new cluster.

It works, but it is really, really slow. The dataset (/data/db) is around 65 GB. The restore has been running for the last 6 hours or so and is barely moving forward.

Also, at some point the process was interrupted, and instead of deleting all the data I simply started the script again, thinking it would still apply everything and throw duplicate key errors that I can ignore (a possibly cleaner way to re-run is sketched after the command below).

This is precisely what I do:

kubectl --kubeconfig=old-cluster.conf exec -t $SOURCE_MONGO_POD -- \
  bash -c "mongodump --host $SOURCE_MONGO_REPLICASET \
  --username $SOURCE_USERNAME --password $SOURCE_PASSWORD \
  --authenticationDatabase admin --gzip --archive --oplog" |
kubectl exec -i $TARGET_MONGO_POD -- \
  bash -c "mongorestore --host $TARGET_MONGO_REPLICASET \
  --username $TARGET_USERNAME --password $TARGET_PASSWORD \
  --authenticationDatabase admin --gzip --archive --oplogReplay"
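
For reference, on a clean re-run a variant like the following might behave better: --drop makes mongorestore drop each collection before re-importing it (instead of grinding through duplicate key errors from the interrupted attempt), and the numParallelCollections / numInsertionWorkersPerCollection options raise insert parallelism on the target. These are standard mongorestore flags, but whether they actually help depends on where the bottleneck is (the single gzipped archive stream, the network, or the target's disk):

# Same pipe as above, re-run variant with --drop and parallel insert workers
# (the worker counts are just a starting point, not tuned values).
kubectl --kubeconfig=old-cluster.conf exec -t $SOURCE_MONGO_POD -- \
  bash -c "mongodump --host $SOURCE_MONGO_REPLICASET \
  --username $SOURCE_USERNAME --password $SOURCE_PASSWORD \
  --authenticationDatabase admin --gzip --archive --oplog" |
kubectl exec -i $TARGET_MONGO_POD -- \
  bash -c "mongorestore --host $TARGET_MONGO_REPLICASET \
  --username $TARGET_USERNAME --password $TARGET_PASSWORD \
  --authenticationDatabase admin --gzip --archive --oplogReplay \
  --drop --numParallelCollections=4 --numInsertionWorkersPerCollection=4"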

What is wrong with my approach? Why is my performance so bad?

Someone suggested simply copying over the /data/db folder, which might be faster and, since I need a 1:1 migration, would be sufficient.
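
For what it's worth, here is a rough sketch of what such a file-level copy could look like, reusing the same pod and credential variables as above. Big assumptions: the MongoDB versions on both sides are compatible, writes on the source are blocked with db.fsyncLock() for the duration of the copy, and the target mongod is not running while its data directory is being replaced (which is the awkward part inside Kubernetes):

# Flush and block writes on the source so the copy is consistent.
kubectl --kubeconfig=old-cluster.conf exec $SOURCE_MONGO_POD -- \
  mongo admin --username $SOURCE_USERNAME --password $SOURCE_PASSWORD \
  --eval "db.fsyncLock()"

# Stream /data/db across; the target mongod must be stopped while this runs.
kubectl --kubeconfig=old-cluster.conf exec $SOURCE_MONGO_POD -- \
  tar -C /data/db -czf - . |
kubectl exec -i $TARGET_MONGO_POD -- \
  tar -C /data/db -xzf -

# Unblock writes on the source again.
kubectl --kubeconfig=old-cluster.conf exec $SOURCE_MONGO_POD -- \
  mongo admin --username $SOURCE_USERNAME --password $SOURCE_PASSWORD \
  --eval "db.fsyncUnlock()"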

-- Moritz Schmitz v. Hülst
kubernetes
mongodb
mongodump
mongorestore
stdin

1 Answer

7/18/2019

As we can read in Back Up and Restore with MongoDB Tools:

The mongodump and mongorestore utilities work with BSON data dumps, and are useful for creating backups of small deployments. For resilient and non-disruptive backups, use a file system or block-level disk snapshot function, such as the methods described in the MongoDB Backup Methods document.

Because mongodump and mongorestore operate by interacting with a running mongod instance, they can impact the performance of your running database. Not only do the tools create traffic for a running database instance, they also force the database to read all data through memory. When MongoDB reads infrequently used data, it can evict more frequently accessed data, causing a deterioration in performance for the database’s regular workload.

You should consider using alternatives such as Filesystem Snapshots or MongoDB Cloud Manager.
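
If the volumes behind /data/db support it, the snapshot route from the MongoDB Backup Methods document looks roughly like this; it has to run on the node that hosts the volume rather than inside the pod, and the volume group / logical volume names below are placeholders:

# Create an LVM snapshot of the volume holding the MongoDB data files
# (journal and data files must live on the same snapshotted volume).
lvcreate --size 1G --snapshot --name mdb-snap01 /dev/vg0/mongodb

# Archive the snapshot so it can be shipped to the new cluster.
dd if=/dev/vg0/mdb-snap01 | gzip > mdb-snap01.gz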

Also, have you considered adding replication to your current MongoDB deployment?
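
A minimal sketch of how that could look, assuming the two clusters can reach each other over the network, share the replica set keyFile / auth settings, and that $SOURCE_MONGO_POD currently hosts the PRIMARY (the new member's hostname is a placeholder):

# Add the new cluster's mongod as a non-voting member first so it cannot
# affect elections while the initial sync copies the data over.
kubectl --kubeconfig=old-cluster.conf exec $SOURCE_MONGO_POD -- \
  mongo admin --username $SOURCE_USERNAME --password $SOURCE_PASSWORD \
  --eval 'rs.add({ host: "mongodb-0.new-cluster.example.com:27017", priority: 0, votes: 0 })'

# Watch the initial sync; once the member reports SECONDARY, adjust its
# priority/votes and retire the members in the old cluster.
kubectl --kubeconfig=old-cluster.conf exec $SOURCE_MONGO_POD -- \
  mongo admin --username $SOURCE_USERNAME --password $SOURCE_PASSWORD \
  --eval "rs.status()"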

-- Crou
Source: StackOverflow