Backup MongoDB to AWS S3

April 2020

Backing up to S3

I don't think I need to tell you how important it is to back up your production database. Having said this, a replica set is NOT what I mean when I say "backing up". A replica set is kept in sync continuously, so if something bad happens to your data (an accidental delete, a buggy migration), the bad change is replicated to the other members too. What you need is a scheduled backup for the worst-case scenario. And what could be better suited to storing a database dump than S3? So here we're going to learn how to back up a MongoDB database to S3, but you could use the same approach to back up almost any database (MySQL, Postgres, ...).

This article is aimed at people who host their database themselves: people who rent servers, install MongoDB on those servers and maintain the setup themselves. There are also managed MongoDB offerings, for example Atlas, that take care of these concerns for you.

The backup script

So let's get started with the final script and then let's disassemble it.
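As a rough orientation, a script of this kind could look like the sketch below. It is a minimal version under stated assumptions, not necessarily the exact script this article walks through: the MongoDB URI, the bucket name, the working directory and the timestamp format are all placeholders, and the mail notification mentioned later is left out. It uses mongodump to write one gzip-compressed archive and aws s3 cp to upload it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical settings: replace these with your own values.
MONGO_URI="${MONGO_URI:-mongodb://localhost:27017}"
S3_BUCKET="${S3_BUCKET:-s3://my-mongodb-backups}"  # assumed bucket name
WORK_DIR="${WORK_DIR:-/tmp}"

run_backup() {
  local backup_name archive

  # Timestamped name such as 2020-04-01-0600 (assumed format).
  backup_name="$(date +%Y-%m-%d-%H%M)"
  archive="${WORK_DIR}/${backup_name}.archive.gz"

  # Dump all databases into a single gzip-compressed archive file.
  mongodump --uri="$MONGO_URI" --gzip --archive="$archive"

  # Upload the archive to S3, then remove the local copy.
  aws s3 cp "$archive" "${S3_BUCKET}/${backup_name}.archive.gz"
  rm -f "$archive"

  echo "Backup ${backup_name} uploaded to ${S3_BUCKET}"
}

# Only start a backup when both tools are installed.
if command -v mongodump >/dev/null 2>&1 && command -v aws >/dev/null 2>&1; then
  run_backup
else
  echo "mongodump and/or aws not found in PATH, no backup taken" >&2
fi
```

Because of set -e, a failed dump or upload stops the script with a non-zero exit code and leaves the local archive in place for inspection. The sketch only warns when a tool is missing; a production script would probably treat that as an error as well.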

Install the aws-cli. On Debian or Ubuntu you can just run sudo apt-get install awscli to do so. Then run aws configure to grant the script access to AWS. Configure it with the Access Key and Secret you obtained for the user created previously.

You can now check that everything is working correctly by running your script: ./backup-s3.sh (after you've run chmod +x ./backup-s3.sh).

The cron job

Now there is one missing piece to the puzzle. Your script needs to be scheduled! Here, a cron job comes in handy. You can set up a new cron job by running crontab -e. Then you can insert the following script:
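An hourly entry might look roughly like this sketch. The PATH value, the script location and the log file are assumptions for illustration; adjust them to wherever aws and backup-s3.sh live on your server.

```text
# Assumed PATH so that cron can find the aws binary
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# Minute 0 of every hour: run the backup and append its output to a log file (hypothetical paths)
0 * * * * /home/ubuntu/backup-s3.sh >> /home/ubuntu/backup-s3.log 2>&1
```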

The first line is important: without it, the cron job doesn't have access to your aws cli. This is the script for an hourly backup. You can adapt the schedule to your needs.

Well, that's it! Now you have backups, and you'll also be informed about their status. Of course, a status update once an hour might get a bit annoying. You can change the backup script so that it only sends the mail once a day:

if [[ "$backup_name" == *"0600" ]]; then
# send the mail
fi

This only sends the mail for the backup whose name ends in 0600, that is, the logs generated by the 6 am run.
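To see how that check behaves, here is a small self-contained sketch. It assumes that backup_name ends in the hour and minute of the run (for example 2020-04-01-0600); the function name should_send_mail and the echo lines are placeholders for whatever mail command your script uses.

```shell
#!/usr/bin/env bash

# Succeeds only for backup names that end in "0600" (the 6 am run).
should_send_mail() {
  local backup_name="$1"
  [[ "$backup_name" == *"0600" ]]
}

# Assumed naming scheme: date and time down to the minute, e.g. 2020-04-01-0600.
backup_name="$(date +%Y-%m-%d-%H%M)"

if should_send_mail "$backup_name"; then
  echo "would send the status mail for ${backup_name}"
else
  echo "skipping mail for ${backup_name}"
fi
```

Note that the check only matches if the name is generated within the minute in which cron starts the job. If your script computes the name later, or your cron job does not fire at minute 0, match on the hour only.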

Final Notes

You should run this script on a replica server and not on the primary database server. This takes the load off the critical server. See for example this Stack Overflow discussion.