When your backup script runs for too long, a second copy may start while the previous backup is still running. This increases pressure on the database, slows the server down, can start a chain of backup processes, and in some cases may break backup integrity.

The simplest solution is to avoid this undesired situation by adding locking to your backup script, preventing it from starting a second time while it is already running.

Here is a working sample. You will need to replace the “sleep 10” string with your actual backup script call:
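A minimal sketch of such a lock-file check might look like the following (the `/tmp/backup.lock` path is an illustrative assumption; `sleep 10` stands in for the real backup command):

```shell
#!/bin/bash
# Simple lock-file guard: refuse to start if a previous run is still going.
LOCKFILE=/tmp/backup.lock

if [ -f "$LOCKFILE" ]; then
    echo "Backup is already running, exiting." >&2
    exit 1
fi

touch "$LOCKFILE"
sleep 10            # <-- replace with your actual backup call
rm -f "$LOCKFILE"
```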

It works perfectly most of the time. The problem is that two scripts could still, in theory, start at the same moment, both pass the lock file check, and run together. To avoid that, you need to place a unique lock file just before the check and make sure no other process has done the same.
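One way to sketch the "unique lock first, then check" idea is below. Each instance drops a lock file named after its own PID before looking for anyone else's, so at most one instance proceeds (the `/tmp/backup.pid-*.lock` naming scheme is an assumption for illustration):

```shell
#!/bin/bash
# Place our own PID-named lock before checking, so two simultaneous
# starts cannot both pass the check.
MYLOCK="/tmp/backup.pid-$$.lock"

touch "$MYLOCK"

# If any other backup lock exists besides ours, back out.
for lock in /tmp/backup.pid-*.lock; do
    if [ "$lock" != "$MYLOCK" ]; then
        echo "Another backup instance holds $lock, exiting." >&2
        rm -f "$MYLOCK"
        exit 1
    fi
done

sleep 10            # <-- your actual backup call
rm -f "$MYLOCK"
```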

Now even if you manage to run two scripts at the same time, only one can actually start the backup. In a very rare situation both scripts will refuse to start (because two lock files exist at the same time), but you can catch this issue by simply monitoring the script's exit code. In any case, as soon as you receive a backup exit code different from zero, it is time to review your backup setup and make sure it works as desired.

Please note: if you terminate this script manually, you will also need to remove the lock file so the script passes the check on the next startup. You can also use this approach for any other periodic tasks you have, such as Sphinx indexing, merging, or index consistency checks.

Uli Stärk says:

I think this is not a good solution, because touch is not atomic and can lead to errors.

You are better off using a perl/php/python/… script that calls flock with LOCK_EX to get an exclusive lock on a file. It is even better to take a MySQL lock (GET_LOCK), because you could theoretically run the job from two distinct hosts.
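On Linux the same idea is available directly from the shell via the flock(1) utility, without needing a perl/php/python wrapper. A sketch (file descriptor 200 and the lock path are arbitrary choices):

```shell
#!/bin/bash
# Kernel-level advisory lock via flock(1): atomic, no touch/check race.
exec 200>/tmp/backup.flock

if ! flock -n 200; then
    echo "Another instance holds the lock, exiting." >&2
    exit 1
fi

sleep 10            # <-- your actual backup call
# The lock is released automatically when the script (and fd 200) exits,
# so no cleanup is needed even if the script is killed.
```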

“It’s worth pointing out that there is a slight race condition in the above lock example between the time we test for the lockfile and the time we create it. A possible solution to this is to use IO redirection and bash’s noclobber mode, which won’t redirect to an existing file.”
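The noclobber technique from the quote can be sketched like this: with `noclobber` set, the redirection fails atomically if the lock file already exists, so the test and the creation happen in one step (the lock path is an illustrative assumption):

```shell
#!/bin/bash
# noclobber makes "> file" fail if the file exists, closing the
# test-then-create race window. The subshell keeps the option local.
LOCKFILE=/tmp/backup.noclobber.lock

if ( set -o noclobber; echo "$$" > "$LOCKFILE" ) 2>/dev/null; then
    sleep 10        # <-- your actual backup call
    rm -f "$LOCKFILE"
else
    echo "Lock file $LOCKFILE exists, exiting." >&2
    exit 1
fi
```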

It also shows how to use traps to catch signals and remove the lock file after the script gets killed/termed/etc., which is important for backup scripts so they can clean up after themselves when possible.
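A trap-based cleanup might be sketched as follows: the lock file is removed even when the script is interrupted, so a manual kill no longer leaves a stale lock behind (kill -9 cannot be trapped, so that case still needs manual cleanup):

```shell
#!/bin/bash
# Register cleanup before creating the lock, so INT/TERM and normal
# exit all remove the lock file.
LOCKFILE=/tmp/backup.trap.lock

if [ -f "$LOCKFILE" ]; then
    echo "Backup is already running, exiting." >&2
    exit 1
fi

trap 'rm -f "$LOCKFILE"; exit' INT TERM EXIT
touch "$LOCKFILE"

sleep 10            # <-- your actual backup call

rm -f "$LOCKFILE"
trap - INT TERM EXIT
```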

mike says:

Seems like this could lead to a race condition; you might want to use set -o noclobber, or instead use mktemp -d, since mkdir is atomic. Another common approach is to ‘kill -0′ the PID to verify the other job did not fail and neglect to clean up its lock file with a trap. (kill -9 is still a potential pitfall with traps.)
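The mkdir and kill -0 suggestions combine into something like the sketch below: mkdir acts as the atomic lock, and kill -0 detects a stale lock left by a dead process (the lock directory layout is an assumption for illustration):

```shell
#!/bin/bash
# mkdir either creates the lock directory atomically or fails; the PID
# stored inside lets us detect a stale lock from a crashed run.
LOCKDIR=/tmp/backup.lock.d
PIDFILE="$LOCKDIR/pid"

if mkdir "$LOCKDIR" 2>/dev/null; then
    echo "$$" > "$PIDFILE"
elif kill -0 "$(cat "$PIDFILE" 2>/dev/null)" 2>/dev/null; then
    echo "Backup already running as PID $(cat "$PIDFILE"), exiting." >&2
    exit 1
else
    # Stale lock: the previous holder is gone, so take over.
    echo "$$" > "$PIDFILE"
fi

sleep 10            # <-- your actual backup call
rm -rf "$LOCKDIR"
```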

You are all absolutely right about the possible race conditions and drawbacks. The file system is an additional, relatively slow layer; locking behavior may vary depending on the FS type and may not be atomic or thread safe. So if we are talking about race condition prevention in a parallel execution environment, I would consider using a much faster and more reliable in-memory mutex inside C/C++/Java/Python/etc. code (as mentioned by Ketan and Uli) instead of file-based locking.

At the same time, backup scripts and other periodic tasks are mostly started by a cron job once in a while and could hardly cause a race condition in the first place. In this case, using unique lock names with the process id attached is a convenient way to implement an external process monitor.