How to use locks in PHP cron jobs to avoid cron overlaps

Cron jobs are hidden building blocks of most websites. They are generally used to process or aggregate data in the background. However, as a website grows and there are gigabytes of data to be processed by every cron job, chances are that our cron jobs will overlap and possibly corrupt our data. In this blog post, I will demonstrate how we can avoid such overlaps by using simple locking techniques. I will also discuss a few edge cases we need to consider while using locks to avoid overlap.

One of the cron jobs will die, since a cron job with $pid=40856 is already in progress.

Working of cron.helper.php
The helper class creates a lock file inside LOCK_DIR. For our test cron job above, the lock file name will be job.php.lock. The lock file name suffix can be configured using LOCK_SUFFIX.

If cronHelper::lock() finds that the lock file already exists, it extracts the previous cron job's process id from the lock file and checks whether that job is still running. If the previous job is still in progress, we abort the current job. If the previous job is not in progress, i.e. it died abruptly, the current cron job acquires the lock.
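The flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of cron.helper.php: the function names pidIsRunning, acquirePidLock and releasePidLock are hypothetical, and the liveness check assumes a Unix-like system with either the posix extension or a /proc filesystem.

```php
<?php
// Hypothetical sketch of a PID-based lock file; all names are illustrative.

/** Returns true if a process with the given PID appears to be alive. */
function pidIsRunning(int $pid): bool
{
    if ($pid <= 0) {
        return false;
    }
    // Signal 0 delivers nothing; it only checks that the process exists.
    // It also returns false for a process owned by another user, so we
    // fall back to /proc (Linux) in that case.
    if (function_exists('posix_kill') && posix_kill($pid, 0)) {
        return true;
    }
    return file_exists("/proc/$pid");
}

/** Returns our PID if the lock was taken, or false if a live job holds it. */
function acquirePidLock(string $lockFile)
{
    if (is_file($lockFile)) {
        $previousPid = (int) trim((string) file_get_contents($lockFile));
        if (pidIsRunning($previousPid)) {
            return false; // previous job still in progress: abort
        }
        // Otherwise the previous job died abruptly; take over the lock.
    }
    $pid = getmypid();
    file_put_contents($lockFile, (string) $pid);
    return $pid;
}

/** Removes the lock file once the job has finished. */
function releasePidLock(string $lockFile): void
{
    if (is_file($lockFile)) {
        unlink($lockFile);
    }
}

// Usage (commented out so the sketch has no side effects):
// if (acquirePidLock('/tmp/job.php.lock') === false) { exit(1); }
// ... do the work ...
// releasePidLock('/tmp/job.php.lock');
```

Two edge cases are worth keeping in mind with this approach: the check and the write are not one atomic step, so two jobs starting at the same instant could both pass the check, and the operating system may eventually reuse the PID of a dead job for an unrelated process.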

This is the classic method for avoiding cron overlaps. However, there are various other ways of achieving the same thing. If you know of any, do let me know through your comments.

However, I prefer to use a cache to hold the lock variable with an expiry time, rather than using a file.

Nate

I use a database to hold cron locks since my cron jobs are balanced across a number of web nodes. My problem is that when a script or system crash occurs, the lock is never removed. Does anyone know of a good solution to automatically handle locks across multiple web nodes, in cases where the script crashes and the lock is still present? The processes are run over HTTP through a load balancer on a local network.

ndlinh

Hi Nate,

If you use a cache to store the locking flag and set an expiry time for it, you can solve your problem.

http://abhinavsingh.com Abhinav Singh

Hi Nate,

As ndlinh suggested, you can use a cache like memcached to hold your locks.

Since memcached is a distributed solution, all your nodes will be able to detect the lock. In case the process dies abruptly, the lock will expire automatically after $ttl (though $ttl will act as a tuning parameter here).
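A minimal sketch of such a lock, assuming $cache is an already connected Memcached object from the PHP memcached extension. The function names and the host:pid token format are hypothetical. It relies on add(), which stores a key only if it does not exist yet.

```php
<?php
// Sketch only: $cache is assumed to be a connected \Memcached instance
// (or any object exposing the same add/get/delete methods).

/** Returns a token identifying this holder, or false if the lock is taken. */
function acquireCacheLock($cache, string $key, int $ttl)
{
    $token = gethostname() . ':' . getmypid();
    // add() fails if the key already exists, so only one node can win.
    // The entry disappears on its own after $ttl seconds.
    return $cache->add($key, $token, $ttl) ? $token : false;
}

/** Releases the lock, but only if this holder still owns it. */
function releaseCacheLock($cache, string $key, string $token): void
{
    // The lock may have expired and been retaken by another node.
    // Note: get-then-delete is not atomic; this is a best-effort release.
    if ($cache->get($key) === $token) {
        $cache->delete($key);
    }
}

// Usage (commented out; needs a running memcached server):
// $cache = new Memcached();
// $cache->addServer('127.0.0.1', 11211);
// $token = acquireCacheLock($cache, 'cron:job.php', 3600);
// if ($token === false) { exit(1); }
// ... do the work ...
// releaseCacheLock($cache, 'cron:job.php', $token);
```

The $ttl should be longer than the slowest expected run. If the job outlives it, the lock expires while the job is still working and a second copy can start.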

Kishore Kumar

Dear Abhinav,

Will this work when two or more cron jobs (scheduled jobs) are running on the same server? I am using PHP on an IIS server on Windows 2008. Please help.

Nate

The cache solution sounds promising, at least better than my current solution. I still have one problem: what if the script is actually still running when the lock expires? In some cases running the script twice could bring the database down or corrupt data. I have seen situations where a script takes an average of 30 minutes to run, but sometimes it takes 90+ minutes due to heavy system load. It can be very unpredictable. In a situation where you could never risk the script running twice, would the only solution be to check whether the script is running through Apache? I think I can do that by parsing server-status.

http://abhinavsingh.com Abhinav Singh

Yes, the cache solution can serve you better, but remember it's only a cache. If the cache is flushed, you might end up running the script twice. Check this link http://tinyurl.com/yz48ga9 in case you are using memcached.

But still, a 90 minute or even a 30 minute cron job seems like a bad solution to me. In such cases it's better to break the job down into several components. If I knew what exactly you are trying to achieve through these cron jobs, I could probably think of a better solution.

Nate

The scripts in question create “preferred lists” of user information based on fairly intensive database aggregations. The lists are then stored in memcache and accessed by the application from there. I have each “preferred list” job in a separate script. There is probably some redesign that could be done to optimize things, but I am hoping to find a solid PHP cron solution that is as robust as a simple bash lock file implementation. That always worked so well on a single node.

Solution 1: Alright, based on the job description, I guess a memcache-based solution can serve you, though don’t rely on caches for such jobs.

Solution 2: Since your cron jobs are interacting with databases, you can very well use the db itself for cron synchronization. Have a pid column per row, which is populated with the process id, and probably the hostname, of the cron job processing it.

Solution 3: Divide the rows in the database among the different cron jobs based on the primary key (just like the consistent hashing algorithms in memcached that decide which key goes to which server), so that the cron job on each machine knows which rows it should process. Then have a localized locking mechanism per box, just like the code you posted. And everything should work out well.
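Solution 2 above could be sketched along these lines. The cron_locks table, its job, owner and started_at columns, and both function names are hypothetical, and $db is assumed to be a connected PDO instance. The idea is to let a single conditional UPDATE decide who gets the job.

```php
<?php
// Sketch of a database-backed claim. The table and column names are
// made up for illustration; adapt them to your own schema.

/** Tries to claim the job for $owner (e.g. "hostname:pid"). */
function claimJob(PDO $db, string $job, string $owner): bool
{
    // The WHERE clause makes the claim atomic: the row is only updated
    // while nobody owns it, so at most one node can succeed.
    $stmt = $db->prepare(
        'UPDATE cron_locks SET owner = :owner, started_at = :now
         WHERE job = :job AND owner IS NULL'
    );
    $stmt->execute([':owner' => $owner, ':now' => time(), ':job' => $job]);
    return $stmt->rowCount() === 1;
}

/** Gives the job back, but only if $owner is the current holder. */
function releaseJob(PDO $db, string $job, string $owner): void
{
    $stmt = $db->prepare(
        'UPDATE cron_locks SET owner = NULL
         WHERE job = :job AND owner = :owner'
    );
    $stmt->execute([':job' => $job, ':owner' => $owner]);
}

// Usage (commented out; needs a real connection and table):
// $owner = gethostname() . ':' . getmypid();
// if (!claimJob($db, 'preferred_list', $owner)) { exit(1); }
// ... do the work ...
// releaseJob($db, 'preferred_list', $owner);
```

A crashed job still leaves its owner value behind. Storing the hostname and pid at least tells you which node to ask whether that process is alive, and started_at gives other nodes something to compare against when deciding that a claim is stale.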

Hope it helps and let me know how it goes

http://kevin.vanzonneveld.net kvz

You can also open a socket on an unused port. Unix will not allow a second process to open another one on the same port.
Solo by Tim Kay uses this method: http://timkay.com/solo/ so you don’t have to put it in your code.
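If you do want the same idea inside the script, a rough sketch could look like this. The function name and the example port number are hypothetical, and the behaviour described is what I would expect on Linux; it has not been checked on other platforms.

```php
<?php
// Sketch: hold a listening socket on a local port as the lock.

/** Returns the socket resource on success, or false if the port is taken. */
function acquirePortLock(int $port)
{
    // Binding fails if another process is already listening on this port.
    // The @ suppresses the warning PHP raises in that case.
    return @stream_socket_server("tcp://127.0.0.1:$port", $errno, $errstr);
}

// Usage (commented out; 51234 is an arbitrary example port):
// $lock = acquirePortLock(51234);
// if ($lock === false) { exit(1); }
// ... do the work; keep $lock in scope until the script ends ...
```

The operating system closes the socket when the process exits, even after a crash, so there is no stale lock to clean up. The lock only works on a single machine, and the returned resource must stay referenced for as long as the job runs.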

Thank you for your post. Interesting topic. Might an alternative solution be to use a file locked with flock()?

1. Attempt to obtain an exclusive lock on a blank file.
2. If the lock fails, the task is already running, so exit.
3. If the lock succeeds, run the task and finally unlock the file.

In the event of a fatal script error or script completion, PHP will automatically unlock the file.
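The three steps above might look like this in code. It is only a sketch: the function names and the lock file path in the usage comment are hypothetical.

```php
<?php
// Sketch of a non-blocking exclusive lock using flock().

/** Returns an open, locked file handle, or false if the lock is held. */
function acquireFlock(string $lockFile)
{
    // Mode 'c' creates the file if it is missing and never truncates it.
    $handle = fopen($lockFile, 'c');
    if ($handle === false) {
        return false;
    }
    // LOCK_NB makes flock() return immediately instead of waiting.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false;
    }
    return $handle;
}

/** Unlocks and closes the handle returned by acquireFlock(). */
function releaseFlock($handle): void
{
    flock($handle, LOCK_UN);
    fclose($handle);
}

// Usage (commented out so the sketch has no side effects):
// $lock = acquireFlock('/tmp/job.php.lock');
// if ($lock === false) { exit(1); }
// ... do the work ...
// releaseFlock($lock);
```

The handle has to stay open for the whole run, because closing it (or the process ending) releases the lock. The lock file itself can be left in place between runs.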

Pros:
– Fatal script errors handled automatically.
– Simple.

Cons:
– Differences in lock handling between platforms (my Windows box will only recognise non-blocking locks when run as CLI).
– No way of knowing if the previous task crashed – you only know it finished (though in my case I’m writing script start and end times to a DB, so runs with no end time could be considered crashed).

Kindly feel free to fork it and contribute your ideas to make this a more generic utility class

Paul

Thanks for sharing this! I had been trying for days to understand why I was getting a “fork: service temporary unavailable” error! My computer was almost out of memory when I stumbled upon your post and your code. I implemented it and it did the job! Now my process list is clean, my crons work fine, and I also get back some memory that was previously consumed by overlapping crons.

http://www.colab-aktiv.com ChrisG

Works as advertised. I especially like the check to see if the job is actually running, rather than just relying on the existence of the file. This means that it gets unlocked if the job ends abnormally.

thanks

http://www.microcerdos.com.ar Babblo

A little contribution:

replace

$lock_file = LOCK_DIR.$argv[0].LOCK_SUFFIX;

with

$lock_file = LOCK_DIR.basename($argv[0]).LOCK_SUFFIX;

to avoid problems calling the script with full paths.
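To illustrate the difference, here is a small sketch. The function lockFileFor and the default values standing in for LOCK_DIR and LOCK_SUFFIX are assumptions made for this example only.

```php
<?php
// Sketch: build the lock file path from the script name.
// '/tmp/' and '.lock' stand in for LOCK_DIR and LOCK_SUFFIX.

function lockFileFor(string $script, string $lockDir = '/tmp/', string $suffix = '.lock'): string
{
    // basename() drops any directory part, so a full path and a bare
    // file name both map to the same lock file.
    return $lockDir . basename($script) . $suffix;
}

// lockFileFor('job.php')               gives '/tmp/job.php.lock'
// lockFileFor('/var/www/cron/job.php') gives '/tmp/job.php.lock'
```

Without basename(), the second call would produce '/tmp//var/www/cron/job.php.lock', which points into a directory that most likely does not exist. The trade-off is that two scripts with the same file name in different directories would share one lock.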

Sunil

Hi Abhinav,

This is good, but when I test it from the command line a lock file is created, whereas when I run it through cron the lock file is not created in the cronHelper directory, even though all the code continues to execute, since it gets the pid from lock().

Please suggest, what should I do??

Pawan

Hi Abhinav,
I am really interested and excited to see your solution for the problem I am facing. I am running an ISPConfig server on Ubuntu 12.04. I have many cron job PHP files created in Joomla articles, and they are run through ISPConfig cron jobs, which do overlap.

I am very new to Linux and have no idea where I should put your helper file, or what path I should give in the PHP file to include it.
Thanks.

Hi, this is Ganesh. I have 3 years of experience as a Java developer and I am certified. I know OOP concepts in Java, but not in depth. Will learning PHP be enough to get a good career in IT with a good package? I also came across a PHP training in Chennai website; could someone please help me identify whether the syllabus covers everything or not?

Thanks, Ganesh.
