I'm still in the conceptual stage of planning my application, but I keep running into one problem. The application is supposed to be self-hosted by my customers on their own servers, and it will include some very long-running scripts, depending on how much data each customer enters.

I think I have two options: either use cronjobs (for example, one or more cronjobs that run at a time each customer can set himself), or do the whole data processing in daemons that run in the background.

My question is: since it's a self-hosted application and every server is different, is it even advisable to write PHP that starts background processes on a customer's server, or is this something you can only do reliably on your own server?

Or should I use cronjobs for these long running processes?

(depending on the amount of data my customers will enter in the application, a process could run 3+ hours)

Is that even a problem that can be solved reliably with PHP? Excuse me if this is a weird question; I'm really not experienced with PHP daemons or long-running cronjobs created by PHP.

So to recap everything:
For a commercial self-hosted application that includes long-running processes, should I use cronjobs or daemons? And is either, or maybe both, a reliable solution for a paid application that you can give to your customers with a clear conscience, knowing it will work reliably on all kinds of different servers?

EDIT*
PS: Sorry, I forgot to mention that the application targets only Linux servers, such as Debian, Ubuntu, etc.

2 Answers

Short answer: no, don't go for background processes if this will be a client-hosted solution. If you go towards the ASP concept (Application Service Provider... not Active Server Pages ;)), then you can do some wacky stuff with background processes and external apps connecting to your SQL servers and processing stuff for you.

What I suggest is to create a strong task management backbone and link it to a solid task processing infrastructure. I recommend reading an old post I wrote quite some time ago about background processes and the strategy I adopted to handle long-running processes:

JobProcessor is called either by a user when a page triggers it or by a cronjob, as you wish. JobProcessor::process() is the key that starts the whole processing or continues it. It loads the JobQueues and asks each job queue whether there is work to do. If there is, it asks the job queue to start or continue its job.

JobQueue Model: Used to queue several jobs one behind the other. It controls which job is current by keeping an ID and a STATE for the job that is running.

Job Model: Represents exactly what needs to be done. It contains, for example, the name of the controller that will process the data, the function to call to process the data, and a serialized configuration property that describes what must be done.

XYZController: The one that contains the processing method. When the processing method is called, the controller must load everything it needs into memory and then process each individual unit of work as fast as possible.
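As a rough illustration of the Job model described above, here is a minimal sketch. The class shape, the property names, and the use of JSON for the serialized configuration are assumptions made for this example, not the original code from the post; it needs PHP 7.4 or later.

```php
<?php
declare(strict_types=1);

// Hypothetical Job model: names and fields are illustrative only.
final class Job
{
    public string $controllerClass;  // controller that will process the data
    public string $method;           // method to call on that controller
    public string $serializedConfig; // JSON text describing what must be done

    public function __construct(string $controllerClass, string $method, string $serializedConfig)
    {
        $this->controllerClass = $controllerClass;
        $this->method = $method;
        $this->serializedConfig = $serializedConfig;
    }

    // Builds a job from a plain array; JSON is an assumed serialization format.
    public static function create(string $controllerClass, string $method, array $config): self
    {
        return new self($controllerClass, $method, json_encode($config, JSON_THROW_ON_ERROR));
    }

    // Decodes the stored configuration back into an array.
    public function config(): array
    {
        return json_decode($this->serializedConfig, true, 512, JSON_THROW_ON_ERROR);
    }
}

$job = Job::create('InvoicingController', 'RunWorkUnit', ['batchSize' => 50]);
```

Storing the configuration as text means a job row can sit in a database table between requests, which is what lets a later cron run or page hit pick the work back up.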

Example:

1. Call of index.php

2. index.php creates a JobProcessor controller

3. index.php calls the JobProcessor's process()

4. JobProcessor::Process() loads all the queues and processes them

5. For each JobQueue::Process(), the job queue loads its possible Jobs and detects whether one is currently running. If none is running, it starts the next one by calling Job::Process()

6. Job::Process() creates the XYZController that will work the task at hand. For example, my old system had an InvoicingController and a MassmailingController that worked hand in hand.

7. Job::Process() calls XYZController::Prepare() so that it loads the information it has to process (for example, load a batch of emails to send, or load a batch of invoices to create)

8. Job::Process() calls XYZController::RunWorkUnit() so that it processes a single unit of work (for example, create one invoice or send one email)

9. Job::Process() asks JobProcessingController::DoIStillHaveTimeToProcess() and, if so, continues processing the next element.

10. Job::Process() runs out of time and calls XYZController::Cleanup() so that all resources are released

11. JobQueue::Process() ends and returns to JobController

12. JobController::Process() is about to end? Open a socket and call myself back so I can start another round of processing, until I don't have anything left to do

13. Handle the request from the user that started in position #1.
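The core of the steps above (run one work unit, check whether there is still time, repeat, then stop cleanly) can be sketched as follows. `TimeBudget` and `processBatch` are hypothetical names invented for this example, and the 20-second budget is an arbitrary assumption; a real value would be chosen to stay under the server's `max_execution_time`.

```php
<?php
declare(strict_types=1);

// Hypothetical helper that answers "do I still have time to process?"
final class TimeBudget
{
    private float $deadline;

    public function __construct(float $seconds)
    {
        $this->deadline = microtime(true) + $seconds;
    }

    public function hasTimeLeft(): bool
    {
        return microtime(true) < $this->deadline;
    }
}

/**
 * Runs work units until none are left or the budget is spent.
 * Returns the units that remain for the next invocation.
 */
function processBatch(array $units, callable $runWorkUnit, TimeBudget $budget): array
{
    while ($units !== [] && $budget->hasTimeLeft()) {
        $runWorkUnit(array_shift($units)); // one unit: one invoice, one email
    }
    return $units;
}

$sent = [];
$remaining = processBatch(
    ['a@example.com', 'b@example.com'],
    function (string $email) use (&$sent): void {
        $sent[] = $email; // stand-in for actually sending the email
    },
    new TimeBudget(20.0)
);
```

Whatever `processBatch` returns would be saved back to the job's state so that the next page hit or cron run continues where this one stopped.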

Ultimately, you can instead open a socket each time and ask the processor to do something, or you can set up a cronjob to call your processor. This way your users won't get stuck waiting for the 3 or 4 work units to complete each time.
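If the processor is called from a cronjob, one thing to guard against is a new run starting while the previous one is still working. A minimal sketch using PHP's built-in `flock()` follows; the function name `runExclusively` and the lock file location are assumptions for this example, and a lock file is only one of several ways to do this.

```php
<?php
declare(strict_types=1);

/**
 * Runs $task only if no other invocation currently holds the lock.
 * Returns true if the task ran, false if another run was in progress.
 */
function runExclusively(string $lockFile, callable $task): bool
{
    $handle = fopen($lockFile, 'c'); // create if missing, never truncate
    if ($handle === false) {
        throw new RuntimeException("Cannot open lock file: $lockFile");
    }
    try {
        // LOCK_NB makes flock() return immediately instead of waiting.
        if (!flock($handle, LOCK_EX | LOCK_NB)) {
            return false; // a previous run is still working
        }
        $task();
        flock($handle, LOCK_UN);
        return true;
    } finally {
        fclose($handle);
    }
}

// Hypothetical lock path; a real app would use a directory it controls.
$lockFile = sys_get_temp_dir() . '/jobprocessor-example.lock';
$ran = runExclusively($lockFile, function (): void {
    // JobProcessor::process() would be called here.
});
```

A crontab entry such as `*/5 * * * * php /path/to/app/cron.php` (path hypothetical) could then call this script every five minutes; overlapping runs simply exit early.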

That seems to be an interesting method, although I'm not quite sure how I would start coding a task management system really :D Do you think something like Beanstalkd, using one of its PHP libraries, could be my solution? I don't want to provide a server for client applications to connect to, and I want to keep it as simple as possible for now, as I'm not a real PHP pro... yet. The only question is whether my PHP application could install Beanstalkd automatically when my script installer runs.
– Marc, Jan 25 '12 at 14:03

Nice, I had never heard of Beanstalkd before, but it looks like it's a command line tool. Like I said: Application Service Provider? If you have your own infrastructure and servers, go for CLI scripts with Beanstalkd. If you ask your users to set up the app on their servers, KISS (keep it stupid simple). It's your job to make the app complex internally and make things work flawlessly and fast, not the job of the user who installs your app.
– Mathieu Dumoulin, Jan 25 '12 at 14:19

Sweet! Thanks Mathieu, I have to take a deep look into that. I'm downloading some example classes as we speak; your answer helped me a lot :) Stack Overflow is a great site, I never thought you'd get answers this good so fast :)
– Marc, Jan 25 '12 at 16:15

It's worth noting that, in addition to running daemons or cron jobs, you can kick off long-running processes from a web request (but note that the process must run outside of the webserver process group), and of course you can use asynchronous message processing (which is essentially a variant on the batch approach).
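For the web-request option, one common way on Linux to get the process out of the webserver's process group is to start it through `setsid` in the background. The sketch below only builds the command string; the helper name `buildDetachedCommand`, the script path, and the log path are assumptions for this example. It also assumes that `setsid` and the `php` CLI are on the PATH and that `exec()` is not disabled on the customer's server, which is not guaranteed on shared hosting.

```php
<?php
declare(strict_types=1);

/**
 * Builds a shell command that starts a PHP CLI script in its own session,
 * detached from the calling process, with all output sent to a log file.
 */
function buildDetachedCommand(string $script, array $args, string $logFile): string
{
    // Escape every argument so user-supplied values cannot inject shell code.
    $parts = array_map('escapeshellarg', array_merge([$script], $args));

    return sprintf(
        'setsid php %s > %s 2>&1 < /dev/null &',
        implode(' ', $parts),
        escapeshellarg($logFile)
    );
}

// Hypothetical paths and arguments, for illustration only.
$command = buildDetachedCommand('/path/to/worker.php', ['--job=42'], '/tmp/worker.log');

// exec($command); // left commented out: launching is up to the caller
```

Redirecting stdin, stdout, and stderr matters here: without it, `exec()` would wait for the child's output and the web request would hang for as long as the job runs.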

All four of these approaches are very different in terms of how they behave and how concurrency and timing are managed. The factors which make them different are the same ones you omitted from your question, so it's not really possible to answer.

Unfortunately, all of them rely on facilities which are very different between MS Windows and POSIX systems, so although PHP will run on both, if you want to sell your app on both platforms it's going to need two versions.

Maybe you should talk to your potential customer base and ask them what they want?

The first one is basically what I am doing at the moment: a form is submitted and kicks off the process. Right now I am just using chained functions to complete one task after another. So basically I split one huge task into multiple small ones, wrote one function for each, and chained them so that each function calls the next one once it's complete...
– Marc, Jan 25 '12 at 14:10

But is that a good and reliable solution for a client-hosted app? And the second one is an extension, which raises the same question as in my comment above: can my application install an extension on the webserver when it runs its installer? Or would my clients, or their server support, have to do that manually?
– Marc, Jan 25 '12 at 14:10