My random stuff

Really simple batch job scheduling / workqueue for Linux via xargs

I just discovered a really simple way to create a workqueue on Linux that someone else might find useful. If you have a bunch of jobs to run that all take different parameters and potentially different amounts of time to complete, it’s hard to schedule them so that they make maximum use of the available cores without resorting to some sort of batch scheduling system, which is overkill for a lot of prototyping purposes. It turns out that the xargs command has built-in workqueue scheduling, via its -P option, that is really easy to use.

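Basic syntax (assuming you want to run the program ‘command’ with one parameter per invocation, and that you want to have four processes running at any one time): pipe the parameters into xargs with -n1 (one parameter per invocation) and -P4 (at most four processes at once).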
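As a minimal sketch of that syntax: the parameter values below are made up, and ‘echo processed’ is only a stand-in for the real program, so swap in your own command. Both -n and -P are supported by GNU xargs (and the BSD one); -P is not part of POSIX.

```shell
# Feed one parameter per line into xargs.
#   -n 1  pass exactly one parameter to each invocation
#   -P 4  keep up to four invocations running at the same time
# 'echo processed' is a placeholder for the real program ('command').
results=$(printf '%s\n' alpha beta gamma delta epsilon | xargs -n 1 -P 4 echo processed)

# Print whatever the jobs wrote; with -P the order of lines is not guaranteed.
printf '%s\n' "$results"
```

As soon as one job finishes, xargs starts the next one, so slow and fast jobs get balanced across the four slots automatically.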

If you have a quad-core processor with hyperthreading (eight logical cores), you could use -P8, and so on.
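Rather than hard-coding the number, you could ask the system how many logical cores it has. This is only a sketch: it assumes nproc (GNU coreutils) is available, falls back to getconf otherwise, and again uses ‘echo processed’ as a stand-in for the real program.

```shell
# Number of logical cores (hyperthreads count), e.g. 8 on a quad-core with HT.
workers=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

# Run one job per parameter, with as many parallel slots as logical cores.
sized_results=$(printf '%s\n' 1 2 3 4 5 6 7 8 | xargs -n 1 -P "$workers" echo processed)
printf '%s\n' "$sized_results"
```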

You can also, obviously, store the params in a file and do cat file | xargs …
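For illustration, here is a sketch of the file-based variant. The temporary file and its contents are invented for the example, and ‘echo processed’ again stands in for the real program. Note that by default xargs splits its input on blanks as well as newlines, so this assumes parameters that contain no spaces.

```shell
# Hypothetical parameter file: one parameter per line.
params_file=$(mktemp)
printf '%s\n' 10 20 30 40 50 60 > "$params_file"

# Same workqueue as before, but the parameters come from the file.
file_results=$(cat "$params_file" | xargs -n 1 -P 4 echo processed)
printf '%s\n' "$file_results"

rm -f "$params_file"
```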

The nice thing about this approach over batch scheduling for prototyping is that if you hit Ctrl-C, it kills all the child processes.

I haven’t experimented yet to find the best way to generate a separate logfile for each child process. However, I also just discovered PPSS, a more powerful system for achieving the same thing as xargs, which supports separate logfiles: http://code.google.com/p/ppss/
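One possible way to get per-job logfiles with plain xargs, offered as an untested-in-anger sketch rather than the optimal answer, is to have xargs launch a small sh -c wrapper that redirects each job’s output to a file named after its parameter. The log directory, the job_<param>.log naming scheme, and the ‘echo’ stand-in for the real program are all assumptions made for the example, and it only works for parameters that are safe to use in a filename.

```shell
# Hypothetical log directory; exported so the sh wrappers can see it.
log_dir=$(mktemp -d)
export log_dir

# xargs appends each parameter after '_', so inside the wrapper
# $0 is '_' and $1 is the parameter. stdout and stderr of the job
# both go to that job's own logfile.
printf '%s\n' alpha beta gamma |
  xargs -n 1 -P 4 sh -c 'echo "processed $1" > "$log_dir/job_$1.log" 2>&1' _

# One logfile per job.
ls "$log_dir"
```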

I hope this is as useful to someone else as it is going to be to me!!

UPDATE 2010-06-07: Ole Tange left me a message in response to this post alerting me to the existence of the project he maintains, GNU Parallel. Looks like an awesome tool.