1 Answer

Rather than using Berkeley DB, why not just use something like
Parallel::Fork::BossWorker? I've been happily using it for several years to do what you're describing.

Update

Nothing wrong with Berkeley DB per se, but it strikes me that you'd need to write a bunch of queue-management code, whereas a module like BossWorker takes care of all that for you (and allows you to concentrate on the real problem).

As an example, I use it to monitor network switches where checking them serially takes too long (especially if one or more switches are having issues) and checking them in parallel buries the monitoring box. The stripped-down version looks like:
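(A minimal sketch, assuming the documented Parallel::Fork::BossWorker API: `new` with `work_handler`, `result_handler`, and `worker_count`, plus `add_work` and `process`. The switch list and the `check_switch` body are placeholders, not the author's actual monitoring code.)

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Parallel::Fork::BossWorker;

my @switches = qw( switch1 switch2 switch3 );   # placeholder host list
my %status;

my $bw = Parallel::Fork::BossWorker->new(
    work_handler   => \&check_switch,           # runs in each child
    result_handler => \&save_result,            # runs in the boss
    worker_count   => 10,
);

# Queue one unit of work per switch, then check them all in parallel.
$bw->add_work( { host => $_ } ) for @switches;
$bw->process();

sub check_switch {
    my ($job) = @_;
    # Placeholder: poll the switch (SNMP, ping, ssh, ...) here.
    my $ok = 1;
    return { host => $job->{host}, ok => $ok };
}

sub save_result {
    my ($result) = @_;
    $status{ $result->{host} } = $result->{ok};
}
```

The point is that the boss process owns the queue and collects results, so no hand-rolled queue-management or locking code is needed.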

Hi, 2 things: 1. What is wrong with Berkeley DB? 2. Do you have an example of how to use this? Thank you.
– Igor, Dec 16 '11 at 19:41

@Igor -- Nothing wrong with Berkeley DB as such; it's just that the work-queue/worker-management problem has already been solved rather nicely by others.
– gsiems, Dec 17 '11 at 3:01

Hi, I think we are talking about 2 different scenarios here. I have a script, "a.pl", which can be run with 2 options. 1. If I run it as "./a -w 100", it goes to the web, picks up the data, stores the data on disk, and processes the first 100 items, where the data is a list of strings. 2. If I run it as "./a -f 100", it opens the file from scenario one, reads the data from record 101 (as the first 100 have already been processed), and finishes. 3. I open another terminal and run the same script as "./a -f 100" at the same time. Is this OK as a scenario? Thank you.
– Igor, Dec 17 '11 at 5:57

It's possible that we are talking about two different scenarios here -- but if you're planning on running the processes at the same time, then I don't see the difference. Your first process (./a -w 100) populates the queue and grabs the first 100 to process, while the second and third also grab a group of 100 to process, simultaneously. How is that different from having the parent grab the data and split it up between 3 children to do the processing? If you want, you can define the queue as an array of arrays (Nx100) so that each child processes 100 records at a time (until all the data is processed).
– gsiems, Dec 17 '11 at 17:53
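The array-of-arrays (Nx100) queue described above can be sketched in plain Perl; the record list and chunk size here are made up for illustration:

```perl
use strict;
use warnings;

# Placeholder data standing in for the list of strings read from the file.
my @records = map { "item$_" } 1 .. 950;

# Split the records into chunks of 100; each chunk becomes one unit of
# work, so each child processes 100 records at a time.
my $chunk_size = 100;
my @chunks;
push @chunks, [ splice @records, 0, $chunk_size ] while @records;

# With BossWorker, each chunk would then be queued as one job, e.g.:
#   $bw->add_work( { records => $_ } ) for @chunks;
printf "%d chunks; last chunk has %d records\n",
    scalar @chunks, scalar @{ $chunks[-1] };
```

Each child then works through its 100 records independently, and the boss hands out the next chunk as workers free up.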

Correct. Now the question is how to properly split the data. I think I just need to read the documentation on this module... Thank you.
– Igor, Dec 17 '11 at 19:46