Coordinating data transfer in model with parallel runs

To make better use of system resources, I converted some sequential code that runs many independent optimisations into a parallel structure. Unfortunately, this doesn’t run quite as I had expected, and I would be very grateful for any help that can be offered. The general approach I have used is described in the white paper ‘Multiple models and parallel solving with Mosel’, specifically the example ‘2.6.3: Job queue for managing parallel submodels’.

The problem seems to be one of synchronisation. The master model uses the mmjobs shared memory driver shmem to transfer a series of arrays to the submodel before optimisation, and the submodel transfers results arrays back to the master model after optimisation. Previously, when run in sequential mode, the master model was held at a ‘wait’ line while the submodel completed its operations, and only then attempted to access the returned files. A simple ‘wait’ wouldn’t seem to be possible in the parallel structure, due to the risk of crossed wires between parallel runs. Instead, I have tried the ‘waitfor’ function in both the main section of the master model and its procedure. This, I hoped, would allow a procedure-specific signal to be sent from the submodel to indicate that the relevant results are ready. However, I get an error message as follows:

My assessment of this is that the ‘waitfor’ function is probably not waiting for the message of the relevant class (see code given below), but instead progressing at the receipt of any message. It looks to me that the procedure for the second parallel run is here advancing through the ‘waitfor’ stage on the receipt of a message from the first parallel submodel.

I have tried to strip the code to its essentials and present it below. I think I have included the relevant parts, but I apologise if I have missed anything important. The code runs without a problem when no files are returned from the submodel to the master model; in that case the main section of the master code uses ‘wait’ instead of ‘waitfor’. It also returns data successfully when the file transfer uses mempipe, but the code is very much slower and the jobs finish in order. By contrast, when the model runs with no return array transfer, the parallel submodels take a range of times and so do not complete in the order in which they were started.

Do you have any ideas on how I might go about resolving this problem? One way might be to add another master–submodel layer, by making into a separate model what is currently the procedure, but this might be needlessly complicated. Thanks for reading.

-----

Master model

-----

while (JobsRun.size<JobSize) do
  waitfor(0) ! Previously, this was simply ‘wait’
  Msg:=getnextevent
  if getclass(Msg)=EVENT_END then
    retrieve_id:=getfromid(Msg)
    JobsRun+={jobid(retrieve_id)}
    if JobList<>[] then start_next_job(retrieve_id); end-if
  end-if
end-do

-----

Procedure in master model

-----

procedure start_next_job(m:integer)
  jobid(getid(array_subModel(array_id_MODEL(m)))):=getfirst(JobList)
  cuthead(JobList,1)
  initialisations to "bin:shmem:tooperational"+m
    A
    B
    C
  end-initialisations
  run(array_subModel(array_id_MODEL(m)),"identifier="+m)
  waitfor(m+1) ! I avoid using class 1, which denotes EVENT_END
  initialisations from "bin:shmem:fromoperational"+m
    X
    Y
    Z
  end-initialisations
end-procedure

-----

Submodel run from procedure

-----

initialisations from "bin:shmem:tooperational"+m
  A
  B
  C
end-initialisations

minimise(some function)

initialisations to "bin:shmem:fromoperational"+m
  X
  Y
  Z
end-initialisations

send(m+1,0) ! The format is as follows: class, value

-----

mempipe alternative: submodel; this would have corresponding changes in the submodel, as detailed above

In the master model, you should simply keep "wait" or use "waitfor(EVENT_END)". The call waitfor(0) means that you are waiting for events of class 0, which is not supported because event identifiers must be positive numbers; what you actually want is "EVENT_END", which has value 1. Furthermore, you need to remove the "waitfor" call from the subroutine that starts the submodel: you already wait for submodel events in the main loop, and this is also the place where you need to retrieve the information written out by the submodel. No changes to your submodel code are needed.
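As a rough sketch of what this could look like, here is the master loop and the procedure from the question rearranged along those lines. It is not tested and reuses the names from the question (array_subModel, array_id_MODEL, jobid, JobList, JobsRun, and the arrays A, B, C, X, Y, Z), whose declarations are assumed to exist elsewhere in the model. It also assumes that the submodel has finished writing its results by the time its EVENT_END arrives, which follows from the order of statements in the submodel as posted.

```mosel
while (JobsRun.size<JobSize) do
  wait                                ! Wait for the next event from any submodel
  Msg:=getnextevent
  if getclass(Msg)=EVENT_END then     ! Only react to submodel termination
    retrieve_id:=getfromid(Msg)

    ! The finished submodel wrote its results before it ended,
    ! so they can be read here, in the main loop.
    ! Assumption: X, Y, Z are overwritten on every read, so copy them
    ! to per-job storage at this point if they are needed later.
    initialisations from "bin:shmem:fromoperational"+retrieve_id
      X
      Y
      Z
    end-initialisations

    JobsRun+={jobid(retrieve_id)}
    if JobList<>[] then start_next_job(retrieve_id); end-if
  end-if
end-do

procedure start_next_job(m:integer)
  jobid(getid(array_subModel(array_id_MODEL(m)))):=getfirst(JobList)
  cuthead(JobList,1)

  initialisations to "bin:shmem:tooperational"+m
    A
    B
    C
  end-initialisations

  ! Start the submodel and return straight away, without waiting,
  ! so that the other submodels keep running in parallel.
  run(array_subModel(array_id_MODEL(m)),"identifier="+m)
end-procedure
```

With this arrangement the only place the master blocks is the main loop, so a result is never read before the submodel that produced it has ended. The events sent by send(m+1,0) in the submodel are then simply skipped by the getclass test.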
