It seems that Option 2 is preferable, because it is more intuitive for the end user to create a rankfile for an MPI job that is described by an -app command line.

All host definitions used inside -app <file> will be treated as a single global hostlist, combined from all hosts appearing in the appfile, and the rankfile will be able to refer to any host appearing inside the -app <file> directive. Is that correct?

regards

Mike

P.S. The mpirun man page claims that:

-app <appfile> Provide an appfile, ignoring all other command line options.

but it seems that it does not ignore all other command line options.
Moreover, it seems very convenient to specify per-job parameters on the mpirun command line just before -app appfile, and to put per-host parameters inside the appfile. What do you think?

Hmmm...well actually, there isn't a bug in the code. This is an interesting question!

Here is the problem. It has to do with how -host is processed. Remember, in the new scheme (as of 1.3.0), in the absence of any other info (e.g., an RM allocation or hostfile), we cycle across -all- the -host specifications to create a global pool of allocated nodes. Hence, you got the following:

When we start mapping, we call the base function to get the available nodes for this particular app_context. The function starts with the entire allocation. It then checks for a hostfile, which in this case it won't find.

Subsequently, it looks at the -host spec and removes -all- nodes in the list that were not included in -host. In the case of app_context=0, the "-host witch1" causes us to remove dellix7 and witch2 from the list - leaving only witch1.

This list is passed back to the rank_file mapper. The rf mapper then looks at your rankfile, which tells it to put rank=0 on the +n1 node on the list.

But that list has only ONE node on it, which corresponds to +n0! Hence the error message.
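The failure can be sketched with a small, hypothetical simulation (the node names and helper functions are illustrative only, not Open MPI code):

```python
# Hypothetical model of how a per-app-context -host filter breaks
# relative node indexing (+n0, +n1, ...) in the rankfile mapper.

# Global pool built from all -host specs, plus the HNP's node (dellix7 here).
allocation = ["dellix7", "witch1", "witch2"]

def available_nodes(allocation, host_spec):
    """Start from the whole allocation, then drop nodes not listed in -host."""
    return [n for n in allocation if n in host_spec]

def resolve_relative(nodes, spec):
    """Resolve a '+nK' rankfile entry against the filtered node list."""
    index = int(spec[2:])  # '+n1' -> 1
    if index >= len(nodes):
        raise ValueError(
            f"invalid relative node {spec}: list has only {len(nodes)} node(s)")
    return nodes[index]

# app_context 0 was given '-host witch1', so only one node survives the filter:
nodes = available_nodes(allocation, ["witch1"])  # ['witch1'] -- only +n0 exists
try:
    resolve_relative(nodes, "+n1")               # the rankfile asked for +n1
except ValueError as e:
    print("mapper error:", e)
```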

We have two potential solutions I can see:

Option 1. we can leave things as they are, and you adjust your rankfile to:

rank 0=+n0 slot=0

rank 1=+n0 slot=0

Since you specified -host witch2 for the second app_context, this will put rank 0 on witch1 and rank 1 on witch2. However, I admit that it looks a little weird.

Alternatively, you could adjust your appfile to:

-np 1 -host witch1,witch2 ./hello_world

-np 1 ./hello_world

Note you could have -host witch1,witch2 on the second line too, if you wanted. Now your current rankfile would put rank 0 on witch2 and rank 1 on witch1.
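Putting the two pieces together, the alternative setup would look something like this (the rankfile content is an assumption, inferred from the mapping described above: rank 0 pinned to +n1 and rank 1 to +n0):

```
# appfile
-np 1 -host witch1,witch2 ./hello_world
-np 1 ./hello_world

# rankfile (assumed content)
rank 0=+n1 slot=0
rank 1=+n0 slot=0
```

With both hosts in the filtered list for the first app_context, +n0 resolves to witch1 and +n1 to witch2, giving rank 0 on witch2 and rank 1 on witch1.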

Option 2. we could modify your relative node syntax to be based on the eventual total allocation. In this case, we would not use the base function to give us a list, but instead would construct it from the allocated node pool. Your current rankfile would give you what you wanted since we wouldn't count the HNP's node in the pool as it wasn't included in the allocation.

Any thoughts on how you'd like to do this? I can make it work either way, but have no personal preference.

Ralph

On Jul 15, 2009, at 7:38 AM, Ralph Castain wrote:

Okay, I'll dig into it - must be a bug in my code.

Sorry for the problem! Thanks for your patience in tracking it down...

Ralph

1. each line of the appfile causes us to create a new app_context. We store the provided -host info in that object.

2. when we create the "allocation", we cycle through -all- the app_contexts and add -all- of their -host info into the list of allocated nodes

3. when we get_target_nodes, we start with the entire list of allocated nodes, and then use -host for that app_context to filter down to the hosts allowed for that specific app_context

So you should only have to provide -np 1 and one host on each line. My guess is that the rankfile mapper isn't behaving correctly for multiple app_contexts.
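The three steps above could be sketched roughly like this (a toy model; the names AppContext and get_target_nodes are illustrative, not ORTE's actual API):

```python
# Toy model of the per-app-context host handling described in steps 1-3.

class AppContext:
    def __init__(self, np, hosts):
        self.np = np
        self.hosts = hosts  # the -host info from this appfile line

# Step 1: each appfile line creates a new app_context.
contexts = [AppContext(1, ["witch1"]), AppContext(1, ["witch2"])]

# Step 2: the "allocation" is the union of all -host info, in order.
allocation = []
for ctx in contexts:
    for h in ctx.hosts:
        if h not in allocation:
            allocation.append(h)

# Step 3: start from the entire allocation, then filter down to the
# hosts allowed for this specific app_context.
def get_target_nodes(ctx):
    return [n for n in allocation if n in ctx.hosts]

print(allocation)                     # both nodes in the total allocation
print(get_target_nodes(contexts[0])) # only this context's host survives
```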

Add --display-allocation to your mpirun command line for the "not working" case and let's see what mpirun thinks the total allocation is - I'll bet that both nodes show up, which would tell us that my "guess" is correct. Then I'll know what needs to be fixed.

Took a deeper look into this, and I think that your first guess was correct.

When we changed hostfile and -host to be per-app-context options, it became necessary for you to put that info in the appfile itself. So try adding it there. What you would need in your appfile is the following:
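The appfile content referred to here does not survive in this excerpt; based on the hosts discussed in this thread, it would presumably be per-line -host entries along these lines (a reconstruction, not the original text):

```
-np 1 -host witch1 ./hello_world
-np 1 -host witch2 ./hello_world
```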

You can try to oversubscribe a node, at least by one task.
If your hostfile and rankfile limit you to N procs, you can ask mpirun for N+1 and it will not be rejected, although in reality there will only be N tasks. So, if your hostfile limit is 4, then "mpirun -np 4" and "mpirun -np 5" both work, but in both cases there are only 4 tasks. It isn't crucial, because there is no real oversubscription, but there is still a bug which could affect something in the future.

--Anton Starikov.

On May 12, 2009, at 1:45 AM, Ralph Castain wrote:

This is fixed as of r21208.

Thanks for reporting it!

Ralph

On May 11, 2009, at 12:51 PM, Anton Starikov wrote:

Although removing this check solves the problem of having more slots in the rankfile than necessary, there is another problem.

If I remember correctly, I used an array to map ranks, and since the length of the array is NP, the maximum index must be less than NP; so if you have a rank number >= NP, there is no place to put it inside the array.

"Likewise, if you have more procs than the rankfile specifies, we map the additional procs either byslot (default) or bynode (if you specify that option). So the rankfile doesn't need to contain an entry for every proc." - Correct point.

Lenny.

On 5/5/09, Ralph Castain <rhc@open-mpi.org> wrote:

Sorry Lenny, but that isn't correct. The rankfile mapper doesn't care if the rankfile contains additional info - it only maps up to the number of processes, and ignores anything beyond that number. So there is no need to remove the additional info.

Likewise, if you have more procs than the rankfile specifies, we map the additional procs either byslot (default) or bynode (if you specify that option). So the rankfile doesn't need to contain an entry for every proc.

Just don't want to confuse folks.

Ralph

On Tue, May 5, 2009 at 5:59 AM, Lenny Verkhovsky <lenny.verkhovsky@gmail.com> wrote:
Hi,

The maximum rank number must be less than np. If np=1, then there is only rank 0 in the system, so rank 1 is invalid. Please remove "rank 1=node2 slot=*" from the rankfile.

Best regards,
Lenny.
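The check Lenny describes can be sketched as a small hypothetical validator (note Ralph's reply above: later versions relax this check and simply ignore extra rankfile entries):

```python
import re

def check_rankfile(lines, np):
    """Reject any 'rank N=...' entry with N >= np (only ranks 0..np-1 exist)."""
    for line in lines:
        m = re.match(r"rank\s+(\d+)\s*=", line)
        if m and int(m.group(1)) >= np:
            raise ValueError(
                f"Error, invalid rank ({m.group(1)}) in the rankfile")

# np=1 allows only rank 0, so the second entry is rejected:
rankfile = ["rank 0=node1 slot=*", "rank 1=node2 slot=*"]
try:
    check_rankfile(rankfile, np=1)
except ValueError as e:
    print(e)
```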

> Error, invalid rank (1) in the rankfile (rankfile.0)
> --------------------------------------------------------------------------
> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 404
> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 87
> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 77
> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 985
> --------------------------------------------------------------------------
> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> orterun noticed that the job aborted, but has no info as to the process that caused that situation.
> --------------------------------------------------------------------------
> orterun: clean termination accomplished