I'm trying to run Mathematica in parallel on multiple nodes of an HPC cluster, and I use the Remote Kernels tab in the "Parallel Kernel Configuration" settings to set up communication between the nodes (detailed setup here). Since I get different nodes each time, I have to change the settings in "Parallel Kernel Configuration" manually for every job. Is it possible to change these settings from within the code?

Here is the setting I want to change:

For example, how can I enable 16 kernels on a remote host named "mike123" and 10 kernels on a remote host named "mike234"?

@Murta Unfortunately not, but I'm wondering whether one can use Lightweight Grid on an HPC machine. Is it compatible with the PBS system?
– xslittlegrass, Sep 2 '13 at 3:08

I don't know, but I have tested it on ordinary PCs: you just install LWG on the remote machines whose kernels you want to use, and the machines appear in the LWG menu in your picture if they are on the same network. Very simple.
– Murta, Sep 2 '13 at 3:16

@xslittlegrass I disagree with your (and others') votes to close. It's true that the solution to your problem is the code in the other question (not even the answer), but that doesn't make it a duplicate. In fact, that question is entirely about the slowdowns and someone redirected to that from here would be thoroughly confused. I don't recall a question on changing the kernel configuration, so I would suggest that you answer your own question with an explanation of how to do that using LaunchKernel.
– The Toad♦, Sep 7 '13 at 18:05

1 Answer

It turns out that we can launch subkernels on multiple nodes with the LaunchKernels function. The solution was presented by fpghost in the question here about the slowdown of subkernels. So as not to confuse the reader by redirecting to that question, I will explain how to use the code from that question to solve my problem, as suggested by rm -rf. (This solution applies only to PBS systems.)

First, we should know how to launch subkernels locally and remotely. LaunchKernels can launch subkernels on both the local system and remote systems. By default it launches kernels on the local system:

LaunchKernels[]

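launches subkernels on my local system; in my case, 16 of them.

To launch on a remote system, pass LaunchKernels a RemoteMachine specification (from the SubKernels`RemoteKernels` package) giving the host name, the ssh command template, and the number of kernels. With the host name "remote" and a count of 16, this launches 16 subkernels on the remote system remote. For a detailed explanation of the arguments, see here.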
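A remote launch of this kind might look like the sketch below. The host name "remote" and the count 16 are placeholders, and the command template is the one quoted in the comments further down. The kernel executable name math is an assumption and may differ between installations. The sketch also assumes passwordless ssh access to the remote host.

```mathematica
(* Load remote-kernel support; RemoteMachine is defined in this package *)
Needs["SubKernels`RemoteKernels`"]

(* Command template. Slots as I understand them: `1` = host name,
   `2` = link name, `3` = user name, `4` = link protocol options *)
sshTemplate = "ssh -x -f -l `3` `1` math -mathlink -linkmode Connect `4` -linkname '`2`' -subkernel -noinit";

(* Request 16 subkernels on the host "remote"; the host must be given as a string *)
LaunchKernels[RemoteMachine["remote", sshTemplate, 16]]
```

If the launch succeeds, the result should be a list of kernel objects. A $Failed result usually points to a problem with the host name or the ssh command.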

Second, we need to know which nodes have been assigned to our job. This information is easily obtained from the PBS environment variables:

PBS_NODEFILE: File containing the list of allocated nodes
PBS_O_WORKDIR: Directory from where job is submitted
PBS_O_QUEUE: Queue job was submitted to
PBS_JOBID: Job ID number
PBS_JOBNAME: The name of the job.
PBS_NP: Number of processes requested
PBS_NUM_PPN: Number of processors per node requested
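As a minimal sketch of reading this information from inside Mathematica, the node file can be read line by line and tallied. This assumes a Torque-style node file in which each host name appears once per allocated core. The variable names nodeFile and hosts are only illustrative.

```mathematica
(* Path of the node file; Environment returns $Failed if the variable is not set *)
nodeFile = Environment["PBS_NODEFILE"];

(* One host name per line; fall back to an empty list outside a PBS job *)
hosts = If[StringQ[nodeFile], ReadList[nodeFile, String], {}];

(* {host, count} pairs: how many slots were allocated on each node *)
Tally[hosts]
```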

Putting everything together, here is a function that launches all the subkernels on the nodes assigned to our job.
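A minimal sketch of such a function follows; it is not necessarily identical to the code in the linked question. The names nodeCounts and launchOnAllocatedNodes are hypothetical, and the executable name math in the template is an assumption. The node running the master kernel is treated like any other host, which assumes that ssh to it works as well.

```mathematica
Needs["SubKernels`RemoteKernels`"]

(* Command template as quoted in the comments; "math" may differ per installation *)
sshTemplate = "ssh -x -f -l `3` `1` math -mathlink -linkmode Connect `4` -linkname '`2`' -subkernel -noinit";

(* Hypothetical helper: list of host names -> list of {host, count} pairs *)
nodeCounts[hosts_List] := Tally[hosts]

(* Hypothetical helper: launch one batch of subkernels per allocated node *)
launchOnAllocatedNodes[template_String] := Module[{file, hosts},
  file = Environment["PBS_NODEFILE"];
  If[StringQ[file],
    hosts = ReadList[file, String];
    (* one LaunchKernels call per node, with as many kernels as node-file entries *)
    Flatten[
      LaunchKernels[RemoteMachine[#[[1]], template, #[[2]]]] & /@ nodeCounts[hosts]
    ],
    (* not inside a PBS job: nothing to launch *)
    $Failed
  ]
]
```

Inside a PBS job, launchOnAllocatedNodes[sshTemplate] would then replace the manual entries in "Parallel Kernel Configuration".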

Suppose I qsub two nodes and get fu01 and fu02, each with 24 cores, and the shell is on fu01. Then, as I understand it, I can just type LaunchKernels[RemoteMachine[fu02, "ssh -x -f -l `3` `1` math -mathlink -linkmode Connect `4` -linkname '`2`' -subkernel -noinit", 16]], right? But it didn't work. It just gives $Failed.
– matheorem, Nov 12 '13 at 9:03

@matheorem I think fu02 should be "fu02", since the argument should be a string. Could you try that?
– xslittlegrass, Nov 12 '13 at 15:34

Yeah! The string works! Thank you very much! Note that Needs["SubKernels`RemoteKernels`"] is essential before launching. But why can't I find "SubKernels`RemoteKernels`" in the Mathematica documentation?
– matheorem, Nov 13 '13 at 1:54

@matheorem Glad it works. I found them in the documentation here.
– xslittlegrass, Nov 13 '13 at 17:52
