Say I have a 4-core workstation. What would Linux (Ubuntu) do if I execute

mpirun -np 9 XXX

Will all 9 processes run together immediately, or will they run 4 at a time?

I suppose that using 9 is not a good idea, because the remainder of 1 will make the computer confused. (I don't know whether it gets confused at all, or whether the "head" of the computer decides which of the 4 cores will be used, or whether one is picked at random. Who decides which core to use?)

If my CPU is decent, my RAM is large enough, and my case is not very big, is it a good idea, in order to fully use my CPU and RAM, to run mpirun -np 8 XXX, or even mpirun -np 12 XXX?

2 Answers

The load will be distributed by your OS across as many cores as are available. However, the run time may not shrink in proportion to the number of threads. Here is a toy example of why. Assume you have one job that you want to run three times, and it takes the same amount of time each run (1 unit of time). You have two cores, and nothing else is running.

Case one: you have only one thread. The thread runs on one core, and the whole thing takes 3 units of time to complete. Total time: 3

Case two: you have two threads. In one unit of time, the job gets done twice (once per core). You then have to wait a whole extra unit of time for the third run to complete. Total time: 2

Case three: you have three threads. Your OS will try to be fair, and so will split the time evenly between the three processes. By the end of unit 1, NONE of them will be complete. By the end of unit 2, they will all be done (see the case above). Total time: 2
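The three cases above can be reproduced with a toy calculation (my own simplified model, assuming identical unit-time jobs, perfectly fair scheduling, and time quantized to whole units):

```python
import math

def total_time(jobs: int, threads: int, cores: int) -> int:
    """Toy model: `jobs` identical unit-time tasks are split evenly over
    `threads`, which share `cores` cores under perfectly fair scheduling."""
    work = math.ceil(jobs / threads)   # jobs assigned to the busiest thread
    speed = min(1.0, cores / threads)  # each thread's fair share of a core
    return math.ceil(work / speed)     # whole scheduling units until done

print(total_time(3, 1, 2))  # case one   -> 3
print(total_time(3, 2, 2))  # case two   -> 2
print(total_time(3, 3, 2))  # case three -> 2
```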

Starting more threads will not really hurt your performance much (the memory cost of starting a thread is typically under 1 MB for its stack), but it might not help either.

The only way to know what is faster is to test it, but use the following rules as a guide: use at least as many threads as you have cores. Additionally, if a process does lots and lots of scattered memory access, it may actually be faster to run more threads than cores (a memory access is very slow compared to executing other instructions, and the OS can fill the waiting time with real execution of something else that does not have to wait).
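As a starting point, you can read the core count from a script before choosing -np (a minimal sketch using only the Python standard library; the candidate values are just illustrative, not a recommendation from the answer above):

```python
import os

cores = os.cpu_count() or 1  # logical CPUs visible to the OS
# Compute-bound MPI jobs usually start from one rank per core; for jobs with
# many memory stalls it can be worth timing roughly twice that as well.
candidates = [cores, 2 * cores]
print(f"cores={cores}, -np values to try: {candidates}")
```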

From limited testing and my (not very deep, though I have used it a few times) understanding of parallel computation:

They will run at the same time.

The load will be distributed among the cores. The computer won't be "confused", but you will get very little - or no - performance boost, because each core is handling more than one task. At worst, it can slow things down.

At most you want to run a number of processes equal to your number of cores. Larger values will run (within reason: an absurdly large number may exhaust your RAM or make the system very slow), but you might not see any benefit, and might even see a slowdown. It does not hurt to try, though.
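Trying it is easy to automate. A sketch of a timing harness (XXX is the executable from the question; `--oversubscribe` is the Open MPI flag that permits launching more ranks than detected slots - if you use a different MPI implementation, check its documentation for the equivalent):

```python
import shlex
import subprocess
import time

def time_run(cmd: str) -> float:
    """Run a command and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(shlex.split(cmd), check=True)
    return time.perf_counter() - start

# Hypothetical usage -- uncomment once ./XXX exists:
# for n in (4, 8, 9, 12):
#     print(n, time_run(f"mpirun --oversubscribe -np {n} ./XXX"))
```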