I have a rather specific question. I am about to simulate a real robotics program (using ROS) in order to learn about the problems that occur when a larger number of robots is in use.
Therefore I need to know whether it is possible to run a single V-REP simulation on multiple computers (i.e. clustering), not in order to watch the whole simulation afterwards, but in order to get results for each heterogeneous mobile robot used.
I also only need to run the simulation in "headless" mode, so no visual output is needed. I just want the simulation to run as fast as possible.

The only goal is a very fast simulation on multiple computers with the following output: "errors occurred; 'realtime' needed; goals achieved".

As my online search for a simulator that fits my needs was not very informative, I am looking forward to any replies.

I am not sure if I understood everything exactly.
You basically want to simulate many robots that are isolated from each other, but still in the same simulation environment?
If they are isolated from each other, then it is similar to running several different simulations in parallel. If they are not isolated from each other, then what kind of interaction do they have? Do they interact on a physical level (e.g. collide with each other)? If this is the case, then it is not directly feasible since the physics engines need to have at hand all the involved robots, contact points, masses, etc.

The robots are not isolated from each other. There are different kinds of robots in the environment. And yes, they need to interact with each other on a physical level (e.g. each robot needs to be able to notice other robots and have, for example, a 'physical arm' work on them).
Basically they drive in between stations (a station is a stationary robot that has a certain task to do).

it is not directly feasible since the physics engines need to have at hand all the involved robots, contact points, masses, etc.

Why can't the physics engine keep track of all the involved robots in this case? I think I didn't understand this correctly.

Is the problem that 'clustering' isn't supported by V-REP? I can't find much information about this topic for any simulator.

Have N instances of V-REP run in synchronous mode, i.e. each simulation step is triggered from an external application (e.g. remote API client or ROS node)

Each instance has exactly the same content at start-up.

The external application can then load different robots into the different instances and have simulation run.

In that stage, simulations are isolated. But if your external application notices that two robots come close to each other, then you could load one of the robots into the same scene as the other robot and have them interact. When they separate by a specified threshold again, then you could reload one of the robots into its original scene again.
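The hand-over decision described above could be sketched roughly as follows. This is only an illustration of the bookkeeping in the external application, not V-REP code: the function name `update_shared_pairs` and the two thresholds `merge_dist` and `split_dist` are hypothetical. Using a smaller distance for merging than for separating is an assumption added here so that robots hovering around one threshold are not moved back and forth every step.

```python
from itertools import combinations
from math import dist


def update_shared_pairs(positions, shared, merge_dist, split_dist):
    """Decide which robot pairs must currently live in the same instance.

    positions  -- dict: robot name -> (x, y) position reported by its instance
    shared     -- set of (name_a, name_b) pairs that already share an instance
    merge_dist -- pairs closer than this are moved into one instance
    split_dist -- already shared pairs stay together up to this distance
    """
    if split_dist <= merge_dist:
        raise ValueError("split_dist must be larger than merge_dist")

    new_shared = set()
    # sorted() gives every pair a stable (a, b) ordering
    for a, b in combinations(sorted(positions), 2):
        d = dist(positions[a], positions[b])
        if d < merge_dist or ((a, b) in shared and d <= split_dist):
            new_shared.add((a, b))
    return new_shared


# Hypothetical positions, purely for illustration
shared = set()
positions = {"r1": (0.0, 0.0), "r2": (0.5, 0.0), "r3": (10.0, 0.0)}
shared = update_shared_pairs(positions, shared, merge_dist=1.0, split_dist=2.0)
print(shared)
```

The external application would call something like this after every triggered step and then load or reload robot models according to the pairs that were added or dropped.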

The above is a sub-optimal solution and will make a difference only with more than, say, 20 robots or so (since the synchronous mode is actually slower than running V-REP in free mode, because it waits for a trigger signal).

Additionally, the dynamic content should be quasi-static while robots are isolated, or you would also have to transfer the current velocities of all robot links when reloading a robot into a different V-REP instance. Then you might run into other physics engine-specific issues.

The main bottleneck is actually the physics engines: a physics engine needs to solve a given set of constraints (e.g. collisions, joints, etc) that are interlinked in one thread/core. The physics engine can of course analyze the constraints and divide them into two/more independent groups, that can be solved in 2/more different threads... but only if the two/more robots are physically not interacting with each other.
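To illustrate what "dividing the constraints into independent groups" means, here is a minimal sketch that groups bodies into independent sets based on which bodies currently share a constraint. It is not how Vortex or any other engine is actually implemented. The function `find_islands` and the body names are made up for the example, and a simple union-find is used as a stand-in technique.

```python
def find_islands(bodies, contacts):
    """Group bodies into independent sets ("islands").

    bodies   -- iterable of body names
    contacts -- iterable of (body_a, body_b) pairs that share a constraint
                (a collision contact, a joint, ...)
    Two bodies end up in the same island if a chain of constraints links them.
    """
    parent = {b: b for b in bodies}

    def find(x):
        # Walk up to the representative, shortening the path on the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in contacts:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    islands = {}
    for b in bodies:
        islands.setdefault(find(b), []).append(b)
    return sorted(sorted(group) for group in islands.values())


bodies = ["robotA", "robotB", "robotC", "station1"]

# Only robotA touches the station: three islands, could be solved in parallel
print(find_islands(bodies, [("robotA", "station1")]))

# Everything is linked through contacts: one island, no parallelism possible
print(find_islands(bodies, [("robotA", "station1"),
                            ("robotB", "robotA"),
                            ("robotC", "robotB")]))
```

Each island could in principle be handed to its own thread. As soon as the robots interact, the islands merge and that possibility disappears.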

I am not sure whether I am making a mistake, but is it correct that V-REP always uses only a single core of those available?

I am currently running a simulation with approximately 12 Pioneer robots and the simulation is very slow. While checking my cores I noticed that only one of the four available is heavily used (60% - 100%), while the others remain largely idle. The same behaviour occurred when I used a computer with many more cores.
Compared to simulating a single Pioneer, the group of Pioneers in the same scene is very slow.

As I explained previously, I need to run the simulation as fast as possible, so I need to make use of all the available hardware. If there is a way to run the simulation using all hardware components, please tell me.

The main bottleneck is actually the physics engines: a physics engine needs to solve a given set of constraints (e.g. collisions, joints, etc) that are interlinked in one thread/core. The physics engine can of course analyze the constraints and divide them into two/more independent groups, that can be solved in 2/more different threads... but only if the two/more robots are physically not interacting with each other.

Does this mean that, due to these difficulties with the physics engine, no simulator will be able to speed up a simulation containing multiple interacting robots in the same scene?

The V-REP simulation part runs mainly on one core. This is also important since we can't/don't want to mix up the actuation and sensing phases of a discrete simulation time step. But the Vortex physics engine runs on several cores if the simulation content can be divided. This is rarely the case, except when you simulate many particles: particles can easily be separated into independent groups for the physics calculation. The exception is when all the particles are touching each other: then again, we cannot do the calculations on several cores.

The first thing to do is to identify the real bottleneck in your case: is it the physics? The vision sensors? The proximity sensors?
You can find calculation timings at the top of the scene, when simulation is running. You can also get an idea by using the model in Models/other/timing info.ttm