It is great to see that you are getting started with SofaCUDA!
1) To get the maximum performance out of the GPU:
– First of all, you need a powerful GPU card! Which one are you using?
– To benefit from a significant acceleration, the number of parallelizable tasks must be very high, i.e. the number of DOFs of your object must be high enough. Otherwise, the benefit of parallelization won't be noticeable, because the cost of copying data between CPU and GPU might be too high compared to the computation itself.
– Finally, the GPU implementation of all components in the scene (Mass, ForceField, Solver, ...) must be as optimized as possible.
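
To make the second point more concrete, here is a rough back-of-the-envelope sketch in Python. It is not SOFA code: the function names and the cost parameters (a per-DOF cost on CPU and GPU, plus a fixed per-step overhead for copies and kernel launches) are assumptions for illustration only, and real values have to be measured on your own scene.

```python
def cpu_time_ms(n_dofs, cpu_ns_per_dof):
    """Hypothetical CPU cost: purely proportional to the number of DOFs."""
    return n_dofs * cpu_ns_per_dof * 1e-6


def gpu_time_ms(n_dofs, gpu_ns_per_dof, fixed_overhead_ms):
    """Hypothetical GPU cost: a fixed overhead (CPU<->GPU copies, kernel
    launches) plus a much smaller per-DOF cost."""
    return fixed_overhead_ms + n_dofs * gpu_ns_per_dof * 1e-6


def break_even_dofs(cpu_ns_per_dof, gpu_ns_per_dof, fixed_overhead_ms):
    """Number of DOFs above which the GPU becomes faster in this model.

    Returns None if the GPU is never faster (per-DOF cost not lower)."""
    if gpu_ns_per_dof >= cpu_ns_per_dof:
        return None
    return fixed_overhead_ms * 1e6 / (cpu_ns_per_dof - gpu_ns_per_dof)


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the trade-off.
    n = break_even_dofs(cpu_ns_per_dof=100.0, gpu_ns_per_dof=50.0,
                        fixed_overhead_ms=0.1)
    print("break-even DOF count in this toy model:", n)
```

Below the break-even size the fixed overhead dominates and the GPU version can even be slower; well above it, the per-DOF gain takes over.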

I do not have SofaCUDA compiled on my current laptop; I will have a look when I get back to my office. Does anyone have additional advice or information?

2) About cutting on the GPU: I am not sure it has been widely investigated in SOFA yet, so you might not find built-in examples. However, it is possible to implement. Memory sharing between CPU and GPU must be handled carefully, since the topology information changes at runtime.

Never hesitate to ask for support, we are happy to assist you in your work!
Best wishes,

There are multiple factors that can impact GPU performance. The raptor-cuda.scn model has about 20000 elements, which in theory would be enough to fill a Titan Xp. However, the occupancy will depend on how many blocks of threads can fit on each SM, which depends on the size of the blocks and on how many registers the kernels use.
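
As a rough illustration of how block size and register usage limit occupancy, here is a simplified estimate in Python. The default limits (2048 resident threads, 65536 registers and 32 resident blocks per SM, warps of 32 threads) are what I assume for a compute capability 6.1 card such as the Titan Xp; please check them against the CUDA documentation for your GPU. The sketch ignores shared memory and register allocation granularity, so the profiler or NVIDIA's occupancy calculator will give more accurate numbers.

```python
def estimate_occupancy(threads_per_block, regs_per_thread,
                       max_threads_per_sm=2048,   # assumed limit (CC 6.1)
                       regs_per_sm=65536,         # assumed limit (CC 6.1)
                       max_blocks_per_sm=32,      # assumed limit (CC 6.1)
                       warp_size=32):
    """Simplified theoretical occupancy: resident warps / max warps per SM.

    Ignores shared memory and register allocation granularity.
    regs_per_thread must be > 0."""
    # Blocks are scheduled in whole warps, so round up.
    warps_per_block = -(-threads_per_block // warp_size)
    max_warps_per_sm = max_threads_per_sm // warp_size

    # How many blocks fit on one SM under each limit.
    blocks_by_threads = max_warps_per_sm // warps_per_block
    regs_per_block = regs_per_thread * warps_per_block * warp_size
    blocks_by_regs = regs_per_sm // regs_per_block

    blocks = min(blocks_by_threads, blocks_by_regs, max_blocks_per_sm)
    return blocks * warps_per_block / max_warps_per_sm


if __name__ == "__main__":
    # Hypothetical kernel configurations, not measured from SofaCUDA.
    for tpb, regs in [(256, 32), (256, 64), (64, 128)]:
        print(tpb, regs, estimate_occupancy(tpb, regs))
```

In this model, doubling the registers per thread halves the number of blocks that fit on an SM once registers become the limiting resource, which is the mechanism described above.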

The low occupancy would explain why only 25% of your GPU is being used. To find out exactly what prevents it from being fully used, you should use the visual profiler that comes with the CUDA SDK. It will tell you exactly which kernels could be improved and how.

The profiler will also tell you the bottleneck that leads to a low frame rate. It would either be the memory transfers or the kernels. The reason is probably that it simply takes a long time to compute the force field. To decrease that time you would need to increase the occupancy of the GPU.
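
Once the profiler has given you the split between kernel time and everything else, a small calculation like the following (plain Python, with a hypothetical function name and made-up inputs) shows how much the frame rate can gain from faster kernels. It assumes the non-kernel part of the frame stays unchanged.

```python
def fps_after_kernel_speedup(frame_ms, kernel_fraction, kernel_speedup):
    """Frame rate after speeding up only the kernel part of a frame.

    frame_ms:        measured time of one frame, in milliseconds
    kernel_fraction: share of the frame spent in GPU kernels (0..1)
    kernel_speedup:  factor by which the kernels get faster (> 0)
    """
    kernel_ms = frame_ms * kernel_fraction
    other_ms = frame_ms - kernel_ms  # transfers, CPU work, rendering, ...
    return 1000.0 / (other_ms + kernel_ms / kernel_speedup)


if __name__ == "__main__":
    # Made-up example: 20 ms per frame, 80% of it spent in kernels.
    print(fps_after_kernel_speedup(20.0, 0.8, 2.0))
```

If the kernels only account for a small share of the frame, improving occupancy will not help much, and the transfers or the CPU side are the place to look instead.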

On my computer (with a GTX 1050 Ti), GPU usage reaches about 40%, with about 45 frames per second. When I used CUDA a while ago, the bottleneck was that the GPU was not fully used, and the memory transfers had little impact.