We are pleased to announce that SpeedIT Toolkit 0.9 has been released. The library has been internally deployed, tested, and validated in a real scenario (blood flow in an aortic bifurcation, with data from the IT'IS Foundation, Switzerland).

The work is finished and the official release is out. You can find the OpenFOAM plugin for GPU-based iterative solvers (Conjugate Gradient and BiCGSTAB) at speedit.vratis.com. The Classic version of our library and the OpenFOAM plugin are both released under the GPL. Enjoy!

I'm trying to use the SpeedIT Toolkit and downloaded the free Classic version.
I followed the README files and recompiled OpenFOAM in single precision.

The icoFoam cavity tutorial runs with the PCG_accel solver; however, it is about ten times slower than the normal PCG solver. Both are run in single precision with the diagonal preconditioner. Below are the final iterations of both runs (Time = 0.5).
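
For reference, the pressure entry in the solvers sub-dictionary of fvSolution looks roughly like the sketch below (the tolerance values are placeholders, not my exact settings):

Code:
solvers
{
    p
    {
        solver          PCG_accel;   // GPU-accelerated CG from the SpeedIT plugin
        preconditioner  diagonal;    // same diagonal preconditioner as the plain PCG run
        tolerance       1e-06;       // placeholder value
        relTol          0;           // placeholder value
    }
    // U entry unchanged
}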

Here I have to say that the normal single precision simpleFoam does not even work for this tutorial. With PCG_accel the tutorial can be run, but with an error message, so I'm not sure whether that error message comes from PCG_accel. Here the single precision run with PCG_accel is about 4 times slower than the normal double precision PCG (179.52 s).

Can you explain why the accelerated solver is slower than the normal solver?

Before we can comment on the performance results you obtained, we need to know your hardware configuration.
Please remember that even the most powerful GPUs are only about ten times faster than modern CPUs.

Next, in your example the accelerated solver converges after 0 or 1 iterations. In this case most of the time in the solver routine is spent on data transfer between CPU and GPU, not on computations on the GPU side. We described this phenomenon thoroughly in the documentation: on one of the fastest GPUs we obtained only a small performance increase when a single solver iteration was done. The performance gain was significantly larger when dozens of solver iterations were required.
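
As a rough illustration (my own back-of-the-envelope model, not numbers from our benchmarks): total solver time is roughly T_transfer + N_iter * T_iter. The cost of copying the matrix and vectors across the PCI Express bus is paid once per solve, whether the solver then performs 1 iteration or 100, so the GPU only pays off once N_iter * T_iter clearly dominates T_transfer.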

The pitzDaily example shows that both solvers (OpenFOAM and SpeedIT) fail to converge within the required number of iterations.
However, it seems that our solver could converge if a larger number of iterations were allowed.
I cannot comment on the performance comparison, because the OpenFOAM DOUBLE precision solver converges in far fewer iterations than our SINGLE precision solver. I think the comparison should be made against our double precision solver.
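
For context (a general floating-point remark, not something specific to SpeedIT): single precision carries roughly 7 significant decimal digits versus about 16 in double precision, so a single precision Krylov solver typically needs more iterations to reach the same residual tolerance, and may stall before reaching it at all.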

I thought indeed that it was overhead in the first case. Unfortunately the combination of PCG/PCG_accel and diagonal/no preconditioning doesn't converge properly for the test cases I'm interested in (airfoil computations at the moment), so a good comparison is not possible there. As a preconditioner for PCG I use GAMG or DIC, but actually I prefer GAMG as a solver. How is the progress on making GAMG run on the GPU?
For potentialFoam on an airfoil, I also see the error: ERROR : solver function returned -1

I ran the cylinder tutorial of potentialFoam with PCG/PCG_accel using diagonal preconditioning. There it worked, although the accelerated version was slower. But I think that has to do with my hardware.
I'm running on a Quadro FX 1700 card with 512 MB of memory. The clock speed is only 400 MHz for the memory and 460 MHz for the GPU. Due to our preinstalled system, I could not run with the latest driver and CUDA version; currently I use driver 195.36.15 with CUDA 3.0. I didn't expect a huge speedup here, but perhaps a little bit. Do you expect any speed-up for such a configuration?

Why does the initial residual for every pressure loop start at 0, while in the normal solver it starts from a lower level? It doesn't seem to affect the results much, but the number of iterations increases since it starts at a higher level.

I downloaded the folder 1.2.Classic from sourceforge.net. As you said yourself, the README file (and hence the installation instructions in it) seems to be out of date. I tried to install it as described in the README file, but it was unsuccessful. Would you please send me a note with the installation steps, or a link to them?

Dear Alex,
I'm also trying the SpeedIT solver, in my case on interDyMFoam (the damBreakwithObstacle case). I've got the same problems you experienced.
The accelerated solver (PCG_accel) is much slower than the normal one (PCG) (maybe a hardware problem due to my quite old graphics card!) and, more importantly, the computation stops after a few iterations because it does not converge! The normal solver runs fine.
Have you found the reason for this, and any solution?

Best regards

Andreas

Quote:

Originally Posted by aloeven

Hi Lucasz,

I'm trying to use the SpeedIT Toolkit and downloaded the free Classic version.
I followed the README files and recompiled OpenFOAM in single precision.

The icoFoam cavity tutorial runs with the PCG_accel solver; however, it is about ten times slower than the normal PCG solver. Both are run in single precision with the diagonal preconditioner. Below are the final iterations of both runs (Time = 0.5).

Here I have to say that the normal single precision simpleFoam does not even work for this tutorial. With PCG_accel the tutorial can be run, but with an error message, so I'm not sure whether that error message comes from PCG_accel. Here the single precision run with PCG_accel is about 4 times slower than the normal double precision PCG (179.52 s).

Can you explain why the accelerated solver is slower than the normal solver?

In my case the old graphics card is clearly the bottleneck; the communication to and from the card is too slow. I'm running on a Quadro FX 1700 card with 512 MB of memory. The clock speed is only 400 MHz for the memory and 460 MHz for the GPU.

On a newer card the performance should be better. With the free version, however, it is difficult to compare results: you have to run everything in single precision and you cannot use good preconditioners.

Dear Alex,
thank you for your quick reply! I'm not too bothered about the speed. The main problem is the convergence of the results. If I use the same preconditioner (diagonal) for both calculations (PCG and PCG_accel), PCG converges while PCG_accel diverges. Do you know why?