While the number of particles and the frames per second give us a picture of real-world performance in an application, it is more useful to compare pure algorithm efficiency independently of the number of particles. This can be found by multiplying the number of particles by the frames per second, which gives the average number of particles simulated per second on a given hardware platform.

H.E. (in particles per second) = # Particles * fps
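As a minimal sketch of this formula, the snippet below converts a frame time in milliseconds (the unit used in the measurements that follow) to frames per second and multiplies by the particle count. The function name `hardware_efficiency` and its parameters are hypothetical, chosen only for illustration; they are not part of Fluids v.3.

```python
def hardware_efficiency(num_particles: int, msec_per_frame: float) -> float:
    """Return the Hardware Efficiency in particles per second (pps).

    Hypothetical helper: H.E. = # particles * fps, with fps derived
    from the measured frame time in milliseconds.
    """
    fps = 1000.0 / msec_per_frame  # frames per second
    return num_particles * fps


# Example: the 8,192-particle Fluids v.3 measurement at 1.30 msec/frame.
he = hardware_efficiency(8192, 1.30)
print(f"{he:,.0f} pps")
```

For 8,192 particles at 1.30 msec per frame this gives roughly 6.3 million particles per second, in line with the reported figure.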

Figure 1. Algorithm performance for various SPH fluid simulators, measured using the Hardware Efficiency metric (pps), relative to the number of particles. The orange curve is the current Fluids v.3 simulator on a GeForce GTX460M. The blue line is NVIDIA’s PhysX, measured from Spaete’s Fluid Sandbox. The green and purple lines are recent academic results by Pajarola (2010) and Fang Chao (2010).

Figure 1 shows the hardware-based algorithm efficiency for various SPH fluid simulators. These results were calculated by running each SPH simulator on a GeForce GTX460M, disabling any advanced rendering, and measuring pure simulation efficiency as a frame rate for a given number of particles. The hardware efficiency, H.E., is computed from this.

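| Simulator | # Particles | msec / frame | Hardware Efficiency (particles per second) |
|---|---|---|---|
| Fluids v.3 (GPU) | 4,096 | 0.68 | 6,113,432 |
| Fluids v.3 (GPU) | 8,192 | 1.30 | 6,301,538 |
| Fluids v.3 (GPU) | 16,384 | 2.30 | 7,123,478 |
| Fluids v.3 (GPU) | 32,767 | 4.20 | 7,801,666 |
| Fluids v.3 (GPU) | 65,536 | 8.80 | 7,447,272 |
| Fluids v.3 (GPU) | 131,072 | 18.21 | 7,197,803 |
| Fluids v.3 (GPU) | 262,144 | 42.30 | 6,197,257 |
| Fluids v.3 (GPU) | 524,288 | 98.00 | 5,349,877 |
| Fluids v.3 (GPU) | 1,048,576 | 234.00 | 4,481,094 |
| Fluids v.3 (GPU) | 2,900,800 | 1085.00 | 2,673,548 |
| Fluids v.3 (GPU) | 8,388,608 | 4433.00 | 1,892,309 |
| Fluid Sandbox, NVIDIA PhysX (GPU) | 26,000 | 41.67 | 624,000 |
| Fluid Sandbox, NVIDIA PhysX (GPU) | 74,140 | 60.75 | 1,220,344 |
| Fluid Sandbox, NVIDIA PhysX (GPU) | 102,440 | 68.92 | 1,486,404 |
| Fluid Sandbox, NVIDIA PhysX (GPU) | 200,000 | 104.17 | 1,920,000 |
| Fang Chao (OpenCL) | 16,384 | 7.69 | 2,129,920 |
| Fang Chao (OpenCL) | 65,536 | 28.57 | 2,293,760 |
| Pajarola, 2010 | 16,128 | 8.13 | 1,983,744 |
| Pajarola, 2010 | 75,200 | 38.46 | 1,955,200 |
| Pajarola, 2010 | 129,024 | 58.82 | 2,193,408 |
| Pajarola, 2010 | 255,600 | 100.00 | 2,556,000 |
| RealFlow 2012 ** | 3,987 | 20 secs | 22,327 |
| RealFlow 2012 ** | 27,865 | 203 secs | 13,726 |
| RealFlow 2012 ** | 158,778 | 755 secs | 7,781 |
| RealFlow 2012 ** | 1,200,000 | 8 hours | 8,571 |
| RealFlow 2012 ** | 2,700,000 | 14 hours, 53 min | 13,303 |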

** RealFlow 2012 uses an adaptive time step and hybrid fluid-grid methods for increased realism, so these comparisons should be taken lightly. Measurements are based on simulation experiments reported on YouTube.

II. Algorithm Efficiency

Hardware efficiency, above, normalizes for the number of particles, but still depends on the capabilities of the GPU. A better measure would report pure algorithm efficiency normalized for different hardware. This can be accomplished by dividing the hardware efficiency by the peak Gflop rating of the GPU.

A.E. (in particles per second per Gflop) = H.E. / peak Gflops
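The normalization can be sketched in a few lines. The function name `algorithm_efficiency` is a hypothetical helper introduced here for illustration; the inputs are the 1,048,576-particle hardware efficiency figures and peak Gflop ratings reported in this section.

```python
def algorithm_efficiency(pps: float, peak_gflops: float) -> float:
    """Return the Algorithm Efficiency in particles per second per Gflop.

    Hypothetical helper: divides the Hardware Efficiency (pps) by the
    peak Gflop rating of the GPU it was measured on.
    """
    return pps / peak_gflops


# Measurements at 1,048,576 particles, as reported in this section.
measurements = [
    ("GTX 460M", 4_481_094, 518.4),
    ("GTX 670", 12_000_000, 2460.0),
]

for name, pps, gflops in measurements:
    ae = algorithm_efficiency(pps, gflops)
    print(f"{name}: {ae:.0f} pps per Gflop")
```

Note that a peak Gflop rating is a theoretical ceiling, so this metric is best read as a rough normalization rather than an exact measure of how well the hardware is used.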

Measuring pure Algorithm Efficiency requires us to run the SPH simulation on a number of different hardware devices. At present, I have measured Fluids v.3 on a Fermi GeForce GTX460M (192 cores) and a Kepler GeForce GTX670 (1344 cores).

Initial results are as follows:

| # Particles | H.E. (pps) | Hardware | A.E. (pps per Gflop) |
|---|---|---|---|
| 1,048,576 | 4,481,094 | GTX 460M, 192 core, 518.4 Gflops | 8644 |
| 1,048,576 | 12,000,000 | GTX 670, 1344 core, 2460 Gflops | 4878 |

More tests are needed (see Development page). What this shows is that simulation efficiency varies both with the number of particles and with the underlying hardware. While a jump in hardware from 518 Gflops to 2460 Gflops should result in a 4.75x increase, the actual increase is only about 2.68x (12,000,000 pps versus 4,481,094 pps at 1,048,576 particles). The reasons for this are subtle. More measurements, for different numbers of particles and different hardware, should provide a clearer picture. Overall, Fluids v.3 achieves 4,400,000 pps on a GTX460M, running 4 million particles at 1/4 fps, and 12,000,000 pps on a GeForce GTX 670, allowing simulations of 4 million particles at 4 fps.
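The gap between expected and measured scaling can be checked with a short calculation. This is a sketch under the assumption that throughput would ideally scale linearly with peak Gflops; the function name `scaling_ratios` is hypothetical, and the inputs are the 1,048,576-particle measurements reported above.

```python
def scaling_ratios(pps_a: float, gflops_a: float,
                   pps_b: float, gflops_b: float) -> tuple[float, float, float]:
    """Compare ideal and measured speedup going from device A to device B.

    Hypothetical helper. Returns (expected, actual, fraction), where
    expected is the ratio of peak Gflops (assumes linear scaling),
    actual is the ratio of measured pps, and fraction = actual / expected.
    """
    expected = gflops_b / gflops_a
    actual = pps_b / pps_a
    return expected, actual, actual / expected


# GTX 460M (A) versus GTX 670 (B), both at 1,048,576 particles.
expected, actual, fraction = scaling_ratios(4_481_094, 518.4, 12_000_000, 2460.0)
print(f"expected speedup: {expected:.2f}x")
print(f"actual speedup:   {actual:.2f}x")
print(f"fraction of ideal scaling achieved: {fraction:.0%}")
```

The fraction printed last equals the ratio of the two Algorithm Efficiency values, so it expresses the same shortfall as the A.E. figures in a single number.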