Advances in the Solution of NS Eqs. in GPGPU Hardware. Second order scheme and experimental validation

A Navier-Stokes solver based on Cartesian structured finite volume
discretization with embedded bodies is presented. Fluid structure
interaction with solid bodies is performed with an explicit
partitioned strategy. The Navier-Stokes equations are solved in the
whole domain via a Semi-Implicit Method for Pressure Linked Equations
(SIMPLE) using a colocated finite volume scheme, stabilized
via the Rhie-Chow discretization. As uniform Cartesian grids are used,
the solid interface usually do not coincide with the mesh, and then a
second order Immersed Boundary Method is proposed, in order to avoid
the loss of precision due to the staircase representation of the
surface. This fact also affects the computation of fluid forces on the
solid wall and, accordingly, the results in the fluid-structure
analysis. In the present work, first and second order approximations
for computing the fluid forces at the interface are studied and
compared. The solver is specially oriented to General Purpose Graphic
Processing Units (GPGPU) hardware and the efficiency is
discussed. Moreover, a novel submerged buoy experiment is also
reported and serves to validate the presented fluid-structure
algorithm. The experiment consists of a sphere with positive buoyancy
fully submerged in a cubic tank, subject to harmonic displacements
imposed by a shake table. The sphere is attached to the bottom of
the tank with a string. Position of the buoy is determined from video
records with a Motion Capture algorithm. The obtained amplitude and
phase curves allow a precise determination of the added mass and drag
forces. Due to this aspect the experimental data can be of interest
for the validation of fluid-structure interaction codes. Finally, the
numerical results are compared with the experiments, and allows the
validation of the numerically predicted drag and added mass of the
body.

4.
Advances in NS solver on GPU por M.Storti et.al.
GPUs, MIC, and multi-core processors
• Computing power is limited by the number
of transistors we can put in a chip. This is
limited in turn to the largest silicon chip
you can grow (O(400 mm2
)), and the
minimum size of the transistors you can
make (O(22 nm)). So actually the limit is in
the order of 5 × 109
transistors per chip.
• So with that number of transistors you can
do one of the following
Have a few (6 to 12) very powerful
cores: Xeon E5-26XX processors.
Have 60 or more less powerful cores,
each one with 40 million transistors,
similar to a Pentium processor: Intel
Many Integrated Core (MIC), Xeon Phi
Have a very large (O(3000)) simple
(without control) cores: Nvidia GPUs.
Die shot of the Nvidia ”Kepler1”
GK104 GPU
CIMEC-CONICET-UNL 4
((version texstuff-1.2.8-4-g7cfd85f Mon Apr 27 23:01:25 2015 -0300) (date Thu Apr 30 12:04:10 2015 -0300))

5.
Advances in NS solver on GPU por M.Storti et.al.
Solution of incompressible Navier-Stokes ﬂows on GPU
• GPU’s are less efﬁcient for algorithms that require access to the card’s
(device) global memory. In older cards shared memory is much faster but
usually scarce (16K per thread block in the Tesla C1060) .
• The best algorithms are those that make computations for one cell
requiring only information on that cell and their neighbors. These
algorithms are classiﬁed as cellular automata (CA).
• In newer cards there is a much larger memory cache. Anyway data
locality is fundamental.
• Lattice-Boltzmann and explicit F M (FDM/FVM/FEM) fall in this category.
• Structured meshes require less data to exchange between cells (e.g.
neighbor indices are computed, no stored), and so, they require less
shared memory. Also, very fast solvers like FFT-based (Fast Fourier
Transform) or Geometric Multigrid are available .
CIMEC-CONICET-UNL 5
((version texstuff-1.2.8-4-g7cfd85f Mon Apr 27 23:01:25 2015 -0300) (date Thu Apr 30 12:04:10 2015 -0300))

7.
Advances in NS solver on GPU por M.Storti et.al.
Solution of the Poisson with FFT
• Solution of the Poisson equation is, for large meshes, the more CPU
consuming time stage in Fractional-Step like Navier-Stokes solvers.
• We have to solve a linear system Ax = b
• The Discrete Fourier Transform (DFT) is an orthogonal transformation
ˆx = Ox = ﬀt(x).
• The inverse transformation O−1
= OT
is the inverse Fourier Transform
x = OT ˆx = iﬀt(ˆx).
• If the operator matrix A is spatially invariant (i.e. the stencil is the same at
all grid points) and the b.c.’s are periodic, then it can be shown that O
diagonalizes A, i.e. OAO−1
= D.
• So in the transformed basis the system of equations is diagonal
(OAO−1
) (Ox) = (Ob),
Dˆx = ˆb,
(1)
• For N = 2p
the Fast Fourier Transform (FFT) is an algorithm that
computes the DFT (and its inverse) in O(N log(N)) operations.
CIMEC-CONICET-UNL 7
((version texstuff-1.2.8-4-g7cfd85f Mon Apr 27 23:01:25 2015 -0300) (date Thu Apr 30 12:04:10 2015 -0300))

10.
Advances in NS solver on GPU por M.Storti et.al.
Upper bound on computing rate for the NS solver w/FFT
• For instance for a mesh of N = 2563
= 16Mcell the rate is 4.66 Gcell/s
(SP).
• So that one could perform one Poisson solution (two FFTs) in 4.2ms
(doesn’t take into account the computation of the RHS, which is
negligible).
• Poisson eq. is a large component of the computing time in the Fractional
Step Solver for Navier-Stokes (and the only one that is > O(N)) so that
this puts an upper limit on the computing rate achievable with our
algorithm (based on FFT).
CIMEC-CONICET-UNL 10
((version texstuff-1.2.8-4-g7cfd85f Mon Apr 27 23:01:25 2015 -0300) (date Thu Apr 30 12:04:10 2015 -0300))

11.
Advances in NS solver on GPU por M.Storti et.al.
Solution of NS on embedded geometries: FVM/IBM1
• When solid bodies are
included in the ﬂuid, we
have to deal with curved
surfaces that in general do
not ﬁt with node planes.
• The ﬁrst approach is to ﬁt
the body surface by a
stair-case one (a.k.a. Lego
surface).
• Convergence is degraded
to 1st order.
• Stair-case acts like an
artiﬁcial roughness so it
tends to give higher drag.
solid body
fluid
CIMEC-CONICET-UNL 11
((version texstuff-1.2.8-4-g7cfd85f Mon Apr 27 23:01:25 2015 -0300) (date Thu Apr 30 12:04:10 2015 -0300))

12.
Advances in NS solver on GPU por M.Storti et.al.
Solving Poisson with embedded bodies
• FFT solver and GMG are very fast but have several restrictions: invariance
of translation, periodic boundary conditions.
• No holes (solid bodies) in the ﬂuid domain.
• So they are not well suited for direct use on embedded geometries.
• One approach for the solution is the IOP (Iterated Orthogonal Projection)
algorithm.
• It is based on solving iteratively the Poisson eq. on the whole domain
(ﬂuid+solid). Solving in the whole domain is fast, because algorithms like
Geometric Multigrid or FFT can be used. Also, they are very efﬁcient
running on GPU’s .
CIMEC-CONICET-UNL 12
((version texstuff-1.2.8-4-g7cfd85f Mon Apr 27 23:01:25 2015 -0300) (date Thu Apr 30 12:04:10 2015 -0300))

15.
Advances in NS solver on GPU por M.Storti et.al.
Accelerated Global Preconditioning (AGP) algorithm
• Storti et.al. ”A FFT preconditioning technique for the solution of
incompressible ﬂow on GPUs.” Computers & Fluids 74 (2013): 44-57.
• It shares some features with IOP scheme.
• Both solvers are based on the fact that an efﬁcient preconditioning exists.
The preco. consists in solving the Poisson problem on the global domain
(ﬂuid+solid, i.e. no “holes” in the ﬂuid domain).
• IOP is a stationary method and its limit rate of convergence doesn’t
depend on reﬁnement.
• However id does depend on (O(AR)) where AR is the maximum aspect
ratio of bodies inside the ﬂuid. I.e. the method is bad when very slender
bodies are present.
• Our proposed method AGP is a preconditioned Krylov space method.
• The condition number of the preconditioned system doesn’t depend on
reﬁnement but deteriorates with the AR of immersed bodies.
• IOP iterates over both the velocity and pressure ﬁelds, whereas AGP
iterates only on the pressure vector (which is better for implementation on
GPU’s).
CIMEC-CONICET-UNL 15
((version texstuff-1.2.8-4-g7cfd85f Mon Apr 27 23:01:25 2015 -0300) (date Thu Apr 30 12:04:10 2015 -0300))

16.
Advances in NS solver on GPU por M.Storti et.al.
MOC+BFECC
• The Method of Characteristics (MOC) is a Lagrangian method. The
position of the node (“particle”) is tracked backwards in time and the
values at the previous time step are interpolated into that position. These
projections introduce dissipation.
• BFECC Back and Forth Error Compensation and Correction can ﬁx this.
• Assume we have a low order (dissipative) operator (may be MOC, full
upwind, or any other) Φt+∆t
= L(Φt
, u).
• BFECC allows to eliminate the dissipation error.
Advance forward the state Φt+∆t,∗
= L(Φt
, u).
Advance backwards the state Φt,∗
= L(Φt+∆t,∗
, −u).
CIMEC-CONICET-UNL 16
((version texstuff-1.2.8-4-g7cfd85f Mon Apr 27 23:01:25 2015 -0300) (date Thu Apr 30 12:04:10 2015 -0300))

19.
Advances in NS solver on GPU por M.Storti et.al.
Analysis of performance
• Regarding the performance results shown above, it can be seen that the
computing rate of QUICK is at most 4x faster than that of BFECC. So
BFECC is more efﬁcient than QUICK whenever used with CFL > 2, being
the critical CFL for QUICK 0.5. The CFL used in our simulations is typically
CFL ≈ 5 and, thus, at this CFL the BFECC version runs 2.5 times faster
than the QUICK version.
• Effective computing rate is more representative than the plain computing
rate
effective computing rate = CFLcrit ·
(number of cells)
(elapsed time)
• The speedup of MOC+BFECC versus QUICK increases with the number of
Poisson iterations. In the limit of very large number of iters (very low
tolerance in the tolerance for Poisson) we expect a speedup 10x (equal to
the CFL ratio).
CIMEC-CONICET-UNL 19
((version texstuff-1.2.8-4-g7cfd85f Mon Apr 27 23:01:25 2015 -0300) (date Thu Apr 30 12:04:10 2015 -0300))

20.
Advances in NS solver on GPU por M.Storti et.al.
Validation. A buoyant fully submerged sphere
• A ﬂuid structure interaction problem has been
used for validation.
• Fluid is water ρﬂuid = 1000 [kg/m3
].
• The body is a smooth sphere D =10cm, fully
submerged in a box of L =39cm. Solid
density is ρbdy = 366 [kg/m3
] ≈ 0.4 ρﬂuid
so the body has positive buoyancy. The buoy
(sphere) is tied by a string to an anchor point
at the center of the bottom side of the box.
• The box is subject to horizontal harmonic
oscillations
xbox = Abox sin(ωt),
CIMEC-CONICET-UNL 20
((version texstuff-1.2.8-4-g7cfd85f Mon Apr 27 23:01:25 2015 -0300) (date Thu Apr 30 12:04:10 2015 -0300))

21.
Advances in NS solver on GPU por M.Storti et.al.
Spring-mass-damper oscillator
• The acceleration of the box creates a driving force on the sphere. Note
that as the sphere is positively buoyant the driving force is in phase with
the displacement. That means that if the box accelerates to the right, then
the sphere tends to move in the same direction. (launch video p2-f09)
• The buoyancy of the sphere produces a restoring force that tends to keep
aligned the string with the vertical direction.
• The mass of the sphere plus the added mass of the liquid gives inertia.
• The drag of the ﬂuid gives a damping effect.
• So the system can be approximated as a (nonlinear) spring-mass-damper
oscillator
inertia
mtot ¨x +
damp
0.5ρﬂCdA|v| ˙x +
spring
(mﬂotg/L)x =
driving frc
mﬂotabox,
mtot = mbdy + madd, mﬂot = md − mbdy,
madd = Cmaddmd, Cmadd ≈ 0.5, md = ρﬂVbdy
CIMEC-CONICET-UNL 21
((version texstuff-1.2.8-4-g7cfd85f Mon Apr 27 23:01:25 2015 -0300) (date Thu Apr 30 12:04:10 2015 -0300))

25.
Advances in NS solver on GPU por M.Storti et.al.
Computational FSI model
• The ﬂuid structure interaction is performed with a partitioned Fluid
Structure Interaction (FSI) technique.
• Fluid is solved in step [tn
, tn+1
= tn
+ ∆t] between the known position
xn
of the body at tn
and an extrapolated (predicted) position xn+1,P
at
tn+1
.
• Then ﬂuid forces (pressure and viscous tractions) are computed and the
rigid body equations are solved to compute the true position of the solid
xn+1
.
• The scheme is second order precision. It can be put as a ﬁxed point
iteration and iterated to convergence (enhancing stability).
CIMEC-CONICET-UNL 25
((version texstuff-1.2.8-4-g7cfd85f Mon Apr 27 23:01:25 2015 -0300) (date Thu Apr 30 12:04:10 2015 -0300))

26.
Advances in NS solver on GPU por M.Storti et.al.
Computational FSI model (cont.)
• The simulation is performed in a non-inertial frame attached to the box.
So, according to the Galilean equivalence principle we have just to add an
inertial force due to the acceleration of the box.
• Due to the incompressibility this is absorbed in an horizontal ﬂotation
force on the sphere, which is the driving force for the system.
• The rigid body eqs. are now:
mbdy ¨x + (mﬂotg/L)x = Fﬂuid + mﬂotabox
where Fﬂuid = Fdrag + Fadd are now computed with the CFD FVM/IBM1
code.
CIMEC-CONICET-UNL 26
((version texstuff-1.2.8-4-g7cfd85f Mon Apr 27 23:01:25 2015 -0300) (date Thu Apr 30 12:04:10 2015 -0300))

27.
Advances in NS solver on GPU por M.Storti et.al.
Why use a positive buoyant body?
• Gov. equations for the body:
mbdy ¨x + (mﬂotg/L)x =
comp w/CFD
Fﬂuid({x}) +mﬂotabox
where Fﬂuid = Fdrag + Fadd are ﬂuid forces computed with the FVM/IBM
code.
• The most important and harder from the point of view of the numerical
method is to compute the the ﬂuid forces (drag and added mass).
• Buoyancy is almost trivial and can be easily well captured of factored out
from the simulation.
• Then, if the body is too heavy it’s not affected by the ﬂuid forces, so we
prefer a body as lighter as possible.
• In the present example mbdy = 0.192[kg], md = 0.523[kg],
madd = 0.288[kg]
CIMEC-CONICET-UNL 27
((version texstuff-1.2.8-4-g7cfd85f Mon Apr 27 23:01:25 2015 -0300) (date Thu Apr 30 12:04:10 2015 -0300))