Aliasing errors arise in the multiplication of partial sums, such as those
encountered when numerically solving the Navier–Stokes equations, and can be
detrimental to the accuracy of a numerical solution. In this work, a
performance and cost analysis is proposed for widely used dealiasing schemes in
large-eddy simulation, focusing on a neutrally stratified, pressure-driven
atmospheric boundary-layer flow. Specifically, the exact 3∕2 rule, the
Fourier truncation method, and a high-order Fourier smoothing method are
intercompared.

Tests are performed within a newly developed mixed pseudo-spectral finite differences large-eddy simulation code, parallelized
using a two-dimensional pencil decomposition. A series of simulations are
performed at varying resolution, and key flow statistics are intercompared
among the considered runs and dealiasing schemes.

The three dealiasing methods compare well in terms of first- and second-order
statistics for the considered cases, with modest local departures that decrease
as the grid stencil is reduced. Computed velocity spectra using the 3∕2 rule and
the FS method are in good agreement, whereas the FT method yields a spurious energy
redistribution across wavenumbers that compromises both the energy-containing and
inertial sublayer trends. The main advantage of the FS and FT methods when compared to
the 3∕2 rule is a notable reduction in computational cost, with larger savings
as the resolution is increased (15 % for a resolution of 1283, up to a theoretical 30 % for a resolution of
20483).

The past decades have seen significant progress in computer hardware in
remarkable agreement with Moore's law, which states that the number of nodes
in the discretization grids doubles every 18 months (Moore, 1965; Takahashi, 2005; Voller and Porté-Agel, 2002). A comparable progress has been made in software
development, with the rise of new branches in numerical analysis like reduced-order modeling (Burkardt et al., 2006) and uncertainty quantification
(Najm, 2009), as well as the development of highly efficient numerical
algorithms and computing frameworks like isogeometric analysis
(Hsu et al., 2011) or GPU computing (Bernaschi et al., 2010; Hamada et al., 2009)

The Fourier-based pseudo-spectral collocation method
(Canuto et al., 2006; Orszag, 1970; Orszag and Pao, 1975) remains the preferred
“workhorse” in simulations of wall-bounded flows over horizontally periodic
regular domains. This is often used in conjunction with a centered
finite-difference or Chebychev polynomial expansions in the vertical
direction (Albertson, 1996; Moeng and Sullivan, 2015). The main strength of such an
approach is the high order of accuracy of the Fourier partial sum
representation, coupled with the intrinsic efficiency of the
fast Fourier transform algorithm (Cooley and Tukey, 1965; Frigo and Johnson, 2005). In such
algorithms, the leading-order error term is represented by the aliasing that
arises when evaluating the quadratic nonlinear term (convective fluxes of
momentum). This was first discovered in the early works of Orszag and
Patterson (Orszag, 1971; Patterson, 1971), which also set a cornerstone in
the treatment and removal of aliasing errors in pseudo-spectral collocation
methods. Aliasing errors can severely deteriorate the quality of the
solution, as exemplified by the large body of literature that has dealt with
the topic. In Horiuti (1987) and Moin and Kim (1982), it was shown how the
energy-conserving rotational form of the large-eddy simulation (LES)
equations performed poorly without dealiasing and that the proper removal of
such error significantly improved the accuracy of the solution in statistics
like the flow turbulent shear stress, turbulence intensities, and two-point
correlations. As shown in Moser et al. (1983), Zang (1991), and Kravchenko and Moin (1997), aliasing
errors do not alter the energy conservation properties of the rotational form
of the LES equations, but the additional dissipation that is introduced makes
the flow prone to laminarization. Dealiasing is hence required in order to
accurately resolve turbulent flows with a well-developed inertial subrange,
such as atmospheric boundary layer (ABL) flows, for instance. However, the classic (exact) dealiasing
methods developed in Patterson (1971) based on padding and truncation
(the 3∕2 rule) or on the phase-shift technique have proven to be
computationally expensive and are one of the most costly modules for momentum
integration in high-resolution simulations, as it will be shown later in this
work. For example, in simulations with Cartesian discretization, where N is
the number of collocation nodes in each of the three coordinate directions,
the 3∕2 rule requires us to expand the number of nodes to 3/2×N, and the
phase-shift method needs grids with 2×N nodes. As a result, the
computational burden introduced by these methods is high, mainly due to the
nonlinear increase in the cost of the fast Fourier transform algorithm (such
as the one implemented in the Fastest Fourier Transform in the West (FFTW) library). Additionally, this cost rises
more rapidly when the Fourier transform is performed in higher dimensions.
Therefore, the treatment of aliasing errors severely limits the computational
performances of large-scale models based on high-order schemes.

This has motivated efforts towards the development of approximate yet
computationally efficient dealiasing schemes, such as the Fourier truncation
(FT) method (Moeng, 1984; Moeng and Wyngaard, 1988; Orszag, 1971), the Fourier smoothing
(FS) method (Hou and Li, 2007), and the more recent implicit dealiasing of
Bowman and Roberts (2011). Details on the FT and the FS techniques are
provided in the following section. Limits and merits of the different
dealiasing techniques have been extensively discussed in the past decades
within the turbulent flow framework (Moser et al., 1983; Zang, 1991).

In this work, we provide a cost–benefit analysis and a comparison of
turbulent flow statistics for the FT and FS dealiasing schemes in comparison
to the exact 3∕2 rule using a set of LES of fully developed ABL-type flows.
Simulations and benchmark analysis are performed using a recently developed
mixed pseudo-spectral finite difference code, parallelized via a pencil
decomposition technique based on the 2DECOMP&FFT library (Li and Laizet, 2010).
Results of this work are of prime interest to the environmental fluids
community (e.g., ABL community) because they can help improve the numerical
performance of some of the numerical approaches used. An overview of the
different dealiasing methods is provided in Sect. 2.
Section 3 briefly presents the LES platform with important
benchmark results. The computational cost analysis and flow statistics
obtained with the different dealiasing schemes are later discussed in Sect. 4.
Finally, the conclusions are presented in Sect. 6.

Aliasing errors result from representing the product of two or more variables
by N wavenumbers, when each one of the variables is itself represented by
a finite sum of N terms (Canuto et al., 2006); here, N is assumed even.
Such is the case for example when treating the nonlinear advection term in
the Navier–Stokes (NS) equations. Let f and g be two smooth functions
with the corresponding discrete Fourier transforms expressed as

(1)f(x)=∑k=-N/2N/2-1f^keikxandg(x)=∑m=-N/2N/2-1g^keikx,

with f^k and g^k being the amplitudes of the kth Fourier
mode of f and g. The product of the two functions is hence given
by

Note that the corresponding expression for the Fourier transform of the
product h (result of the convolution of f with g) requires 2N modes.
Therefore, the exact computation of the product represents a major numerical
cost. Traditionally the convolution of the two functions f and g is made
with only N Fourier modes,

(4)h(x)=∑k=-N/2N/2-1h̃keikx.

As a result, the energy contained within the remaining N+1 to 2N modes
folds back on the first N modes, and the amplitude of the first N modes
(h̃k) is aliased. This can be related to the exact amplitude h^k as

h^k=∑m+n=kf^ng^m+∑m+n=k±Nf^ng^m=h̃k(5)+∑m+n=k±Nf^ng^m,

with

(6)h̃k=∑m+n=kf^ng^mand-N/2≤k≤N/2-1,

such that the second term contains the aliasing errors on the kth mode.
Aliasing errors propagate in the solution of the differential equation and
can induce large errors. For the pseudo-spectral methods, the truncation and
aliasing errors affect both the accuracy and the stability of the numerical
solution (see Canuto et al., 2006, Sect. 3, and
Canuto et al., 2007, Sect. 3, for detailed discussion). Traditionally, the aliasing errors are
treated using one of the two methods discussed below.

The 3∕2-rule method is based on the so-called padding and truncation
technique, where the Fourier partial sums are zero-padded in Fourier space by
half the available modes (from N to 3∕2N), inverse-transformed to
physical space before multiplication, multiplied, and then truncated back to
the original variable size (N). This method fully removes aliasing errors.
However, the high computational cost related to the inverse transform
operation discourages its use in large-scale simulations. The fast Fourier
transform (FFT) algorithm has an operational complexity of Nlog⁡2(N);
counting the number of FFT and multiplications, the operation count of the
3∕2 rule applied to dealias the product of two vectors of N components
becomes (45/4)Nlog⁡2(3N/2)(Canuto et al., 2006). An alternative method
is the so-called phase-shift method, which consists of shifting the grid of
one of the variables in physical space. Given the appropriate shift, the
aliasing errors are eliminated naturally in the evaluation of the convolution
sum. This method has a cost equal to 15Nlog⁡2(N)(Canuto et al., 2006),
which is even greater than the 3∕2 rule (Orszag, 1972; Patterson, 1971). The
discussion above concerns one-dimensional problems, but the expansion to
higher-dimensional problems is straightforward
(Canuto et al., 2006; Iovieno et al., 2001). Although this method provides the full
dealiasing of the nonlinear term, the cost of expanding the number of
Fourier modes by a factor of 3∕2 is a computationally expensive endeavor,
especially with the progressively increasing size of numerical grids. To reduce
the numerical burden, two additional methods were proposed in the past for
treating the aliasing errors: the FS method (Hou and Li, 2007), and the FT
method (Moeng, 1984; Moeng and Wyngaard, 1988; Orszag, 1971). In both methods, a set of
high-wavenumber Fourier amplitudes are multiplied by a test function u^k*=f(k)uk to avoid expansions to larger number of modes. As its name
indicates, the FT method sets to zero the last third of the Fourier modes
(f(k)=0, for k>2N/3), equivalent to a sharp spectral cutoff filter. The FS method consists of a progressive attenuation of the
higher frequencies using the weighting function f(k)=e-36k36(Hou and Li, 2007). Figure 1 presents the spectral
function f(k) for the two different methods. Note that both the FT and FS
methods behave like a low-pass filter. Although the FT method (continuous
line) sets to zero any coefficient larger than k/kN>2/3, the FS method
(dashed line) keeps all the wavelengths unperturbed up until k/kN>3/4 and then progressively damps the amplitude of the higher-wavenumber terms.
The advantage of both of these methods is that they avoid the need for
padding the Fourier partial sums and hence reduce the numerical cost.
Specifically, the computational cost of these methods is (15∕2)Nlog⁡2(N)(Canuto et al., 2006), resulting in methods 33% less computationally
expensive than the 3/2 rule. The drawback of such approximate approaches is, however, the fact that a filtering operation is applied to the advection term,
resulting in a loss of information. A desirable property of the FS technique
when compared to the FT method is that the former exhibits a more localized
error and is dynamically very stable (Hou and Li, 2007), while the latter tends
to generate oscillations on the whole domain.

Figure 1Weighting functions used in the FT method (dashed line)
and the FS method (continuous line). The FT method filters scales
with k/kN>2/3 and the FS method at k/kN>3/4.

3.1 Equations and boundary conditions

The LES approach consists of solving the filtered NS equations, where the
time and space evolution of the turbulent eddies larger than the grid size
are fully resolved, and the effect of the smaller ones is parameterized.
Mathematically, this is described through the use of a numerical filter that
separates the larger, energy-containing eddies from the smaller ones. Often,
the numerical grid of size Δ is implicitly used as a top-hat filter,
hence reducing the computational cost (see
Moeng and Sullivan, 2015, for an overview of the technique in ABL research). As a
result, the velocity field can be separated in a resolved component (ũi,
where i=1,2,3) and an unresolved contribution (ui′)
(Smagorinsky, 1963). For this technique to be successful, the low-pass
filter operation must be performed at a scale smaller than the energy
production range, deep in the inertial subrange according to Kolmogorov's
hypothesis (Kolmogorov, 1968; Piomelli, 1999). In atmospheric
boundary-layer flow simulations, this requirement is known to hold in the bulk
of the flow, where contributions from the sub-grid-scale (SGS) motions (or sub-filter-scale motions) to the overall dissipation rate are modest. In
the near-surface regions such a requirement is not met, as the characteristic
scale of the energy production range ℒ scales with the distance
from the wall (ℒ≈κz, where κ≈0.4 is
the Von Kármán constant and z is the wall-normal distance from the
wall); hence, this remains an active research field
(Hultmark et al., 2013; Lu and Porté-Agel, 2014; Meneveau et al., 1996; Porté-Agel et al., 2000; Sullivan et al., 1994). In
this work, the rotational form of the filtered NS equations are integrated,
ensuring conservation of mass of the inertial terms (Kravchenko and Moin, 1997).
The corresponding dimensional form of the equation reads

(7)∂iũi=0,(8)∂tũi+ũj∂jũi-∂iũj=-∂ip*̃-∂jτijΔ,d+f̃i.

In these equations, ũi are the velocity components in the three
coordinate directions x,y,z (streamwise, spanwise, and vertical, respectively), p̃* denotes the perturbed modified pressure field
defined as p̃*=p̃+13ρ0τkkΔ+12ũjũj, where the first term is the kinematic
pressure, the second term is the trace of the sub-grid stress tensor, and the
last term is an extra term coming from the rotational form of the momentum
equation. Here, f̃i represents a generic volumetric force. The flow
is driven by a constant pressure gradient in the streamwise direction imposed
through the body force f̃i. The sub-grid stress tensor is defined
as τijΔ=uiuj̃-ũiũj,
where the deviatoric components are written using an eddy-viscosity approach

(9)τijΔ,d=τijΔ-13τkkΔδij=-2νTS̃ij,

with νT=(CSΔ)2|S̃| being the so-called eddy viscosity,
S̃ij=12(∂jũi+∂iũj) the resolved strain rate tensor, and CS the
Smagorinsky coefficient, a dimensionless proportionality constant
(Lilly, 1967; Smagorinsky, 1963). Many studies have investigated the
accuracy of this type of model, showing good behavior for free-shear flows
(Lesieur and Metais, 1996). For boundary-layer flows, the Smagorinsky constant model
is over-dissipative close to the wall, since the integral length scale scales
with the distance to the wall. Therefore, to properly capture the dynamics
close to the surface, the Mason–Thompson damping wall function is used
(Mason and Thomson, 1992). This function is given by f(z)=Con(κz)-n+Δ-n-1n and is used to decrease the
value of CS close to the wall, reducing the sub-grid dissipation.

Note that the molecular viscous term has been neglected in the governing
equations, including the wall-layer parameterization, which is equivalent to
assuming that the surface drag is mostly caused by pressure (i.e., there are
negligible viscous contributions). This is a typical situation in flow over
natural surfaces where the surface is often in a fully rough aerodynamic
regime.

The drag from the underlying surface is entirely modeled in this application
through the equilibrium logarithmic law for rough surfaces
(Kármán, 1931; Prandtl, 1932), with

(10)τW=κlog⁡(Δz/2z0)2(〈ũ1〉2(Δz/2)+〈ũ2〉2(Δz/2)).

In Eq. (10), 〈ũi〉 is the planar
averaged velocity sampled at Δz∕2, z0 is the aerodynamic
roughness length, representative of the underlying surface,
Δz denotes the vertical grid stencil, and κ=0.4 is
the von Kármán constant. The wall shear stress is computed
considering the norm of the horizontal velocity, and it is projected
over the horizontal directions using the unit vector ni,
such that τW,i=τWni, where

3.2 Numerical implementation and time integration scheme

The equations are solved using a pseudo-spectral approach, where the
horizontal derivatives are computed using discrete Fourier transforms and the
vertical derivatives are computed using second-order accurate centered finite
differences on a staggered grid. A projection fractional-step method is used
for time integration following Chorin's method (Chorin, 1967, 1968).
The governing equations become decoupled, and the system of partial
differential equations can be solved in two steps: first, the nonlinear
advection–diffusion equation is explicitly advanced, and subsequently the
Poisson equation is integrated (the so-called pressure correction step). The
latter equation is obtained by taking the divergence of the first equation
and setting the divergence of velocity at the next time step equal to zero,
to ensure a divergence-free flow field. The algorithm is detailed in the rest
of the section. Initially, the code computes the velocity tensor
G̃ijt=∂juĩt, which contains all the
derivatives of the flow field required to compute the SGS stress tensor
τijΔ,d,t=-2(CSΔ)2|S̃t|S̃ijt. In
the first step of the projection method, the NS equations are solved without
the pressure. Hence, the intermediary right-hand side is computed as

(12)RHS̃i*=uj̃t∂juĩt-∂iuj̃t-∂jτijΔ,d,t+f̃i.

Next, an intermediary step is computed using an Adams–Bashforth scheme,
following

(13)ũi*=uit+Δt32RHS̃i*-12RHS̃it-Δt,

where RHS̃it-Δt is the right-hand side of the previous
step. At this point, the resulting flow field is not divergence-free yet. The
modified pressure is used to impose this fundamental property of the flow
filed. Therefore, p̃*t is computed solving the Poisson equation

(14)∂j∂jp̃*t=∂kΓkt,

obtained by taking the divergence of the NS equations. The term Γkt
on the right-hand side of the equation above is given by

(15)Γkt=23Δtũkt-13RHS̃kt-Δt.

The new flow field for the complete time step is obtained by ũit=ũi*-32Δt∂ip̃*t. Finally, the
new right-hand side is updated with the pressure gradient as
RHS̃it+Δt=RHS̃i*-∂ip̃*t.

Embedded within this approach, periodic boundary conditions are imposed on
the horizontal (x,y) directions. To close the system, a stress-free lid
boundary condition is imposed at the top of the domain and an impermeability
(w̃=0) condition is imposed at the lower boundary, which sums to the
parameterized stress described in Sect. 3.

Figure 2Two-dimensional pencil decomposition of the computational domain with
the domain transposed into the three direction of space: (a)X pencil,
(b)Y pencil, (c)Z pencil (based on Li and Laizet, 2010).

The code is parallelized following a 2-D pencil decomposition paradigm similar
to the one presented in Sullivan and Patton (2011), partitioning the domain into
squared cylinders aligned along one of the horizontal directions, as shown in
Fig. 2. The 2-D pencil decomposition is implemented using the
2DECOMP & FFT open-source library (Li and Laizet, 2010), which shows exceptional
scalability up to a large number of message passing interface (MPI) processes (Margairaz et al., 2017).

3.3 Analysis of the numerical cost

The LES algorithm can be separated into four distinct modules: (1) computation of the velocity gradients,
(2) evaluation of the SGS stresses and
(3) of the convective term, and (4) computation of the pressure field by
solving the Poisson equation. These four modules represent the bulk of the
computational cost of the code, in addition to MPI communication. Figure 3
presents a simplified flowchart of the main algorithm
with each of the four modules.

Figure 3Simplified flowchart of the main algorithm presenting the
four modules that represent the bulk of the computational cost.

The four modules have been individually timed to evaluate their corresponding
computational cost at a resolution of Nx×Ny×Nz=1283 with
the 3∕2 rule as a baseline. Results are shown in Fig. 4.
As can be observed, more than half of the integration time step (∼60%) is spent computing the convective term. The three other modules share
the rest of the integration time as follows: the computation of the velocity
gradients (∼20%), the Poisson solver (∼16%), and the SGS (∼4%). It is important to note that this test was conducted without any
file input and output as it is not relevant to assess the computational cost of the momentum
integration. As explained in Sect. 2, the nonlinear
term requires the use of dealiasing techniques to control the aliasing error,
which traditionally are associated with a padding operation (as mentioned in
Sect. 2) and hence higher computational cost. It is
important to note that although the overall integration time distribution
between each individual module might vary depending on the numerical
resolution employed, the overall cost of the convective term will remain
important regardless of the changes in numerical resolution. The goal of this
work is to explore the possibility of using alternative dealiasing techniques
to enhance the computational performance, while maintaining accurate
turbulent flow statistics in simulations of ABL flows. It is important to
note that the SGS model used here takes a relatively small fraction of the
time integration. This fraction is likely to be larger if a more
sophisticated model is used, for example the dynamic Smagorinsky model
(Germano et al., 1991) or the Lagrangian scale-dependant model
(Bou-Zeid et al., 2005). In addition, it is important to note that the low
computational cost of the Poisson solver is related to the use of pencil
decomposition, which takes full advantage of the pseudo-spectral
approach. Specifically, the Z pencil combines with the horizontal treatment
of the derivatives to make the implementation of the solver very efficient.

Figure 4Individual timing of the four modules of the time loop averaged
over 10 k steps: (a) velocity gradient, (b) SGS, (c) convective term,
(d) Poisson solver, and (e) total time loop. The numerical resolution
is 1283, run with 64 MPI processes and a domain decomposition of 8×8.

3.4 Study cases

The goal of this study is to develop a cost–benefit analysis for the
different, already established, dealiasing methods from a computational cost
standpoint as well as in terms of accuracy in reproducing turbulent flow
characteristics. For this reason, three different cases have been considered, corresponding to the three dealiasing methods considered: (a) the 3∕2 rule
used as reference, (b) the FT method, and (c) the
FS method. All the simulations have been run with a
numerical resolution of Nx×Ny×Nz=643, 1283, 1923, and
2563, with a domain size of (Lx,Ly,Lz)=(2π,2π,1)⋅zi, where
zi is the height of the boundary layer taken here with a value of
zi=1000 m. A uniform surface roughness of value z0/zi=10-4 is
imposed, which is representative of sparse forest or farmland with many
hedges (Brutsaert, 1982; Stull, 1988). The simulations have been initialized
with a vertical logarithmic profile with added random noise for the
ũ1 component. The two other velocity components ũ2 and ũ3
have been initialized with an averaged zero velocity profile with added
noise to generate the initial turbulence. The integration time step is set to
Δt=0.2 s for the 643, 1283, and the 1923 simulations and
to Δt=0.1 s for the 2563 simulation. These time steps ensure
that stability requirements for the time integration scheme are met. The
Smagorinsky constant and the wall damping exponent are set to CS=0.1 and
n=2(Mason, 1994; Porté-Agel et al., 2000; Sagaut, 2006).

For each dealiasing method, the simulations at 643, 1283, 1923, and
2562 were run until the flow reached statistic convergence of the friction
velocity u* and the mean kinetic energy. This warm-up time was fixed to
∼95 T (where T is the flow-through time, defined as T=U∞t/Lx). At this point, running averages were computed to
evaluate the different flow statistics presented in the following sections.
To provide long enough averaging times, the 643 and 1283 simulations
were run for an additional ∼190 T. The 1923 and
2563 simulations were run for an additional ∼100 T. In parallel,
runs with higher horizontal resolution were used to evaluate the
computational cost of the dealiasing methods with increasing numerical
resolution (timing runs). These last simulations were only run for a few
thousand iterations. Table 1 contains a summary of all
the simulations performed in this work.

Table 1Simulations summary, each simulation was run
with the three different dealiasing methods.

4.1 Computational cost

The computational cost of evaluating the convective term, dealiased via
the 3∕2 rule, the FT, and the FS is intercompared in Fig. 5. The horizontal resolution has been increased from
128×128 to 2048×2048 collocation nodes to highlight how the
different methods scale. Only the horizontal resolution is changed given that
the vertical direction is treated in physical space with the second-order
accurately centered finite-difference method. Note that such a method does not
typically require any dealiasing treatment, because the truncation error
tends to decrease the aliasing errors (Canuto et al., 2006; Kravchenko et al., 1996).
In Fig. 5, the ordinate axis is divided by nx⋅ny
to show the effect of the increase in resolution on the computational cost.
The number of MPI processes and the domain decomposition have been kept
identical to avoid introducing effects from the parallelization scaling into
the results. Hence, only the effect of the resolution change on the
computation time of the dealiasing methods is presented here. Results confirm
that the computational cost of the convective term is notably smaller when
using the FT and FS dealiasing methods, with gains of 30 % at nx×ny=128×128 and 23 % at nx×ny=2048×2048. The results
follow the computational cost calculated by the number of operations
presented in Sect. 2, which predicted a gain of up to
35% for runs with 40963 grid nodes. The deviation in the computational
cost present in Fig. 5 is the result of the varying load of
the computer cluster since all simulations were run using the same number of
nodes to avoid having to add the code's scaling properties to the analysis.
From the results it is also important to note that there is no significant
difference in the computational cost between the FT and FS methods, given
that both use the same grid size and hence the corresponding numerical
complexity of both methods is similar. It is also worth mentioning that these
methods are simpler to implement and require less rapid-access memory when
compared to the 3∕2 rule, as there is no need to extend either the numerical
grid or the wavenumber range.

Figure 5Computational cost of the convective module as a
function of the horizontal resolution. The timing of the module
is presented on the left vertical axis and represented by
left-pointing arrows. The number of operations is shown on the
right vertical axis and represented by right-pointing arrows. The
three different dealiasing methods are plotted: the 3∕2 rule as the blue
dot–dashed line, the FT method as the orange dotted line, and the FS method
as the yellow dashed line. The numerical resolutions are
nx×ny×128, run with 64 MPI processes and a
domain decomposition of 8×8 elements.

In the following subsections, we compare the impact of the different
dealiasing schemes on flow statistics. Profiles from runs using the 3∕2 rule
for dealiasing will be taken as reference, and comments will focus on
departures from such “exact” profiles when the FT or FS treatments are
considered.

4.2 Flow statistics

Traditional flow metrics are investigated next and compared between the
different dealiasing schemes. Results for the 1283, 1923, and
2563 cases are presented in this section. The results are normalized using
the traditional scaling variables: the friction velocity (u*) and the
boundary-layer height (zi). As a starting point, Fig. 6
shows an instantaneous snapshot (pseudo-color plots) of the streamwise
velocity perturbation for the three dealiasing methods. An additional case
without dealiasing in the convective term was run and resulted in a complete
laminarization of the turbulent flow (not shown here), highlighting the
importance of the dealiasing operation (Kravchenko and Moin, 1997). By contrast, when dealiasing schemes are applied, the instantaneous flow field
appears qualitatively similar among the different cases. Irrespective of the
dealiasing method that is used, streamwise elongated high- and low-momentum
bulges flank each other, as is apparent in Fig. 6. This is a
common phenomena in pressure-driven boundary-layer flows (Munters et al., 2016).
Qualitatively, small differences can be appreciated on the structure and
distribution of the smaller-scale turbulence within the flow only in the FT
method. For example, the flow in panel b shows the effect of
the cutoff filter, where high-frequency perturbations occur throughout the
considered pseudo-color plot. These spurious oscillations have an impact on
the flow statistics, as will be shown in the following.

Figure 7Top panels represent the plots of the non-dimensional mean
streamwise velocity profile for (a) the 1283 case, (b) the
1923 case, and (c) the 2563 case. Bottom panels represent the
mean streamwise velocity gradient for (d) the 1283 case, (e) the
1923 case, and (f) the 2563 case. The lines represent the three
different dealiasing methods: the 3∕2 rule as the blue dot–dashed line, the FT method as the red dotted line, and the FS method as the yellow dashed line. The
solid line represents the ideal log–law profile.

The horizontally and temporally averaged velocity profiles are characterized
by an approximately logarithmic behavior within the surface layer (z≈0.15zi, as is apparent from Fig. 7, where results are
illustrated for the following three resolutions: 1283, 1923, and 2563). For the
1283 case, the observed departure from the logarithmic profile for the
3∕2-rule case is in excellent agreement with results from previous literature
for this particular SGS model (Bou-Zeid et al., 2005; Porté-Agel et al., 2000). When using
the FT method, the agreement of the averaged velocity profile with the
corresponding 3∕2-rule profiles improves with increasing resolution. While in
the 1283 case a good estimation of the logarithmic flow is obtained at the
surface layer, there is a large acceleration of the flow further above. This
overshoot does not occur for the higher-resolution runs. When using the FS
method, the mean velocity magnitude is consistently overpredicted throughout
the domain, and the situation does not improve with increasing resolution
(the overshoot is up to 7.5 % for the 1283, 8.5 % for the 1923, and 7 %
for the 2563 run). Further comparing the results obtained by the FS and FT
methods with those obtained with the 3∕2 rule, it is clear that while the FS
method presents a generalized overestimation of the velocity with an
overall good logarithmic trend, the FT method presents a better adjustment in
the surface layer with larger departures from the logarithmic regime in the
upper-domain region that decrease with increasing numerical resolution.
The mean kinetic energy of the system is overestimated by ≈+2% and
≈+12% by the FT and FS methods, when compared to that of runs using
the 3∕2 rule in the 2563 case. Overall, the mean kinetic energy is larger
for the FT and FS cases, when compared to the 3∕2-rule case, even at the
highest of the considered resolutions (≈+2 and ≈+12% by
the FT and FS methods for the 2563 case). Such behavior can be related to
the low-pass filtering operation that is performed in the near-wall regions,
which tends to reduce resolved turbulent stresses in the near-wall region,
resulting in a higher mass flux for the considered flow system. This is more
apparent for the low-resolution cases.

Mean velocity gradient profiles (Φm=κzu*∂z〈U〉xy(z)) are also featured in Fig. 7d, e, and f.
Profiles at each of the considered resolutions present a large
overshoot near the surface, which is a well-known problem in LES of
wall-bounded flows and has been extensively discussed in the literature
(Bou-Zeid et al., 2005; Brasseur and Wei, 2010; Lu and Porté-Agel, 2013). In comparing the results between
the FS and FT method with the 3∕2 rule, it can be observed that there are
stronger gradients in the mean velocity profile within the surface layer when
using the FS method. This leads to the observed shift in the mean velocity
profile. Conversely, when using the FT method, departures are of an oscillatory
nature, leading to less pronounced variations in the mean velocity profile
when compared to the reference ones (the 3∕2-rule cases). This behavior is
consistently found across the considered resolutions, but the situation
ameliorates as resolution is increased (i.e., weaker departures).

Figure 8Profiles of the non-dimensional variances (a–c) and
shear stress (d) using a numerical resolution of 2563 for the three different dealiasing methods:
the 3∕2 rule as the blue dot–dashed line, the FT method as the red dotted line, and the FS method as the yellow dashed line.

Figure 8 features the vertical structure of second-order
statistics predicted via the FS and the FT methods, including a comparison
with the corresponding predictions from the 3∕2-rule, 2563-study case.
Note that when averaging in space and (subsequently) in time, the resulting
profiles are comparable to those found in previously published LES studies
(Bou-Zeid et al., 2005; Porté-Agel et al., 2000). The fact that the
shear stress profiles are similar among the different dealiasing cases is
also indicative of the fact that the SGS fraction is not strongly affected by the choice
of dealiasing method, which is also partly due to the simplicity of the
static Smagorinsky model that is being used. The potential effect that the
different dealiasing schemes could have in more advanced subgrid models is
discussed later on. Specifically, the error in the Reynolds shear stress
(e.g., not including the SGS contribution) in the surface layer
decreases with increasing resolution for the FS method (from 1.7 to 1.1 %
for the 1283 and 2563, respectively) as indicated in Table 2
and also fluctuates around very small values for the FT method. When considering the diagonal stress tensor components across
simulations, it is noteworthy that all such quantities are overpredicted when
using the FT and the FS methods in the near-surface region (z⪅0.1). Further above, the FS method tends to consistently overpredict,
whereas the FT method presents an oscillatory nature. As can be observed in
Table 2, the mean error deviations decrease with
increasing resolution for all cases, except for the streamwise variance
where there is no clear trend.

Table 2Mean error in the variance profiles between the FT∕FS
methods and the 3∕2 rule over the lower 15 % of the domain. For example, the
error of the streamwise variance is computed as
err(u′u′‾)=1n∑1-u′u′‾FT/FSu′u′‾3/2
for 0<z<0.15⋅zi.

To complement the analysis of the effect of the different dealiasing methods
on the physical structure of the flow, the corresponding power spectra are investigated. According to Kolmogorov's energy cascade theory, the inertial
subrange of the power spectrum should be characterized by a power law of
-5/3 slope (Kolmogorov, 1968). In this range the effects of viscosity,
boundary conditions, and large-scale structures are not important. Also, in
wall-bounded flows without buoyancy effects, a production range should also
be present, following a power-law scaling of −1(Calaf et al., 2013; Gioia et al., 2010; Katul et al., 2012). Figure 9 shows the
energy spectra of the streamwise velocity obtained using the different
dealiasing methods. The spectrum obtained using the3∕2 rule matches well the
traditional turbulent spectra presented in the literature (Bou-Zeid et al., 2005; Cerutti, 2000) and it is used to assess the effects introduced by the FT and
FS dealiasing methods. From this spectral analysis, it can be observed that
the high-wavenumber ranges are modified by both methods. The FT method
sharply cuts the spectra at the scale of 3/2⋅Δ close to the LES
filter-scale Δ. On the other hand, the FS method smoothly attenuates
the effects of the aliasing errors at the high end of the spectra. The
dealiasing methods have been designed for such behavior, since only the
higher frequencies are filtered. From the FT method flow field spectra, the effect of the cutoff applied within the dealiasing
scheme is clearly visible. It is apparent that the capacity of the LES solver to reproduce the
fine-scale turbulence structure of the flow is strongly jeopardized when
using the FT method and limited at the scale of 3/2⋅Δ close to the
LES filter-scale Δ. Essentially, this method artificially
over-dissipates the turbulent kinetic energy and yields to an overestimation
of the mean kinetic energy. In contrast, the energy spectrum obtained using
the FS method does not produce such a large energy cutoff. Therefore, a
larger range of the spectrum is resolved and less turbulent kinetic energy is
dissipated by aliasing errors.

Although the effect of the FT and FS methods on the small scale can be
clearly observed in Fig. 9, their effect on the large
scales also needs to be quantified. To compute a direct comparison scale by
scale, the following ratio was used (Eq. 16) for the
1283, 1923, and 2563 simulations:

(16)ρXX(k)=Eu,kXX-Eu,k3/2Eu,k3/2

where Eu,k denotes the power spectral density of the u velocity
component at wavenumber k and XX stands for the dealiasing method FT or FS.
If ρ(k)<0, the energy density at that given wavenumber (k) is less
than the corresponding one for the run using the 3∕2 rule; vice versa if
ρ(k)>0. Figure 10 presents the ratio ρ(k)
for both methods.

When using the FT method, energy at the low wavenumbers is underpredicted,
whereas energy at the large wavenumbers is overpredicted. Departures are in
general larger with decreasing resolution, with an excess of up to 100%
for the 1283 simulations in the wavenumber range close to the cutoff
wavenumber. By contrast, when using the FS method, the energy from the
filtered (dealiased) small scales is redistributed quasi-uniformly throughout
the spectra with an averaged overall variation of less than 13 %.

Figure 9Normalized streamwise spectra of the streamwise velocity
as a function of kxz for the 1923 simulations. The three
different dealiasing methods are the 3∕2 rule (a), the FT method
(b), and the FS method (c) at heights z/zi=0.0117, 0.0273, 0.0586,
0.0898, 0.1523, 0.2148, 0.3086, 0.4336, 0.5586, and 0.6211.

Figure 10Effect of the FT (a) and the FS (b) methods of the streamwise
spectra of the streamwise velocity compare to the 3∕2 rule. The solid
line represent the average value and the shaded area represent the
extreme values. The resolutions are 1283 as the blue dot–dashed line, 1923 as the red dotted line, and 2563 as the purple dashed line.

In the development of this paper, focus has been directed to the study
of the advantages and disadvantages of different dealiasing methods. In this
regard, throughout the analysis, we have tried to keep the structure of the
LES configuration as simple and canonical as possible to remove the effect
of other add-on complexities. Additional complications might arise when
considering additional physics; here we discuss the potential effect that
these different dealiasing methods could have on them. One such element of added complexity is, for example, the use of more sophisticated subgrid-scale models based on dynamic approaches to determine the values of the
Smagorinsky constant (Bou-Zeid et al., 2005; Germano et al., 1991). In most of these
advanced subgrid models, information from the small-scale turbulent eddies is
used to determine the evolution of the subgrid constant. However, in both the
FT and FS method, the small turbulent scales are severely affected, and hence
the use of dynamic subgrid models could be severely hampered unless these are
accordingly modified and adjusted, for example via filtering at larger scales than
the usual grid scale. Another element of added complexity consists of using
more realistic atmospheric forcing, considering for example the effect of the
Coriolis force with flow rotation as a function of height and velocity
magnitude. In this case, we hypothesize that the FT method could lead to
stronger influences on the resultant flow field as this dealiasing technique
not only affects the distribution of energy on small turbulent scales but also on large scales (as is apparent from Fig. 10),
the latter being potentially more affected
by the Coriolis force. This represents a strong nonlinear effect that is
hard to quantify, and hence further testing, including realistic forcing with
a geostrophic wind and Coriolis force, would be required to better quantify
these effects. Also, often in LES studies of atmospheric flows, one is
interested in including an accurate representation of scalar transport
(passive/active). In this case the differential equations do not include a
pressure term, and hence most of the computational cost is linked to the
evaluation of the convective term. As a result, the benefit of using
alternative, cheaper dealiasing techniques (FT or FS) will be even more
advantageous, yet the total gain is not trivial to evaluate a priori,
and the effect on the scalar fields should also be further evaluated.

In general, we believe that it is not fair to advocate for one or other
dealiasing method based on the results of this analysis. Note that the goal
of this work is to provide an objective analysis of the advantages and
limitations that the different methods provide, leaving the readers with the
ultimate responsibility of choosing the option that will adjust better to their
application. For example, while having exact dealiasing (3∕2 rule) might be
better in studies focusing on turbulence and dispersion, one might be
better off using a simpler and faster dealiasing scheme to run the
traditionally expensive warm-up runs or to evaluate surface drag in flow
over urban and vegetation canopies, where most of the surface force is due to
pressure differences (Patton et al., 2016).

The Fourier-based pseudo-spectral collocation method
(Canuto et al., 2006; Orszag, 1970; Orszag and Pao, 1975) remains the preferred
workhorse in simulations of wall-bounded flows over horizontally periodic
regular domains and is often used in conjunction with a centered
finite-difference or Chebychev polynomial expansions in the vertical
direction (Kopriva and Kolia, 1996; Moeng and Sullivan, 2015; Shah and Bou-Zeid, 2014). This approach is often used because of
the high-order accuracy and the intrinsic efficiency of the
fast Fourier transform algorithm (Cooley and Tukey, 1965; Frigo and Johnson, 2005). In this
technique, aliasing that arises when evaluating the quadratic nonlinear term
in the NS equations can severely deteriorate the quality of the solution and
hence needs to be treated adequately. In this work, a performance–cost analysis
has been developed for three well-accepted dealiasing techniques – the 3∕2 rule,
the FT method and the FS method – to evaluate the corresponding advantages and limitations. The
3∕2 rule requires a computationally expensive padding and truncation
operation, while the FT and FS methods provide an approximate dealiasing by
low-pass filtering the signal over the available wavenumbers, which comes at
a reduced cost.

The presented results show compelling evidence of the benefits of these
methods as well as some of their drawbacks. The advantage of using the FT or
the FS approximate dealiasing methods is their reduced computational cost
(cutback on the total simulation time of ∼15 % for the 1283 case,
∼21 % for the 2563 case), with an
increased gain as the numerical resolution is increased. Regarding the flow
statistics, results illustrate that both the FT and the FS methods yield
less accurate results when compared to those obtained with the traditional
3∕2 rule, as one could expect.

Specifically, results illustrate that both the FT and FS methods
over-dissipate the turbulent motions in the near-wall region, yielding an
overall higher mass flux when compared to the reference one (3∕2 rule).
Regarding the variances, results illustrate modest errors in the
surface layer, with local departures in general below 10 % of the reference
value across the considered resolutions. The observed departures in terms of
mass flux and velocity variances tend to reduce with increasing resolution.
Analysis of the streamwise velocity spectra has also shown that the FT method
redistributes unevenly the energy across the available wavenumbers, leading
to an overestimation of the energy of some scales by up to 100 %. By contrast,
the FS method redistributes the energy evenly, yielding a modest +13 %
energy magnitude throughout the available wavenumbers. Compared to the
3∕2 rule, these differences in flow statistics are the result of the sharp
low-pass filter applied in the FT method and the smooth filter that
characterizes the FS method.

Due to the large amount of data generated during this
study, no lasting structure can be permanently supported where to freely
access the data. However, access can be provided using the Temporary Guest
Transfer Service of the Center of High Performance Computing of the
University of Utah. To get access to the data, Marc Calaf
(marc.calaf@utah.edu) will provide temporary login information for the sftp
server.

Fabien Margairaz and Marc Calaf acknowledge the Mechanical Engineering
Department at the University of Utah for start-up funds. Marco G. Giometto
acknowledges the Civil Engineering and Engineering Mechanics Department at
Columbia University for start-up funds. Marc B. Parlange is grateful to NSERC
and Monash University for their support. The authors would also like to
recognize the computational support provided by the Center for High
Performance Computing (CHPC) at the University of Utah as well as the Extreme
Science and Engineering Discovery Environment (XSEDE) platform (project
TG-ATM170018).

Lu, H. and Porté-Agel, F.: On the Development of a Dynamic Non-linear
Closure for Large-Eddy Simulation of the Atmospheric Boundary Layer,
Bound.-Lay. Meteorol., 151, 1–23, https://doi.org/10.1007/s10546-013-9906-y, 2014. a

Salesky, S. T., Chamecki, M., and Bou-Zeid, E.: On the Nature of the
Transition Between Roll and Cellular Organization in the Convective Boundary
Layer, Bound.-Lay. Meteorol., 163, 1–28,
https://doi.org/10.1007/s10546-016-0220-3, 2016. a

In this project, we compare three different approaches to integrate the fluid-motion equations when applied to solve atmospheric flow dynamics. Differences between the three methods reside in accuracy as well as computational cost. The results illustrate that there is an intermediate solution that performs well in terms of computational cost while at the same time producing good enough results, as long one is not interested in the smallest turbulent scales.

In this project, we compare three different approaches to integrate the fluid-motion equations...