This glossary explains a few terms that are frequently used in the documentation of classes of deal.II. The glossary often only gives a microscopic view of a particular concept; if you struggle with the bigger picture, it may therefore also be worth to consult the global overview of classes on the index page.

Active cells

A cell, face or edge is defined as active if it is not refined any further, i.e., if it does not have children. Unless working with a multigrid algorithm, active cells are the only ones carrying degrees of freedom.

Artificial cells

If a mesh is distributed across multiple MPI processes using the parallel::distributed::Triangulation class, each processor stores only the cells it owns, one layer of adjacent cells that are owned by other processors (called ghost cells), all coarse level cells, and all cells that are necessary to maintain the invariant that adjacent cells must differ by at most one refinement level. The cells stored on each process that are not owned by this process and that are not ghost cells are called "artificial cells", and for these cells the predicate cell->is_artificial() returns true. Artificial cells are guaranteed to exist in the globally distributed mesh but they may be further refined on other processors. See the Distributed Computing paper for more information.

The concept of artificial cells has no meaning for triangulations that store the entire mesh on each processor, i.e. the Triangulation class.

Block (linear algebra)

It is often convenient to treat a matrix or vector as a collection of individual blocks. For example, in step-20 (and other tutorial programs), we want to consider the global linear system \(Ax=b\) in the form

where \(U,P\) are the values of velocity and pressure degrees of freedom, respectively, \(M\) is the mass matrix on the velocity space, \(B\) corresponds to the negative divergence operator, and \(B^T\) is its transpose and corresponds to the negative gradient.

Using such a decomposition into blocks, one can then define preconditioners that are based on the individual operators that are present in a system of equations (for example the Schur complement, in the case of step-20), rather than the entire matrix. In essence, blocks are used to reflect the structure of a PDE system in linear algebra, in particular allowing for modular solvers for problems with multiple solution components. On the other hand, the matrix and right hand side vector can also treated as a unit, which is convenient for example during assembly of the linear system when one may not want to make a distinction between the individual components, or for an outer Krylov space solver that doesn't care about the block structure (e.g. if only the preconditioner needs the block structure).

Splitting matrices and vectors into blocks is supported by the BlockSparseMatrix, BlockVector, and related classes. See the overview of the various linear algebra classes in the Linear algebra classes module. The objects present two interfaces: one that makes the object look like a matrix or vector with global indexing operations, and one that makes the object look like a collection of sub-blocks that can be individually addressed. Depending on context, one may wish to use one or the other interface.

Typically, one defines the sub-structure of a matrix or vector by grouping the degrees of freedom that make up groups of physical quantities (for example all velocities) into individual blocks of the linear system. This is defined in more detail below in the glossary entry on Block (finite element).

Block (finite element)

Intent: Blocks are a generalization of components in that they group together one or more components of a vector-valued finite element that one would like to consider jointly. One often wants to do this to define operators that correspond to the structure of a (part of a) differential operator acting on the vector-valued solution, such as the Schur complement solver in step-20, or the block solvers and preconditioners of step-22.

For the purpose of a discretization, blocks are the better concept to use since it is not always possible to address individual components of a solution. This is, in particular, the case for non-primitive elements. Take for instance the solution of the mixed Laplacian system with the FE_RaviartThomas element (see step-20). There, the first dim components are the directional velocities. Since the shape functions are linear combinations of those, these dim components constitute only a single block. On the other hand, the pressure variable is scalar and would form a the second block, but in the dim+1st component.

The minimal size of each block is dictated by the underlying finite element (a block consists of a single component for scalar elements, but in the case of the FE_RaviartThomas, for example, a block consists of dim components). However, several such minimal blocks can be grouped together into user defined blocks at will, and in accordance with the application. For instance, for the Q2d-Q1 (Taylor-Hood) Stokes element, there are d+1 components each of which could in principle form its own block. But we are typically more interested in having only two blocks, one of which consists of all the velocity vector components (i.e. this block would have d components) and the other having only the single pressure component.

The exception is the FESystem class that takes multiple simpler elements and connects them into more complicated ones. Consequently, it can have more than one block. A FESystem has as many blocks as it has base elements times their multiplicity (see the constructors of FESystem to understand this statement). In other words, it does not care how many blocks each base element has, and consequently you can produce a Stokes element that has only two blocks by creating the object

With the exception of the number of blocks, the two objects are the same for all practical purposes, however.

Global degrees of freedom: While we have defined blocks above in terms of the vector components of a vector-valued solution function (or, equivalently, in terms of the vector-valued finite element space), every shape function of a finite element is part of one block or another. Consequently, we can partition all degrees of freedom defined on a DoFHandler into individual blocks. Since by default the DoFHandler class enumerates degrees of freedom in a more or less random way, you will first want to call the DoFRenumbering::component_wise function to make sure that all degrees of freedom that correspond to a single block are enumerated consecutively.

If you do this, you naturally partition matrices and vectors into blocks as well (see block (linear algebra)). In most cases, when you subdivide a matrix or vector into blocks, you do so by creating one block for each block defined by the finite element (i.e. in most practical cases the FESystem object). However, this needs not be so: the DoFRenumbering::component_wise function allows to group several vector components or finite element blocks into the same logical block (see, for example, the step-22 or step-31 tutorial programs, as opposed to step-20). As a consequence, using this feature, we can achieve the same result, i.e. subdividing matrices into \(2\times 2\) blocks and vectors into 2 blocks, for the second way of creating a Stokes element outlined above using an extra argument as we would have using the first way of creating the Stokes element with two blocks right away.

Selecting blocks: Many functions allow you to restrict their operation to certain vector components or blocks. For example, this is the case for the functions that interpolate boundary values: one may want to only interpolate the boundary values for the velocity block of a finite element field but not the pressure block. The way to do this is by passing a BlockMask argument to such functions, see the block mask entry of this glossary.

Block mask

In much the same way as one can think of elements as being composed of physical vector components (see GlossComponent) or logical blocks (see GlossBlock), there is frequently a need to select a set of such blocks for operations that are not intended to be run on all blocks of a finite element space. Selecting which blocks to work on happens using the BlockMask class.

While components and blocks provide two alternate but equally valid viewpoints on finite elements with multiple vector components, the fact is that throughout the library there are far more places where you can pass a ComponentMask argument rather than a BlockMask argument. Fortunately, one can be converted into the other, using the syntax fe.component_mask(block_mask) where block_mask is a variable of type BlockMask. In other words, if you have a block mask but need to call a function that only accepts a component mask, this syntax can be used to obtain the necessary component mask.

Creation of block masks: Block masks are typically created by asking the finite element to generate a block mask from certain selected vector components using code such as this where we create a mask that only denotes the velocity components of a Stokes element (see Handling vector valued problems):

would yield a block mask that in 2d has elements [false, false, true] because the element has dim+1 components and equally many blocks. See the discussion on what a block represents exactly in the block entry of this glossary.

Boundary form

For a dim-dimensional triangulation in dim-dimensional space, the boundary form is a vector defined on faces. It is the vector product of the image of coordinate vectors on the surface of the unit cell. It is a vector normal to the surface, pointing outwards and having the length of the surface element.

A more general definition would be that (at least up to the length of this vector) it is exactly that vector that is necessary when considering integration by parts, i.e. equalities of the form \(\int_\Omega \text{div} \vec \phi = -\int_{\partial\Omega} \vec n \cdot \vec \phi\). Using this definition then also explains what this vector should be in the case of domains (and corresponding triangulations) of dimension dim that are embedded in a space spacedim: in that case, the boundary form is still a vector defined on the faces of the triangulation; it is orthogonal to all tangent directions of the boundary and within the tangent plane of the domain. Note that this is compatible with case dim==spacedim since there the tangent plane is the entire space \({\mathbb R}^\text{dim}\).

In either case, the length of the vector equals the determinant of the transformation of reference face to the face of the current cell.

Boundary indicator

In a Triangulation object, every part of the boundary may be associated with a unique number (of type types::boundary_id) that is used to determine what kinds of boundary conditions are to be applied to a particular part of a boundary. The boundary is composed of the faces of the cells and, in 3d, the edges of these faces.

By default, all boundary indicators of a mesh are zero, unless you are reading from a mesh file that specifically sets them to something different, or unless you use one of the mesh generation functions in namespace GridGenerator that have a 'colorize' option. A typical piece of code that sets the boundary indicator on part of the boundary to something else would look like this, here setting the boundary indicator to 42 for all faces located at \(x=-1\):

In older versions of the library (prior to 8.2), if you wanted also to change the way the Triangulation class treated the boundary for the purposes of mesh refinement, you could call Triangulation::set_boundary to associate a boundary object with a particular boundary indicator. This method is still supported, and it allows the Triangulation object to use a different method of finding new points on faces and edges to be refined; the default is to use a StraightBoundary object for all faces and edges. The results section of step-49 has a worked example that shows all of this in action.

The suggested method from version 8.2 onwards, is to split the geometrical description of the boundary from its physical meaning, by using separately manifold_ids and boundary_ids. The former are used to describe how the geometry changes, and the latter are used to identify the boundary conditions.

Boundary indicators are inherited from mother faces and edges to their children upon mesh refinement. Some more information about boundary indicators is also presented in a section of the documentation of the Triangulation class.

When considering systems of equations in which the solution is not just a single scalar function, we say that we have a vector system with a vector-valued solution. For example, the vector solution in the elasticity equation considered in step-8 is \(u=(u_x,u_y,u_z)^T\) consisting of the displacements in each of the three coordinate directions. The solution then has three elements. Similarly, the 3d Stokes equation considered in step-22 has four elements: \(u=(v_x,v_y,v_z,p)^T\). We call the elements of the vector-valued solution components in deal.II. To be well-posed, for the solution to have \(n\) components, there need to be \(n\) partial differential equations to describe them. This concept is discussed in great detail in the Handling vector valued problems module.

In finite element programs, one frequently wants to address individual elements (components) of this vector-valued solution, or sets of components. For example, we do this extensively in step-8, and a lot of documentation is also provided in the module on Handling vector valued problems. If you are thinking only in terms of the partial differential equation (not in terms of its discretization), then the concept of components is the natural one.

On the other hand, when talking about finite elements and degrees of freedom, components are not always the correct concept because components are not always individually addressable. In particular, this is the case for non-primitive finite elements. Similarly, one may not always want to address individual components but rather sets of components — e.g. all velocity components together, and separate from the pressure in the Stokes system, without further splitting the velocities into their individual components. In either case, the correct concept to think in is that of a block. Since each component, if individually addressable, is also a block, thinking in terms of blocks is most frequently the better strategy.

Selecting components: Many functions allow you to restrict their operation to certain vector components or blocks. For example, this is the case for the functions that interpolate boundary values: one may want to only interpolate the boundary values for the velocity components of a finite element field but not the pressure component. The way to do this is by passing a ComponentMask argument to such functions, see the component mask entry of this glossary.

Component mask

When using vector-valued elements (see Handling vector valued problems) to solve systems of equations, one frequently wants to restrict some operations to only certain solution variables. For example, when solving the Stokes equations, one may wish to only interpolate boundary values for the velocity components but not the pressure. In deal.II, this is typically done by passing functions a component mask. Component masks are always specified as a ComponentMask object which one can think of as an array with as many entries as the finite element has components (e.g., in the Stokes case, there are dim+1 components) and where each entry is either true or false. In the example where we would like to interpolate boundary values only for the velocity components of the Stokes system, this component mask would then be [true, true, false] in 2d and [true, true, true, false] in 3d to indicate that no boundary values shall be set for the pressure variable (the last of the dim+1 vector components of the solution.

Semantics of component masks: Many of the functions that take a component mask object that has been default constructed to indicate all components, i.e., as if the vector had the correct length and was filled with only true values. The reason is that default initialized objects can be constructed in place using the code snippet ComponentMask() and can thus be used as a default argument in function signatures.

In other words, ComponentMask objects can be in one of two states: They can have been initialized by a vector of booleans with a nonzero length; in that case, they represent a mask of a particular length where some elements may be true and others may be false. Or, the ComponentMask may have been default initialized (using the default constructor) in which case it represents an array of indefinite length (i.e., a length appropriate to the circumstances) in which every entry is true.

Creation of component masks: Component masks are typically created by asking the finite element to generate a component mask from certain selected components using code such as this where we create a mask that only denotes the velocity components of a Stokes element (see Handling vector valued problems):

Not all component masks actually make sense. For example, if you have a FE_RaviartThomas object in 2d, then it doesn't make any sense to have a component mask of the form [true, false] because you try to select individual vector components of a finite element where each shape function has both \(x\) and \(y\) velocities. In essence, while you can of course create such a component mask, there is nothing you can do with it.

Compressing distributed vectors and matrices

For parallel computations, deal.II uses the vector and matrix classes defined in the PETScWrappers and TrilinosWrappers namespaces. When running programs in parallel using MPI, these classes only store a certain number of rows or elements on the current processor, whereas the rest of the vector or matrix is stored on the other processors that belong to our MPI universe. This presents a certain problem when you assemble linear systems: we add elements to the matrix and right hand side vectors that may or may not be stored locally. Sometimes, we may also want to just set an element, not add to it.

There is one snag, however: both PETSc and Trilinos need to know whether the operation that these compress() functions invoke applies to adding elements or setting them. In some cases, not all processors may be adding elements, for example if a processor does not own any cells when using a very coarse (initial) mesh. For this reason, compress() takes an argument of type VectorOperation, which can be either ::add, or ::insert. This argument is required for vectors and matrices starting with the 7.3 release.

In short, you need to call compress() in the following cases (and only in those cases, though calling compress() in other cases just costs some performance):

When you are done setting individual elements in a matrix/vector before any other operations are done (adding to elements, other operations like scaling, solving, reading, etc.). Use VectorOperation::insert.

All other operations like scaling or adding vectors, assignments, calls into deal.II (VectorTools, ConstraintMatrix, ...) or solvers do not require calls to compress().

Note

Compressing is an operation that only applies to vectors whose elements are uniquely owned by one and only one processor in a parallel MPI universe. It does not apply to vectors with ghost elements.

Concepts in deal.II

There are several places in deal.II where we require that a type in a template match a certain interface or behave in a certain way: such constraints are called concepts in C++. See the discussion in Concepts, or expectations on template parameters for more information and a list of concepts in deal.II.

Degree of freedom

The term "degree of freedom" (often abbreviated as "DoF") is commonly used in the finite element community to indicate two slightly different, but related things. The first is that we'd like to represent the finite element solution as a linear combination of shape functions, in the form \(u_h(\mathbf{x}) = \sum_{j=0}^{N-1} U_j \varphi_j(\mathbf{x})\). Here, \(U_j\) is a vector of expansion coefficients. Because we don't know their values yet (we will compute them as the solution of a linear or nonlinear system), they are called "unknowns" or "degrees of freedom". The second meaning of the term can be explained as follows: A mathematical description of finite element problem is often to say that we are looking for a finite dimensional function \(u_h \in V_h\) that satisfies some set of equations (e.g. \(a(u_h,\varphi_h)=(f,\varphi_h)\) for all test functions \(\varphi_h\in V_h\)). In other words, all we say here that the solution needs to lie in some space \(V_h\). However, to actually solve this problem on a computer we need to choose a basis of this space; this is the set of shape functions \(\varphi_j(\mathbf{x})\) we have used above in the expansion of \(u_h(\mathbf x)\) with coefficients \(U_j\). There are of course many bases of the space \(V_h\), but we will specifically choose the one that is described by the finite element functions that are traditionally defined locally on the cells of the mesh. Describing "degrees of freedom" in this context requires us to simply enumerate the basis functions of the space \(V_h\). For \(Q_1\) elements this means simply enumerating the vertices of the mesh in some way, but for higher elements one also has to enumerate the shape functions that are associated with edges, faces, or cell interiors of the mesh. The class that provides this enumeration of the basis functions of \(V_h\) is called DoFHandler. The process of enumerating degrees of freedom is referred to as "distributing DoFs" in deal.II.

The flag is necessary to make cases like this work: assume we have a one-dimensional mesh embedded in a two-dimensional space,

One dimensional mesh in two dimensions

In one dimensional meshes in one dimensional space, we can always make sure that the location of the left vertex of a cell has a smaller value than the location of the right vertex. However, if we embed a mesh in a higher dimensional space, we can no longer do this. For example, the cells in the mesh above may be described by the following vertex sets: (0,1), (1,2), (3,2), (4,3), (4,5). (As a side remark, note that here we have vertices – e.g. vertex 2 – that are the right end points of more than one cell.) If we define the normal to each cell as that unit vector that is right perpendicular to the vector that connects the first to the second vertex of the line, then we would end up with the following picture:

Normal vectors

In other words, this one-dimensional manifold is not oriented. We could in principle revert the order of vertices when creating such a mesh (though there are good reasons not to do so, for example because this mesh may have resulted from extracting the surface mesh of a two dimensional mesh, and we want to preserve the order of vertices of each line segment because they currently match the order of vertices of the faces of the 2d cells). An alternative strategy, chosen in deal.II, is to simply associate with each cell whether the normal should be the left or right normal to the cell. (The default is right normals.) In the example above, the flags for the five cells would be true, true, false, false, true. Multiplying the right normal with plus or minus one, depending on the value of the flag on each cell, yields a set of normal vectors that orient the manifold.

Similar issues happen with two-dimensional meshes in three space dimensions. We note that it would not be possible to find consistent direction flags if the two-dimensional manifold is not orientable; such manifolds are not currently supported by deal.II.

Distorted cells

A distorted cell is a cell for which the mapping from the reference cell to real cell has a Jacobian whose determinant is non-positive somewhere in the cell. Typically, we only check the sign of this determinant at the vertices of the cell. The function GeometryInfo::alternating_form_at_vertices computes these determinants at the vertices.

By way of example, if all of the determinants are of roughly equal value and on the order of \(h^\text{dim}\) then the cell is well-shaped. For example, a square cell or face has determinants equal to \(h^\text{dim}\) whereas a strongly sheared parallelogram has a determinant much smaller. Similarly, a cell with very unequal edge lengths will have widely varying determinants. Conversely, a pinched cell in which the location of two or more vertices is collapsed to a single point has a zero determinant at this location. Finally, an inverted or twisted cell in which the location of two vertices is out of order will have negative determinants.

The following two images show a well-formed, a pinched, and a twisted cell for both 2d and 3d:

A well-formed, a pinched, and a twisted cell in 2d.

A well-formed, a pinched, and a twisted cell in 3d.

Distorted cells can appear in two different ways: The original coarse mesh can already contain such cells, or they can be created as the result of mesh refinement if the boundary description in use is sufficiently irregular.

If the appropriate flag is given upon creation of a triangulation, the function Triangulation::create_triangulation, which is called by the various functions in GridGenerator and GridIn (but can also be called from user code, see step-14, will signal the creation of coarse meshes with distorted cells by throwing an exception of type Triangulation::DistortedCellList. There are legitimate cases for creating meshes with distorted cells (in particular collapsed/pinched cells) if you don't intend to assemble anything on these cells. For example, consider a case where one would like to simulate the behavior of an elastic material with a fluid-filled crack such as an oil reservoir. If the pressure becomes too large, the crack is closed – and the cells that discretize the crack volume are collapsed to zero volume. As long as you don't integrate over these cells to simulate the behavior of the fluid (of which there isn't any if the crack has zero volume), such meshes are perfectly legitimate. As a consequence, Triangulation::create_triangulation does not simply abort the program, but throws an exception that contains a list of cells that are distorted; this exception can be caught and, if you believe that you can ignore this condition, you can react by doing nothing with the caught exception.

The second case in which distorted cells can appear is through mesh refinement when we have curved boundaries. Consider, for example, the following case where the dashed line shows the exact boundary that the lower edge of the cell is supposed to approximate (let's assume for simplicity that the left, top and right edges are interior edges and therefore will be considered as straight; in fact, for this particular case in 2d where only one side of a cell is at the boundary we have special code that avoids the situation depicted, but you will get the general idea of the problem that holds in 3d or if more than one side of the cell is at the boundary):

One cell with an edge approximating a curved boundary

Now, if this cell is refined, we first split all edges and place new mid-points on them. For the left, top and right edge, this is trivial: because they are considered straight, we just take the point in the middle between the two vertices. For the lower edge, the Triangulation class asks the Boundary object associated with this boundary (and in particular the Boundary::new_point_on_line function) where the new point should lie. The four old vertices and the four new points are shown here:

Cell after edge refinement

The last step is to compute the location of the new point in the interior of the cell. By default, it is chosen as the average location (arithmetic mean of the coordinates) of these 8 points (in 3d, the 26 surrounding points have different weights, but the idea is the same):

Cell after edge refinement

The problem with that is, of course, that the bottom two child cells are twisted, whereas the top two children are well-shaped. While such meshes can happen with sufficiently irregular boundary descriptions (and if the coarse mesh is entirely inadequate to resolve the complexity of the boundary), the Triangulation class does not know what to do in such situations unless one attaches an appropriate manifold object to the cells in question (see the documentation module on manifolds). Consequently, absent such a manifold description or if the manifold description does not provide a sufficient description of the geometry, the Triangulation::execute_coarsening_and_refinement function does create such meshes, but it keeps a list of cells whose children are distorted. If this list is non-empty at the end of a refinement step, it will throw an exception of type Triangulation::DistortedCellList that contains those cells that have distorted children. The caller of Triangulation::execute_coarsening_and_refinement can then decide what to do with this situation.

One way to deal with this problem is to use the GridTools::fix_up_distorted_child_cells function that attempts to fix up exactly these cells if possible by moving around the node at the center of the cell.

Note that the Triangulation class does not test for the presence of distorted cells by default, since the determination whether a cell is distorted or not is not a cheap operation. If you want a Triangulation object to test for distortion of cells, you need to specify this upon creation of the object by passing the appropriate flag.

Distributed computing paper

The "distributed computing paper" is a paper by W. Bangerth, C. Burstedde, T. Heister and M. Kronbichler titled "Algorithms and Data
Structures for Massively Parallel Generic Finite Element Codes" that describes the implementation of parallel distributed computing in deal.II, i.e. computations where not only the linear system is split onto different machines as in, for example, step-17, but also the Triangulation and DoFHandler objects. In essence, it is a guide to the parallel::distributed namespace and the techniques used in step-40.

For massively parallel computations, deal.II builds on the p4est library. If you use this functionality, please also cite the p4est paper listed at their website.

Face orientation

In a triangulation, the normal vector to a face can be deduced from the face orientation by applying the right hand side rule (x,y -> normal). We note, that in the standard orientation of faces in 2d, faces 0 and 2 have normals that point into the cell, and faces 1 and 3 have normals pointing outward. In 3d, faces 0, 2, and 4 have normals that point into the cell, while the normals of faces 1, 3, and 5 point outward. This information, again, can be queried from GeometryInfo<dim>::unit_normal_orientation.

However, it turns out that a significant number of 3d meshes cannot satisfy this convention. This is due to the fact that the face convention for one cell already implies something for the neighbor, since they share a common face and fixing it for the first cell also fixes the normal vectors of the opposite faces of both cells. It is easy to construct cases of loops of cells for which this leads to cases where we cannot find orientations for all faces that are consistent with this convention.

For this reason, above convention is only what we call the standard orientation. deal.II actually allows faces in 3d to have either the standard direction, or its opposite, in which case the lines that make up a cell would have reverted orders, and the normal vector would have the opposite direction. You can ask a cell whether a given face has standard orientation by calling cell->face_orientation(face_no): if the result is true, then the face has standard orientation, otherwise its normal vector is pointing the other direction. There are not very many places in application programs where you need this information actually, but a few places in the library make use of this. Note that in 2d, the result is always true. However, while every face in 2d is always in standard orientation, you can sometimes specify something to assume that this is not so; an example is the function DoFTools::make_periodicity_constraints().

There are two other flags that describe the orientation of a face: face_flip and face_rotation. Some documentation for these exists in the GeometryInfo class. An example of their use in user code is given in the DoFTools::make_periodicity_constraints function.

Generalized support points

"Generalized support points" are, as the name suggests, a generalization of support points. The latter are used to describe that a finite element simply interpolates values at individual points (the "support points"). If we call these points \(\hat{\mathbf{x}}_i\) (where the hat indicates that these points are defined on the reference cell \(\hat{K}\)), then one typically defines shape functions \(\varphi_j(\mathbf{x})\) in such a way that the nodal functionals \(\Psi_i[\cdot]\) simply evaluate the function at the support point, i.e., that \(\Psi_i[\varphi]=\varphi(\hat{\mathbf{x}}_i)\), and the basis is chosen so that \(\Psi_i[\varphi_j]=\delta_{ij}\) where \(\delta_{ij}\) is the Kronecker delta function. This leads to the common Lagrange elements.

(In the vector valued case, the only other piece of information besides the support points \(\hat{\mathbf{x}}_i\) that one needs to provide is the vector component \(c(i)\) the \(i\)th node functional corresponds, so that \(\Psi_i[\varphi]=\varphi(\hat{\mathbf{x}}_i)_{c(i)}\).)

On the other hand, there are other kinds of elements that are not defined this way. For example, for the lowest order Raviart-Thomas element (see the FE_RaviartThomas class), the node functional evaluates not individual components of a vector-valued finite element function with dim components, but the normal component of this vector: \(\Psi_i[\varphi] = \varphi(\hat{\mathbf{x}}_i) \cdot \mathbf{n}_i \), where the \(\mathbf{n}_i\) are the normal vectors to the face of the cell on which \(\hat{\mathbf{x}}_i\) is located. In other words, the node functional is a linear combination of the components of \(\varphi\) when evaluated at \(\hat{\mathbf{x}}_i\). Similar things happen for the BDM, ABF, and Nedelec elements (see the FE_BDM, FE_ABF, FE_Nedelec classes).

In these cases, the element does not have support points because it is not purely interpolatory; however, some kind of interpolation is still involved when defining shape functions as the node functionals still require point evaluations at special points \(\hat{\mathbf{x}}_i\). In these cases, we call the points generalized support points.

Finally, there are elements that still do not fit into this scheme. For example, some hierarchical basis functions (see, for example the FE_Q_Hierarchical element) are defined so that the node functionals are moments of finite element functions, \(\Psi_i[\varphi] = \int_{\hat{K}} \varphi(\hat{\mathbf{x}}) {\hat{x}_1}^{p_1(i)} {\hat{x}_2}^{p_2(i)} \) in 2d, and similarly for 3d, where the \(p_d(i)\) are the order of the moment described by shape function \(i\). Some other elements use moments over edges or faces. In all of these cases, node functionals are not defined through interpolation at all, and these elements then have neither support points, nor generalized support points.

Ghost cells

If a mesh is distributed across multiple MPI processes using the parallel::distributed::Triangulation class, each processor stores only the cells it owns, one layer of adjacent cells that are owned by other processors, all coarse level cells, and all cells that are necessary to maintain the invariant that adjacent cells must differ by at most one refinement level. The cells stored on each process that are not owned by this process but that are adjacent to the ones owned by this process are called "ghost cells", and for these cells the predicate cell->is_ghost() returns true. Ghost cells are guaranteed to exist in the globally distributed mesh, i.e. these cells are actually owned by another process and are not further refined there. See the Distributed Computing paper for more information.

The layer of ghost cells consists of all cells that are face, edge, or vertex neighbors of any locally owned cell and that are not locally owned themselves. In other word, the ghost cells completely enclose the subdomain of locally owned cells (with the exception of the boundary of the domain, of course).

In parallel computations, vectors come in two general kinds: without and with ghost elements. Vectors without ghost elements uniquely partition the vector elements between processors: each vector entry has exactly one processor that owns it, and this is the only processor that stores the value of this entry. In other words, if processor zero stores elements 0...49 of a vector and processor one stores elements 50...99, then processor one is out of luck accessing element 42 of this vector: it is not stored here and the value can not be assessed. This will result in an assertion.

On the other hand, there are many situations where one needs to know vector elements that aren't locally owned, for example to evaluate the solution on a locally owned cell (see GlossLocallyOwnedCell) for which one of the degrees of freedom is at an interface to a cell that we do not own locally (which, in this case must then be a ghost cell) and for which the neighboring cell may be the owner – in other words, the degree of freedom is not a locally owned but instead only a locally active DoFs. The values of such degrees of freedom are typically stored on the machine that owns the degree of freedom and, consequently, would not be accessible on the current machine.

Because one often needs these values anyway, there is a second kind of vector, often called "ghosted vector". Ghosted vectors store some elements on each processor for which that processor is not the owner. For such vectors, you can read those elements that the processor you are currently on stores but you cannot write into them because to make this work would require propagating the new value to all other processors that have a copy of this value (the list of such processors may be something which the current processor does not know and has no way of finding out efficiently). Since you cannot write into ghosted vectors, the only way to initialize such a vector is by assignment from a non-ghosted vector. This implies having to import those elements we locally want to store from other processors.

The way ghosted vectors are actually stored is different between the various implementations of parallel vectors. For PETSc (and the corresponding PETScWrappers::MPI::Vector class), ghosted vectors store the same elements as non-ghosted ones would, plus some additional elements that are owned by other processors. In other words, for each element there is a clear owner among all of the processors and those elements that the current processor stores but does not own (i.e., the "ghost elements") are simply mirror images of a master value somewhere else – thus, the name "ghost". This is also the case for the parallel::distributed::Vector class.

On the other hand, in Trilinos (and consequently in TrilinosWrappers::MPI::Vector), a ghosted vector is simply a view of the parallel vector where the element distributions overlap. The 'ghosted' Trilinos vector in itself has no idea of which entries are ghosted and which are locally owned. In fact, a ghosted vector may not even store all of the elements a non-ghosted vector would store on the current processor. Consequently, for Trilinos vectors, there is no notion of an 'owner' of vector elements in the way we have it in the the non-ghost case view (or in the PETSc case) and the name "ghost element" may be misleading since in this view, every element we have available locally may or may not be stored somewhere else as well, but even if it is, the local element is not a mirror value of a master location as there is no owner of each element.

The "hp paper" is a paper by W. Bangerth and O. Kayser-Herold, titled "Data Structures and Requirements for hp Finite Element Software", that describes many of the algorithms and data structures used in the implementation of the hp framework of deal.II. In particular, it summarizes many of the tricky points that have to be considered for hp finite elements using continuous elements.

The numerical examples shown in that paper are generated with a slightly modified version of step-27. The main difference to that tutorial program is that various operations in the program were timed for the paper to compare different options and show that \(hp\) methods are really not all that expensive.

Interpolation with finite elements

The purpose of interpolation with finite elements is computing a vector of coefficients representing a finite element function, such that the node values of the original function and the finite element function coincide. Therefore, the interpolation process consists of evaluating all node functionalsNi for the given function f and store the result as entry i in the coefficient vector.

Each processor in a parallel computation has a triangulation covering the entire domain that consists of cells that are locally owned, of ghost cells and of artificial cells.

Locally owned degrees of freedom

This concept identifies a subset of all degrees of freedom when using distributed meshes, see the Parallel computing with multiple processors using distributed memory module. Locally owned degrees of freedom live on locally owned cells. Since degrees of freedom are owned by only one processor, degrees of freedom on interfaces between cells owned by different processors may be owned by one or the other, so not all degrees of freedom on a locally owned cell are also locally owned degrees of freedom.

This concept identifies a subset of all degrees of freedom when using distributed meshes, see the Parallel computing with multiple processors using distributed memory module. Locally active degrees of freedom are those that live on locally owned cells. Degrees of freedom on interfaces between cells owned by different processors therefore belong to the set of locally active degrees of freedom for more than one processor.

Every object that makes up a Triangulation (cells, faces, edges, etc.), is associated with a unique number (of type types::manifold_id) that is used to identify which manifold object is responsible to generate new points when the mesh is refined.

By default, all manifold indicators of a mesh are set to numbers::invalid_manifold_id. A typical piece of code that sets the manifold indicator on a object to something else would look like this, here setting the manifold indicator to 42 for all cells whose center has an \(x\) component less than zero:

The code above only sets the manifold indicators of a particular part of the Triangulation, but it does not by itself change the way the Triangulation class treats this object for the purposes of mesh refinement. For this, you need to call Triangulation::set_manifold() to associate a manifold object with a particular manifold indicator. This allows the Triangulation objects to use a different method of finding new points on cells, faces or edges to be refined; the default is to use a FlatManifold object for all faces and edges.

Each cell of a triangulation has associated with it a property called "material id". It is commonly used in problems with heterogeneous coefficients to identify which part of the domain a cell is in and, consequently, which value the coefficient should have on this particular cell. In practice, the material id of a cell is typically used to identify which cells belong to a particular part of the domain, e.g., when you have different materials (steel, concrete, wood) that are all part of the same domain. One would then usually query the material id associated with a cell during assembly of the bilinear form, and use it to determine (e.g., by table lookup, or a sequence of if-else statements) what the correct material coefficients would be for that cell.

This material_id may be set upon construction of a triangulation (through the CellData data structure), or later through use of cell iterators. For a typical use of this functionality, see the step-28 tutorial program. The functions of the GridGenerator namespace typically set the material ID of all cells to zero. When reading a triangulation through the GridIn class, different input file formats have different conventions, but typically either explicitly specify the material id, or if they don't, then GridIn simply sets them to zero. Because the material of a cell is intended to pertain to a particular region of the domain, material ids are inherited by child cells from their parent upon mesh refinement.

In the language of the Message Passing Interface (MPI), a communicator can be thought of as a mail system that allows sending messages to other members of the mail system. Within each communicator, each process has a rank (the equivalent of a house number) that allows to identify senders and receivers of messages. It is not possible to send messages via a communicator to receivers that are not part of this communicator/mail service.

When starting a parallel program via a command line call such as

mpirun -np 32 ./step-17

(or the equivalent used in the batch submission system used on your cluster) the MPI system starts 32 copies of the step-17 executable. Each of these has access to the MPI_COMM_WORLD communicator that then consists of all 32 processors, each with its own rank. A subset of processes within this MPI universe can later agree to create other communicators that allow communication between only a subset of processes.

MPI Process

When running parallel jobs on distributed memory machines, one almost always uses MPI. There, a command line call such as

mpirun -np 32 ./step-17

(or the equivalent used in the batch submission system used on your cluster) starts 32 copies of the step-17 executable. Some of these may actually run on the same machine, but in general they will be running on different machines that do not have direct access to each other's memory space.

In the language of the Message Passing Interface (MPI), each of these copies of the same executable running on (possibly different) machines are called processes. The collection of all processes running in parallel is called the "MPI Universe" and is identified by the MPI communicatorMPI_COMM_WORLD.

Each process has immediate access only to the objects in its own memory space. A process can not read from or write into the memory of other processes. As a consequence, the only way by which processes can communicate is by sending each other messages. That said (and as explained in the introduction to step-17), one typically calls higher level MPI functions in which all processes that are part of a communicator participate. An example would be computing the sum over a set of integers where each process provides one term of the sum.

MPI Rank

In the language of the Message Passing Interface (MPI), the rank of an MPI process is the number this process carries within the set MPI_COMM_WORLD of all processes currently running as one parallel job. More correctly, it is the number within an MPI communicator that groups together a subset of all processes with one parallel job (where MPI_COMM_WORLD simply denotes the complete set of processes).

Within each communicator, each process has a unique rank, distinct from the all other processes' ranks, that allows identifying one recipient or sender in MPI communication calls. Each process, running on one processor, can inquire about its own rank within a communicator by calling Utilities::MPI::this_mpi_process(). The total number of processes participating in a communicator (i.e., the size of the communicator) can be obtained by calling Utilities::MPI::n_mpi_processes().

Multigrid paper

The "multigrid paper" is a paper by B. Janssen and G. Kanschat, titled "Adaptive Multilevel Methods with Local Smoothing for H1- and Hcurl-Conforming High Order Finite Element Methods", that describes many of the algorithms and data structures used in the implementation of the multigrid framework of deal.II. It underlies the implementation of the classes that are used in step-16 for multigrid methods.

The full reference for this paper is as follows:

1 @article{janssen2011adaptive,

2 title= {Adaptive Multilevel Methods with Local Smoothing for H^1- and H^{curl}-Conforming High Order Finite Element Methods},

It is customary to define a finite element as a triple \((K,P,\Psi)\) where

\(K\) is the cell, where in deal.II this is always a line segment, quadrilateral, or hexahedron;

\(P\) is a finite-dimensional space, e.g., a polynomial space mapped from the reference cell to \(K\);

\(\Psi\) is a set of "node functionals", i.e., functionals \(\Psi_i : P \rightarrow {\mathbb R}\). The dimension of \(P\) must be equal to the number of node functionals. With this definition, we can define a basis of the local function space, i.e., a set of "shape functions" \(\varphi_j\in P\), by requiring that \(\Psi_i(\varphi_j) = \delta_{ij}\), where \(\delta\) is the Kronecker delta.

This definition of what a finite element is has several advantages, concerning analysis as well as implementation. For the analysis, it means that conformity with certain spaces (FiniteElementData::Conformity), e.g. continuity, is up to the node functionals. In deal.II, it helps simplifying the implementation of more complex elements like FE_RaviartThomas considerably.

Examples for node functionals are values in support points and moments with respect to Legendre polynomials. Examples:

Gauss points on edges(faces) and anisotropic Gauss points in the interior

The construction of finite elements as outlined above allows writing code that describes a finite element simply by providing a polynomial space (without having to give it any particular basis – whatever is convenient is entirely sufficient) and the nodal functionals. This is used, for example in the FiniteElement::convert_generalized_support_point_values_to_dof_values() function.

Parallel scaling

When we say that a parallel program "scales", what we mean is that the program does not become unduly slow (or takes unduly much memory) if we make the problem it solves larger, and that run time and memory consumption decrease proportionally if we keep the problem size the same but increase the number of processors (or cores) that work on it.

More specifically, think of a problem whose size is given by a number \(N\) (which could be the number of cells, the number of unknowns, or some other indicative quantity such as the number of CPU cycles necessary to solve it) and for which you have \(P\) processors available for solution. In an ideal world, the program would then require a run time of \({\cal O}(N/P)\), and this would imply that we could reduce the run time to any desired value by just providing more processors. Likewise, for a program to be scalable, its overall memory consumption needs to be \({\cal O}(N)\) and on each involved process needs to be \({\cal O}(N/P)\), again implying that we can fit any problem into the fixed amount of memory computers attach to each processor, by just providing sufficiently many processors.

For practical assessments of scalability, we often distinguish between "strong" and "weak" scalability. These assess asymptotic statements such as \({\cal O}(N/P)\) run time in the limits \(N\rightarrow \infty\) and/or \(P\rightarrow \infty\). Specifically, when we say that a program is "strongly scalable", we mean that if we have a problem of fixed size \(N\), then we can reduce the run time and memory consumption (on every processor) inversely proportional to \(P\) by just throwing more processors at the problem. In particular, strong scalability implies that if we provide twice as many processors, then run time and memory consumption on every process will be reduced by a factor of two. In other words, we can solve the same problem faster and faster by providing more and more processors.

Conversely, "weak scalability" means that if we increase the problem size \(N\) by a fixed factor, and increase the number of processors \(P\) available to solve the problem by the same factor, then the overall run time (and the memory consumption on every processor) remains the same. In other words, we can solve larger and larger problems within the same amount of wallclock time by providing more and more processors.

No program is truly scalable in this theoretical sense. Rather, all programs cease to scale once either \(N\) or \(P\) grows larger than certain limits. We therefore often say things such as "the program scales up to
4,000 cores", or "the program scales up to 100,000,000 unknowns". There are a number of reasons why programs cannot scale without limit; these can all be illustrated by just looking at the (relatively simple) step-17 tutorial program:

Sequential sections: Many programs have sections of code that either cannot or are not parallelized, i.e., where one processor has to do a certain, fixed amount of work that does not decrease just because there are a total of \(P\) processors around. In step-17, this is the case when generating graphical output: one processor creates the graphical output for the entire problem, i.e., it needs to do \({\cal O}(N)\) work. That means that this function has a run time of \({\cal O}(N)\), regardless of \(P\), and consequently the overall program will not be able to achieve \({\cal O}(N/P)\) run time but have a run time that can be described as \(c_1N/P + c_2N\) where the first term comes from scalable operations such as assembling the linear system, and the latter from generating graphical output on process 0. If \(c_2\) is sufficiently small, then the program might look like it scales strongly for small numbers of processors, but eventually strong scalability will cease. In addition, the program can not scale weakly either because increasing the size \(N\) of the problem while increasing the number of processors \(P\) at the same rate does not keep the run time of this one function constant.

Duplicated data structures: In step-17, each processor stores the entire mesh. That is, each processor has to store a data structure of size \({\cal O}(N)\), regardless of \(P\). Eventually, if we make the problem size large enough, this will overflow each processor's memory space even if we increase the number of processors. It is thus clear that such a replicated data structure prevents a program from scaling weakly. But it also prevents it from scaling strongly because in order to create an object of size \({\cal O}(N)\), one has to at the very least write into \({\cal O}(N)\) memory locations, costing \({\cal O}(N)\) in CPU time. Consequently, there is a component of the overall algorithm that does not behave as \({\cal O}(N/P)\) if we provide more and more processors.

Communication: If, to pick just one example, you want to compute the \(l_2\) norm of a vector of which all MPI processes store a few entries, then every process needs to compute the sum of squares of its own entries (which will require \({\cal O}(N/P)\) time, and consequently scale perfectly), but then every process needs to send their partial sum to one process that adds them all up and takes the square root. In the very best case, sending a message that contains a single number takes a constant amount of time, regardless of the overall number of processes. Thus, again, every program that does communication cannot scale strongly because there are parts of the program whose CPU time requirements do not decrease with the number of processors \(P\) you allocate for a fixed size \(N\). In reality, the situation is actually even worse: the more processes are participating in a communication step, the longer it will generally take, for example because the one process that has to add everyone's contributions has to add everything up, requiring \({\cal O}(P)\) time. In other words, CPU time increases with the number of processes, therefore not only preventing a program from scaling strongly, but also from scaling weakly. (In reality, MPI libraries do not implement \(l_2\) norms by sending every message to one process that then adds everything up; rather, they do pairwise reductions on a tree that doesn't grow the run time as \({\cal O}(P)\) but as \({\cal O}(\log_2 P)\), at the expense of more messages sent around. Be that as it may, the fundamental point is that as you add more processors, the run time will grow with \(P\) regardless of the way the operation is actually implemented, and it can therefore not scale.)

These, and other reasons that prevent programs from scaling perfectly can be summarized in Amdahl's law that states that if a fraction \(\alpha\) of a program's overall work \(W\) can be parallelized, i.e., it can be run in \({\cal O}(\alpha W/P)\) time, and a fraction \(1-\alpha\) of the program's work can not be parallelized (i.e., it consists either of work that only one process can do, such as generating graphical output in step-17; or that every process has to execute in a replicated way, such as sending a message with a local contribution to a dedicated process for accumulation), then the overall run time of the program will be

If \(\alpha<1\), which it is for all practically existing programs, then \(S\rightarrow \frac{1}{1-\alpha}\) as \(P\rightarrow \infty\), implying that there is a point where it does not pay off in any significant way any more to throw more processors at the problem.

In practice, what matters is up to which problem size or up to which number of processes or down to which size of local problems \({\cal}(N/P)\) a program scales. For deal.II, experience shows that on most clusters with a reasonable fast network, one can solve problems up to a few billion unknowns, up to a few thousand processors, and down to somewhere between 40,000 and 100,000 unknowns per process. The last number is the most relevant: if you have a problem with, say, \(10^8\) unknowns, then it makes sense to solve it on 1000-2500 processors since the number of degrees of freedom each process handles remains at more than 40,000. Consequently, there is enough work every process has to do so that the \({\cal O}(1)\) time for communication does not dominate. But it doesn't make sense to solve such a problem with 10,000 or 100,000 processors, since each of these processor's local problem becomes so small that they spend most of their time waiting for communication, rather than doing work on their part of the work.

A finite element (described by its shape functions) is primitive if there is a unique relation from shape function number to vector component. What this means is that each shape function of a vector-valued element has exactly one nonzero component if an element is primitive. This includes, in particular, all scalar elements as well as vector-valued elements assembled via the FESystem class from other primitive (for example scalar) elements as shown in step-8, step-29, step-22 and several others. On the other hand, the FE_RaviartThomas class used in step-20 and step-21, or the FE_Nedelec class provide non-primitive finite elements because there, each vector-value shape function may have several non-zero components.

Reference cell

The hypercube [0,1]dim, on which all parametric finite element shape functions are defined. Many properties of the reference cell are described by the GeometryInfo class.

Serialization

The term "serialization" refers to the process of writing the state of an object to a stream and later retrieve it again. A typical use case is to save the state of a program to disk for possible later resurrection, often in the context of checkpoint/restart strategies for long running computations or on computers that aren't very reliable (e.g. on very large clusters where individual nodes occasionally fail and then bring down an entire MPI job). In either case, one wants to occasionally save the state of the program so that, upon failure, one can restart it at that point rather than having to run it again from the beginning.

deal.II implements serialization facilities by implementing the necessary interfaces for the BOOST serialization library. See there for examples on how to save and restore objects.

Shape functions

The restriction of the finite element basis functions to a single grid cell.

Subdomain id

Each cell of a triangulation has associated with it a property called the "subdomain id" that can be queried using a call like cell->subdomain_id() and that can be set for example by using cell->set_subdomain_id(13). (These calls resolve to CellAccessor::subdomain_id() and CellAccessor::set_subdomain_id(), respectively.) While in principle this property can be used in any way application programs deem useful (it is simply an integer associated with each cell that can indicate whatever you want), at least for programs that run in parallel it usually denotes the MPI rank of the processor that "owns" this cell.

For programs that are parallelized based on MPI but where each processor stores the entire triangulation (as in, for example, step-17 and step-18, but not in step-40), subdomain ids are assigned to cells by partitioning a mesh, and each MPI process then only works on those cells it "owns", i.e., that belong to a subdomain the processor owns (traditionally, this is the case for the subdomain id whose numerical value coincides with the rank of the MPI process within the MPI communicator). Partitioning is typically done using the GridTools::partition() function, but any other method can also be used to do this. (Alternatively, the parallel::shared::Triangulation class can partition the mesh automatically using a similar approach.)

On the other hand, for programs that are parallelized using MPI but where meshes are held distributed across several processors using the parallel::distributed::Triangulation class, the subdomain id of cells is tied to the processor that owns the cell. In other words, querying the subdomain id of a cell tells you if the cell is owned by the current processor (i.e. if cell->subdomain_id() == triangulation.parallel::distributed::Triangulation::locally_owned_subdomain()) or by another processor. In the parallel distributed case, subdomain ids are only assigned to cells that the current processor owns as well as the immediately adjacent ghost cells. Cells further away are held on each processor to ensure that every MPI process has access to the full coarse grid as well as to ensure the invariant that neighboring cells differ by at most one refinement level. These cells are called "artificial" (see here) and have the special subdomain id value types::artificial_subdomain_id.

In addition to regular subdomain ids, there is a second, closely related set of flags that are associated with each cell: "level subdomain ids." These exist not only for active cells but, in fact, for every cell in a mesh hierarchy. Their meaning is entirely analogous to the regular subdomain ids, but they are read and written by the CellAccessor::level_subdomain_id() and CellAccessor::set_level_subdomain_id() functions.

Support points

Support points are by definition those points \(p_i\), such that for the shape functions \(v_j\) holds \(v_j(p_i) = \delta_{ij}\). Therefore, a finite element interpolation can be defined uniquely by the values in the support points.

When vectors and matrices are grouped into blocks by component, it is often desirable to collect several of the original components into a single one. This could be for instance, grouping the velocities of a Stokes system into a single block.

These are the support points on the reference cell, defined in FiniteElement. For example, the usual Q1 element in 1d has support points at x=0 and x=1 (and similarly, in higher dimensions at the vertices of the unit square or cube). On the other hand, higher order Lagrangian elements have unit support points also in the interior of the unit line, square, or cube.

User flags

A triangulation offers one bit per line, quad, etc for user flags. This field can be accessed as all other data using iterators, using the syntax

Typically, this user flag is used if an algorithm walks over all cells and needs information whether another cell, e.g. a neighbor, has already been processed. Similarly, it can be used to flag faces, quads or lines at the boundary for which some operation has already been performed. The latter is often useful since a loop such as

encounters some boundary lines more than once. Consequently, one would set the user flag of the line in the body of the loop, and only enter the body if the user flag had not previously been set. There are a number of additional functions that can be accessed through the iterator interface; see the TriaAccessor class for more information. Note that there are no user flags that can be associated with vertices; however, since vertices are numbered consecutively, this can easily be emulated in user code using a vector of bools.

As for the refinement and coarsening flags, there exist two versions of these functions, one which reads/writes from a stream and one which does so from a vector<bool>. The latter is used to store flags temporarily, while the first is used to store them in a file.

It is good practice to clear the user flags using the Triangulation::clear_user_flags() function before usage, since it is often necessary to use the flags in more than one function. If the flags may be in use at the time a function that needs them is called, then this function should save and restore the flags as described above.

Note

If more information than just a single boolean flag needs to be stored with a cell, line, or face, then see about user data.

User pointers and user indices

Just like the user flags, the Triangulation class offers a field for each line, quad and hex in which to store more descriptive data than just a single boolean flag. This is called "user data" and the data that can be stored in it is either a single unsigned integer or a void pointer. Both are typically used to index into a bigger array that contains more detailed data an application wants to attach to a mesh entity.

User pointers and user indices are stored in the same place. In order to avoid unwanted conversions, Triangulation checks which one of them is in use and does not allow access to the other one, until Triangulation::clear_user_data() has been called.

The usual warning about the missing type safety of void pointers are obviously in place here; responsibility for correctness of types etc lies entirely with the user of the pointer.

WorkStream paper

The "WorkStream paper" is a paper by B. Turcksin, M. Kronbichler and W. Bangerth that discusses the design and implementation of WorkStream. WorkStream is, at its core, a design pattern, i.e., something that is used over and over in finite element codes and that can, consequently, be implemented generically. In particular, the paper lays out the motivation for this pattern and then proposes different ways of implementing it. It also compares the performance of different implementations.

The "Z order" of cells describes an order in which cells are traversed.

By default, if you write a loop over all cells in deal.II, the cells will be traversed in an order where coarser cells (i.e., cells that were obtained from coarse mesh cells with fewer refinement steps) come before cells that are finer (i.e., cells that were obtained with more refinement steps). Within each refinement level, cells are traversed in an order that has something to do with the order in which they were created; in essence, however, this order is best of thought of as "unspecified": you will visit each cell on a given refinement level exactly once, in some order, but you should not make any assumptions about this order.

Because the order in which cells are created factors into the order of cells, it can happen that the order in which you traverse cells is different for two identical meshes. For example, think of a 1d (coarse) mesh with two cells: If you first refine the first of these cells and then the other, then you will traverse the four cells on refinement level 1 in a different order than if you had first refined the second coarse cell and then the first coarse cell.

This order is entirely practical for almost all applications because in most cases, it does not actually matter in which order one traverses cells. Furthermore, it allows using data structures that lead to particularly low cache miss frequencies and are therefore efficient for high performance computing applications.

On the other hand, there are cases where one would want to traverse cells in a particular, specified and reproducible order that only depends on the mesh itself, not its creation history or any other seemingly arbitrary design decisions. The "Z order" is one way to achieve this goal.

To explain the concept of the Z order, consider the following sequence of meshes (with each cell numbered using the "level.index" notation, where "level" is the number of refinements necessary to get from a coarse mesh cell to a particular cell, and "index" the index of this cell within a particular refinement level):

A coarse mesh

The mesh after one refinement cycle

The mesh after two refinement cycles

The mesh after three refinement cycles

Note how the cells on level 2 are ordered in the order in which they were created. (Which is not always the case: if cells had been removed in between, then newly created cells would have filled in the holes so created.)

The "natural" order in which deal.II traverses cells would then be 0.0 -> 1.0 -> 1.1 -> 1.2 -> 1.3 -> 2.0 -> 2.1 -> 2.2 -> 2.3 -> 2.4 -> 2.5 -> 2.6 -> 2.7. (If you want to traverse only over the active cells, then omit all cells from this list that have children.) This can be thought of as the "lexicographic" order on the pairs of numbers "level.index", but because the index within each level is not well defined, this is not a particularly useful notion. Alternatively, one can also think of it as one possible breadth-first traversal of the tree that corresponds to this mesh and that represents the parent-child relationship between cells:

The tree that corresponds to the mesh after three refinement cycles

On the other hand, the Z order corresponds to a particular depth-first traversal of the tree. Namely: start with a cell, and if it has children then iterate over these cell's children; this rule is recursively applied as long as a child has childen.

For the given mesh above, this yields the following order: 0.0 -> 1.0 -> 2.4 -> 2.5 -> 2.6 -> 2.7 -> 1.1 -> 1.2 -> 1.3 -> 1.4 -> 2.0 -> 2.1 -> 2.2 -> 2.3. (Again, if you only care about active cells, then remove 0.0, 1.0, and 1.3 from this list.) Because the order of children of a cell is well defined (as opposed to the order of cells within each level), this "hierarchical" traversal makes sense and is, in particular, independent of the history of a triangulation.

Finally, as an explanation of the term "Z" order: if you draw a line through all cells in the order in which they appear in this hierarchical fashion, then it will look like a left-right inverted Z on each refined cell. Indeed, the curve so defined can be thought of a space-filling curve and is also sometimes called "Morton ordering", see https://en.wikipedia.org/wiki/Z-order_curve .

Generated on Thu Feb 22 2018 03:34:15 for The deal.II Library by 1.8.9.1 Hosting provided by