Numerical methods for inverse radiation problems

NUMERICAL METHODS FOR INVERSE RADIATION PROBLEMS

KyleDaun

JasonPorter

Successful solution of an inverse radiation problem, be it parameter estimation or
design optimization, depends on judicious selection of the proper analysis
tool. This, in turn, depends on the underlying mathematical character of
the inverse problem. Inverse problems are categorized as either linear or
nonlinear.

1. LINEAR INVERSE PROBLEMS

Linear inverse problems involve first-kind integral equations; the most
important example in radiative transfer is the inverse boundary condition design
problem, in which the goal is to estimate the heat flux and temperature
distribution over a heater surface that satisfies prescribed (and most often
uniform) heat flux and temperature distributions over another surface called
the design surface. As mentioned above, in traditional forward analysis,
only one boundary condition, either temperature T(u) or heat flux q(u), is
specified at each location on the enclosure. The parameter u = [u,v]T denotes a
particular location on the enclosure surface. The objective is to find the unknown
radiosity distribution, qo(u), over the enclosure surfaces, which is done by
solving a Fredholm integral equation of the second kind (Daun and Hollands,
2001),

(1)

where uRuv, the kernel k(u, u') depends on the geometry between u and u',
and

(2)

if T(u) is known, or

(3)

if q(u) is known. Equation (1) is mathematically well posed, meaning that there is
only one unique distribution, qo(u), that satisfies Eq. (1) and, moreover, it is
guaranteed to exist. Equation (1) is rarely analytically tractable, so it is
usually solved numerically by assuming uniform radiosity over small but
finite area elements and then approximating the integral as a summation.
Writing this equation for each finite area results in a matrix equation, Ax
= b, where x and b contain unknown radiosity values and corresponding
boundary conditions, respectively. The A matrix is well conditioned, so x is
readily found using traditional linear algebra methods (e.g., Gauss-Seidel, LU
decomposition).

As noted above, however, the desired heat flux and temperature distributions are
usually known over the design surface, while the corresponding distributions over a
heater surface are unknown. It can be shown that specifying both T(u) and q(u) over
a surface means that the radiosity, qo(u), is also known over that surface. This known
radiosity distribution is related to the radiosity distribution over other surfaces by
a Fredholm integral equation of the first kind (Howell et al., 2000; Wing,
1991),

(4)

The left-hand side is known, in contrast to Eq. (1), making this equation
mathematically ill posed. This can be understood by writing Eq. (4) for only one
point u on the design surface so the left-hand side is a constant, and it can be shown
that an infinite set of distributions {qo(u')} can be substituted into the integral to
satisfy this constant. Writing Eq. (4) over the entire design surface instead of a single
point somewhat reduces this ambiguity, but there still exists multiple radiosity
distributions that “almost” satisfy f(u), violating Hadamard’s second criteria of
well-posed problems.

Following the same discretization procedure described above also yields a matrix
equation, Ax = b, but in this case A is ill conditioned, meaning that in addition
to the “true” solution x*, there exists a set of solutions {x} that almost
satisfies b, even though these solutions may be very different. This is best
visualized by comparing plots of the residual norms of a well-conditioned matrix
equation,

(5)

and an ill-conditioned matrix equation,

(6)

which are 2D analogues to the forward and inverse boundary condition estimation
problems. The well-conditioned matrix equation has a well-defined solution that
minimizes the residual, x*, as shown in Fig. 1a. The contour plot in Fig. 1b also has
a single, unique minimum (guaranteed for nonsingular linear problems),
but there is also a locus of solutions along the valley floor that makes the
residual norm very small. In the radiant enclosure design problem, these
solutions correspond to radiosity distributions over the heaters that produce the
desired conditions on the design surface. Moreover, the shallow residual
topography in the vicinity of x* makes the solution of an ill-conditioned matrix
equation highly sensitive to small perturbations and numerical artifacts, such as
round off error, so inverting an ill-conditioned matrix equation using the
above-mentioned traditional linear algebra techniques results in an oscillatory
solution of large amplitude that would be impossible to implement in a practical
setting.

Figure 1. Plot of the residual norm for (a) a well-conditioned and (b) an
ill-conditioned matrix equation.

To resolve this ambiguity, it is necessary to employ regularization methods that
transform the mathematically ill-posed inverse problem into a well-posed problem by
adding auxiliary information based on desired or presumed solution characteristics.
One of the earliest examples is Tikhonov regularization (Phillips, 1962; Tikhonov,
1963), which augments the ill-conditioned matrix equation with a second
homogeneous matrix equation, Lx = 0, and the regularized solution is then found by
minimizing the residual norm,

(7)

where L is the smoothing matrix and λ is a heuristically adjusted scalar, called the
regularization parameter, that controls the extent of regularization. The form of L
depends on the presumed or desired solution characteristics, i.e., setting L ≡ I
promotes a solution having a small norm, for example, while a smooth solution can
be obtained from,

(8)

When carrying out Tikhonov regularization, it is important to choose λ so that
sufficient regularization is used to obtain a useful solution, but using too much
overwhelms the ill-conditioned matrix, resulting in an inaccurate solution. Several
parametric selection strategies are described in Hansen (1998). In a topographical
sense, increasing λ reduces the problem ambiguity by steepening the residual
norm in the vicinity of its minimum, xλ*, but this also shifts xλ* away from
x*.

A second type of regularization used to solve the inverse boundary condition
design problem is truncated singular value decomposition (TSVD) (Hansen, 1998),
which is based on the singular value decomposition of A,

(9)

where U and V are orthogonal matrices, uj and vjT are the jth column vectors of
these matrices, Σ is a diagonal matrix containing the singular values arranged in
descending order, and σj = Σjj is the jth singular value. Due to the orthogonality of
U and V, Ax = b can be solved through back-substitution,

(10)

If A is ill conditioned, some of the singular values are near zero, as shown in Fig. 2,
and dividing by these values produces the large-amplitude oscillations in x. A
regularized solution, xp, is obtained by ignoring (or truncating) the summation terms corresponding to the p smallest singular values in Eq. (10), and the
solution becomes smoother as more singular values are truncated. As is the
case in Tikhonov regularization, however, the residual norm ||Axp - b|| also
increases as p increases since the truncated singular values correspond to
summation terms of increasing significance in Eq.(9), so p must be chosen with
care.

Figure 2. The singular values of an ill-conditioned matrix range over several orders
of magnitude, and some are very near zero.

A third technique used to solve the inverse boundary condition problem is the
conjugate gradient method (Hansen, 1998; Nash and Sofer, 1996), which is based
on the observation that solving Ax = b is equivalent to minimizing the
quadratic function F(x) = 1/2xTAx - bTx, since at x*, xF(x) = Ax - b = 0.
The conjugate gradient method is actually a nonlinear programming (NLP)
technique, which will be discussed in more detail shortly. All NLP methods
work by minimizing F(x) iteratively through a sequence of steps. At the kth step,

(11)

where αk is the step size and pk is the search direction. The methods differ on
how the step sizes and search directions are chosen. The conjugate gradient
method sequentially generates mutually conjugate search directions that satisfy
pk - 1TApk = 0, and the corresponding step size is then found by minimizing
F(xk + αkpk) analytically with respect to αk. Mutual-conjugate search directions are
“noninterfering” in the sense that minimization progress made by one step is not
undone by future steps, and x* is consequently found in exactly n iterations if
A is symmetric and positive definite, and exact arithmetic is used. (For
matrixes that are not symmetric and positive definite, the normal equations
are solved instead of Ax = b.) Consequently, the solution can be written
as

(12)

While it is a highly efficient linear solver in its own right, the conjugate gradient
method is also well suited to linear regularization because (Hansen, 1998): (1)
starting from x0 = 0, it can be shown that ||xk|| increases monotonically with
iteration number, k, while ||Axk - b|| decreases monotonically, and; (2) the search
directions associated with large eigenvalues of A (or equivalently large singular values
for ATA) are generated first, while steps that produce high-amplitude oscillations in
the solution occur later. A regularized solution, xk, can thus be obtained by
prematurely truncating iterations at the kth step, before F(x) and ||Ax - b|| are fully
minimized.

2. NONLINEAR PROBLEMS

While the inverse boundary condition design problem for radiation alone is linear,
most other types of inverse problems involving radiant transfer are nonlinear.
Examples include multimode heat transfer, enclosures containing participating media
with temperature-dependent properties, and geometric optimization problems; all of
which are discussed in more detail later.

As mentioned above, a nonlinear inverse problem can sometimes be written as
Axk+1 = b(xk), in which case linear regularization can be used to solve the
ill-conditioned matrix equation at each iteration. More often, however, this class of
problem is cast as a minimization problem of the general form

(13)

where c(x) is a vector of inequality constraints. The objective function depends on
the type of inverse problem being solved; in parameter estimation problems, F(x) is
usually a least squares function,

(14)

where b contains measured data, bmodel(x) is corresponding modeled data,
and x is the unknown state variable. Under certain conditions, Eq. (14) is
minimized by the maximum likelihood estimator of x, the set of parameters
most likely responsible for the observed data. In design optimization, the
objective function quantifies the “goodness” of a particular design and x
contains design parameters, which specify the design configuration, for example,
the heater settings or enclosure geometry. If the goal is to obtain a desired
temperature and heat flux distribution over an enclosure surface, one of
these parameters (say, temperature) is specified over the surface, and the
other (say, heat flux) is calculated for a given set of design parameters. The
enclosure design that satisfies both imposed boundary conditions is found by
minimizing

(15)

while inequality constraints can be used to enforce a nonnegative heat flux over a
heater surface, or make sure the enclosure geometry remains unobstructed and/or
confined within some maximum dimension in the case of shape optimization.

Thus, the challenge for parameter estimation and design optimization problems is
to find a vector x* that minimizes an objective function, F(x), sometimes subject to
constraints. Methods for solving multivariate minimization problems are either
nonlinear programming or metaheuristic.

2.1 Nonlinear Programming

As described above, NLP methods find x* iteratively; at the kth iteration xk is
updated by taking a “step” using Eq. (11), and iterations continue until
||F(xk)|| ≈ ||F(x*)|| = 0 as shown in Fig. 3. These methods differ on how
the search direction is chosen, but this calculation is always based on the
local objective function curvature at xk. Most model F(xk) as being locally
quadratic,

Figure 3. NLP algorithms update xk by calculating a search direction and a step size at each iteration.

(16)

where 2F(x) is the Hessian matrix containing the second-order derivatives of F(x)
with respect to the elements of x. If F(x) were truly quadratic, 2F would be
constant, and x* could be found by setting F(x*) = 0, resulting in (Nash and
Sofer, 1996; Gill et al., 1986)

(17)

Solving this equation results in Newton’s direction, pk = x* - xk, and this NLP
algorithm is called Newton’s method. Since F(x) is not truly quadratic, multiple
steps are required to reach x*, but the quadratic approximation improves as xk
approaches x*, and the method is second-order convergent.

While Newton’s method normally requires the fewest steps of all NLP algorithms
to reach x*, it is not necessarily the most efficient, particularly if 2F(x) is
expensive to calculate. An alternative is the quasi-Newton method [also called the
variable metric method (VMM)], in which the search direction is found by (Nash and
Sofer, 1996; Gill et al., 1986)

(18)

where Bk is a Hessian approximation formed using the gradients from the current
and previous steps. Usually, B0 ≡ I and p0 is the steepest descent direction,
-F(x0); the Hessian approximation improves with subsequent steps, and
Bk ≈ 2F(xk) at large k. The quasi-Newton’s method requires more steps than
Newton’s method, but is often computationally faster since it avoids calculating the
Hessian.

The conjugate gradient approach described above is a different class of NLP
method that generates each new search direction so that it is mutually conjugate to
the previous search direction with respect to the current Hessian. As noted above, for
quadratic objective functions, x* can be reached in exactly n steps, and otherwise
more steps are required.

Although finding F(x) requires solving the well-posed forward problem for
bmodel(x) or qDSmodel(x), the nonlinear approach does not circumvent the underlying
ill posedness of the inverse problem; instead, the ill posedness manifests in Eq. (17).
This situation is shown graphically in Fig 4a, which shows the topography of F(x)
arising from an ill-conditioned linear problem solved using least squares
minimization. As discussed above, solutions that lie in the long shallow valley
surrounding x* “almost” minimize F(x), violating Hadamard’s second criterion for
well-posed problems; choosing a search direction along this line using Newton’s
method (or any other NLP technique) is problematic because the ambiguous
curvature of F(x) makes 2F(xk) ill conditioned. In a topographical sense, the two
eigenvalues of 2F(x*), λmax and λmin, are the rates at which F(x) increases as x
moves away from x* in the principle curvature directions, which are the
corresponding eigenvectors, μmax and μmin. Due to the long shallow valley,
λmax >> λmin, and eigenvalues of varying orders of magnitude have the same meaning
as singular values that range over orders of magnitude that characterize
ill-conditioned matrices.

Figure 4. Nonlinear inverse problems may have objective function topographies
corresponding to violations of (a) Hadamard’s second and (b) third criteria of
well-posed problems.

The solution is to apply linear regularization methods to solve Eq. (17). One
common approach (Nash and Sofer, 1996; Gill et al., 1986) is to compute the
Cholesky factorization of the Hessian, 2F(x*) = LDLT, and then reset the smallest
diagonal elements in D equal to some minimum value before constructing pk through
back-substitution, in a way similar to TSVD. A second technique specialized for least
squares problems is the Levenburg-Marquardt method (Marquardt, 1963), in which
pk is found by solving

(19)

where (xk) is the Gauss-Newton approximation of the Hessian. This is
equivalent to Tikhonov regularization with L ≡ I. Guidance for choosing λk is
provided in (More and Sorensen, 1983).

2.2 Metaheuristics

Nonlinear inverse problems may also produce multimodal objective functions having
multiple local minima, as in Fig. 4b. Each minimum corresponds to a possible
solution to the inverse problem, violating Hadamard’s third criterion of solution
uniqueness. A major drawback of NLP algorithms is that, because they model F(x)
as quadratic, they can only find one of these minima, usually the one closest to
x0.

Metaheuristic methods avoid this problem by strategically sampling the solution
space of a multimodal optimization problem in search of good solutions. While NLP
algorithms always choose pk to be a downhill direction, metaheuristic methods
include stochastic processes that periodically move the solution uphill, allowing them
to escape shallow local minima. Three of the most common metaheuristic methods
are simulated annealing (SA), genetic algorithms (GAs), and tabu search
(TS).

Simulated annealing (Kirkpatrick and Sorensen, 1983) is named after the
microstructure crystallization physics in metal annealing that inspired its
development. Solutions are generated randomly, and the resulting objective
function is evaluated and compared to the current solution. To diversify
the search, SA periodically accepts uphill solutions using a probabilistic
formula,

(20)

where P is the probability that a worse solution will be accepted, ΔE is the
difference in the objective function of the current and new solution, and θ is a
heuristic set by the designer. At large values of θ, the algorithm escapes from
nearly all local minima in the solutions space, allowing the algorithm to
traverse more of the solutions space. After a specified number of solutions
are explored, θ is reduced, forcing the algorithm to remain in the vicinity
of the local minima for more iterations. At the lowest levels of θ, which
occur late in the search, SA approaches a deterministic steepest descent
algorithm.

Genetic algorithms (GAs) (Goldberg, 1989) are modeled on evolutionary processes
in the natural world; although there are many variations, most algorithms include the
following steps:

Generate an initial population of candidate solutions, either randomly or
using an initial solution heuristic. The elements of x are converted from
decimal notation to binary representation.

Select a subset of this population to undergo “mating” based on the
objective function of each population member.

Perform a low-probability mutation operation by randomly sampling a
few bits of the members and reversing their binary value.

Calculate an objective function for each member of the new population.

The selective mating process combines the best attributes of a large population into a
few good solutions, while the random paring and mutation operations prevent the
algorithm from getting “stuck” in a shallow local minimum.

The tabu search differs from SA and GA in that it keeps track of previously
visited local minima, and repeating these steps are thus forbidden or “tabu.” The
tabu search has several key components, namely, a solution representation, a
neighborhood definition, a candidate list structure, and a memory structure (Glover,
1986, 1989). The neighborhood is the subsection of the solution space to be
explored at each iteration of the algorithm, and typically consists of the kinds
of changes made to the current solution (e.g., a subsection of heaters to
be toggled on and off). After a neighborhood is defined, a candidate list
containing a defined number of prospective moves within the neighborhood is
generated. Each of these moves is then evaluated by calculating the objective
function value, and the best nontabu move is used to update the solution.
The best move in the candidate list may or may not be better than the
incumbent solution, so by always accepting the best move from the candidate
list, worse solutions are inevitably accepted, and local minima are escaped.
Inherent in the use of the candidate list is the definition of a tabu move. After
selecting the best move from the candidate list, an essential attribute of
this move is designated as tabu (e.g., a particular heater is not adjusted in
subsequent designs). Once this tabu attribute is defined, a tabu tenure must be
determined that dictates how many iterations the tabu attribute will remain
active.