Abstract

The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (i) locally weighted regression (LOESS), (ii) additive models, (iii) projection pursuit regression, and (iv) recursive partitioning regression. Then, in the second and concluding part of this presentation, the indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present.

Introduction

The importance of uncertainty analysis and sensitivity analysis as components of analyses for complex systems is almost universally recognized, where uncertainty analysis designates the determination of the uncertainty in analysis results that derives from the uncertainty in analysis inputs and sensitivity analysis designates the determination of the contributions of individual uncertain analysis inputs to the uncertainty in analysis results [1], [2], [3], [4], [5], [6], [7], [8], [9], [10] and [11]. A number of approaches to uncertainty and sensitivity analysis have been developed, including differential analysis [12], [13], [14], [15], [16] and [17], response surface methodology [18], [19], [20], [21], [22], [23], [24], [25] and [26], Monte Carlo analysis [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37] and [38], and variance decomposition procedures [39], [40], [41], [42] and [43]. Overviews of these approaches are available in several reviews [44], [45], [46], [47], [48], [49], [50], [51] and [52].
The focus of this presentation is on Monte Carlo (i.e., sampling-based) approaches to uncertainty and sensitivity analysis. Such analyses involve the consideration of models of the form
y = f(x),    (1.1)
where
y = [y1, y2, …, ynY]    (1.2)
is a vector of analysis results and
x = [x1, x2, …, xnX]    (1.3)
is a vector of imprecisely known analysis inputs. In general, the model f can be quite large and involved (e.g., a system of nonlinear partial differential equations requiring numerical solution (see Ref. [53]) or possibly a sequence of complex, linked models as is the case in a probabilistic risk assessment for a nuclear power plant (see Refs. [54] and [55]) or a performance assessment for a radioactive waste disposal facility (see Refs. [56] and [57])); the vector y of analysis results can be of high dimension and complex structure (e.g., the elements of y might be several hundred temporally or spatially dependent functions); and the vector x of analysis inputs can also be of high dimension and complex structure (e.g., several hundred variables, with some variables corresponding to physical properties of the system under study and other variables corresponding to parameters in probability distributions or perhaps to designators for alternative models).
The uncertainty in the elements of x is characterized by a sequence of probability distributions
D1, D2, …, DnX,    (1.4)
where Dj is a probability distribution characterizing the uncertainty in xj. Correlations and other restrictions involving the relations between the xj are also possible. Such distributions and any associated restrictions are intended to numerically capture the existing knowledge about the elements of x and are often developed through an expert review process [58], [59], [60], [61], [62], [63], [64], [65], [66], [67], [68], [69], [70], [71], [72] and [73].
The uncertainty characterized by the distributions D1, D2, …, DnX in Eq. (1.4) is often referred to as epistemic uncertainty. Alternate designations for epistemic uncertainty include state of knowledge, subjective, reducible, and type B [74], [75], [76], [77], [78], [79], [80], [81] and [82]. In particular, epistemic uncertainty derives from a lack of knowledge about the appropriate value to use for a quantity that is assumed to have a fixed value in the context of a particular analysis. In the conceptual and computational organization of an analysis, epistemic uncertainty is generally considered to be distinct from aleatory uncertainty, which arises from an inherent randomness in the behavior of the system under study [74], [75], [76], [77], [78], [79], [80], [81], [82] and [83].
Sampling-based uncertainty and sensitivity analyses are based on a sample
xi = [xi1, xi2, …, xi,nX],    i = 1, 2, …, nS,    (1.5)
from the possible values for x generated in consistency with the distributions in Eq. (1.4) and any associated restrictions. Random sampling is one possibility for the generation of this sample. However, owing to its efficient stratification properties, Latin hypercube sampling is widely used in analyses of this type, especially when computationally intensive models are involved [27], [37] and [38].
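To indicate concretely how the stratification underlying Latin hypercube sampling works, the following is a minimal sketch of the procedure on the unit hypercube; the function name latin_hypercube and all parameter choices are illustrative and not taken from any particular analysis code. Values on [0, 1) would then be mapped through the inverse CDFs of the distributions Dj to obtain a sample of the form in Eq. (1.5).

```python
import numpy as np

def latin_hypercube(n_samples, n_vars, seed=None):
    """Latin hypercube sample of size n_samples on [0, 1)^n_vars.

    Each variable's range is divided into n_samples equal-probability
    strata; exactly one value is drawn from each stratum, and the strata
    are paired across variables by independent random permutations.
    """
    rng = np.random.default_rng(seed)
    # One value per stratum: stratum index plus a uniform offset in [0, 1).
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_vars))) / n_samples
    # Pair strata across variables at random, one permutation per variable.
    for j in range(n_vars):
        u[:, j] = rng.permutation(u[:, j])
    return u
```

Because each variable's range is partitioned into nS equal-probability strata with exactly one observation per stratum, the full range of each marginal distribution is covered even for small sample sizes, which is the source of the sampling efficiency noted above.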
The analysis evaluations
yi = y(xi) = f(xi),    i = 1, 2, …, nS,    (1.6)
provide a mapping between analysis inputs (i.e., xi) and analysis results (i.e., yi) that forms the basis for both uncertainty analysis and sensitivity analysis. Once the preceding mapping is available, the determination of uncertainty analysis results is generally straightforward and involves the generation of summary results such as histograms, density functions, cumulative distribution functions (CDFs), complementary cumulative distribution functions (CCDFs), and box plots for individual elements of y [29, Section 6.5]. The determination of sensitivity analysis results involves the exploration of the preceding mapping with techniques such as examination of scatterplots, regression analysis, correlation and partial correlation analysis, and searches for nonrandom patterns [29, Section 6.6].
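For a scalar element of y, the CDF and CCDF summaries mentioned above can be assembled directly from the sampled results. The following is a minimal sketch (the function name empirical_cdf is illustrative):

```python
import numpy as np

def empirical_cdf(y):
    """Step points (y_(i), i/nS) of the empirical CDF of a scalar result.

    The corresponding CCDF, prob(Y > y), is 1 minus the returned
    probabilities at the same points.
    """
    y_sorted = np.sort(np.asarray(y, dtype=float))
    n = y_sorted.size
    prob = np.arange(1, n + 1) / n  # estimated prob(Y <= y_(i))
    return y_sorted, prob

ys, p = empirical_cdf([3.0, 1.0, 2.0, 4.0])
# ys = [1.0, 2.0, 3.0, 4.0], p = [0.25, 0.5, 0.75, 1.0]; CCDF = 1 - p
```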
The determination of sensitivity analysis results is generally more demanding than the determination of uncertainty analysis results. In particular, the popular regression and correlation-based techniques can fail to appropriately identify the effects of the individual elements of x on the elements of y when nonlinear and nonmonotonic relations are present [29, Section 6.6]. Possible approaches to sensitivity analysis to use in such situations include grid-based statistical analyses of scatterplots [30] and [84], distance-based statistical analyses of scatterplots [85], [86], [87], [88], [89], [90], [91], [92], [93], [94], [95], [96], [97] and [98], multidimensional Kolmogorov–Smirnov tests [99], [100], [101] and [102], rank-concordance tests [103] and [104], and classification trees [105] and [106]. However, the preceding approaches lack the intuitive appeal of regression-based approaches to sensitivity analysis. In particular, regression-based sensitivity analysis can be carried out in a sequential manner with variable importance being indicated by the order in which variables enter the regression model and by the fraction of total variance that can be accounted for as successive variables enter the regression model.
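The sequential procedure just described can be sketched as a forward stepwise selection in which, at each step, the variable producing the largest increase in R2 enters the regression model. The following is a schematic numpy implementation under the assumption of an ordinary linear fit; the function name stepwise_r2 is illustrative and this is not the implementation used in any particular analysis package.

```python
import numpy as np

def stepwise_r2(X, y):
    """Forward stepwise linear regression for sensitivity analysis.

    Returns (order, r2_path): the order in which the columns of X enter
    the regression model and the cumulative R^2 after each entry.  At
    each step, the variable producing the largest increase in R^2 is
    added, so order and r2_path together indicate variable importance.
    """
    n, p = X.shape
    remaining = list(range(p))
    order, r2_path = [], []
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    while remaining:
        best_j, best_r2 = None, -np.inf
        for j in remaining:
            # Candidate model: variables entered so far plus variable j.
            A = np.column_stack([np.ones(n), X[:, order + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            r2 = 1.0 - float(np.sum((y - A @ coef) ** 2)) / ss_tot
            if r2 > best_r2:
                best_j, best_r2 = j, r2
        order.append(best_j)
        remaining.remove(best_j)
        r2_path.append(best_r2)
    return order, r2_path
```

In practice a stopping criterion (e.g., a significance level for variable entry) terminates the selection before all variables enter the model.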
The purpose of this presentation is to describe regression-based techniques for sensitivity analysis that are based on multiple predictor smoothing methods. Such methods are conceptually consistent with regression-based methods that have been widely used in the past in sensitivity analysis [29, Section 6.6], but have the important advantage that they are capable of incorporating local changes in the relationship between a dependent variable (i.e., an element of y) and multiple independent variables (i.e., elements of x). As a result, these methods can be successfully applied in situations involving nonlinear relationships between analysis inputs and analysis results where more traditional regression-based approaches would fail to appropriately capture these relationships.
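To suggest how local smoothing captures such nonlinear relationships, the following is a minimal univariate local linear smoother with tricube weights in the spirit of LOESS; it is a schematic sketch only, with illustrative names, and the stepwise multivariate procedures themselves are developed in Sections 3 and 4.

```python
import numpy as np

def loess_point(x0, x, y, frac=0.5):
    """Locally weighted linear fit at x0, in the spirit of LOESS.

    Fits a weighted least-squares line to the nearest frac of the data,
    with tricube weights that decay to zero at the edge of the local
    neighborhood; the fitted intercept is the smoothed value at x0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = max(3, int(np.ceil(frac * x.size)))   # neighborhood size
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]                   # k nearest neighbors of x0
    h = d[idx].max() or 1.0                   # local bandwidth
    w = (1.0 - (d[idx] / h) ** 3) ** 3        # tricube weights
    A = np.column_stack([np.ones(k), x[idx] - x0])
    W = np.diag(w)
    beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y[idx])
    return beta[0]                            # intercept = smoothed value
```

Because the fit is recomputed locally at each evaluation point, the smoothed curve follows local changes in the input/output relationship that a single global linear or quadratic fit would average away.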
This presentation is divided into two parts. In this, the first part, traditional approaches to regression-based sensitivity analysis are briefly described (Section 2), nonparametric approaches to regression analysis based on local data smoothing are introduced (Section 3), algorithms for the stepwise implementation of the procedures described in Section 3 as part of a sensitivity analysis are described (Section 4), and a brief summary discussion is given (Section 5). Then, in the second part of the presentation [107], the described procedures are illustrated in sensitivity analyses involving both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant) [56] and [57].
Although analyses for real systems almost always involve multiple output variables as indicated in conjunction with Eqs. (1.1)–(1.3), the following discussions assume that a single real-valued result of the form
y = f(x)    (1.7)
is under consideration. Similarly,
yi = f(xi),    i = 1, 2, …, nS,    (1.8)
is used to represent the result of evaluating y with the sample in Eq. (1.5). This simplifies the notation and results in no loss in generality as the results under discussion are valid for individual elements of y. All statistical analyses in this presentation are carried out within the R statistical computing environment [108], which is an open source equivalent to the S-Plus statistical package [109].