Abstract

The contribution to the sample mean plot, originally proposed by Sinclair, is revived and further developed as a practical tool for global sensitivity analysis. The potential of this simple and versatile graphical tool is discussed. Beyond the qualitative assessment provided by this approach, a statistical test is proposed for sensitivity analysis. A case study that simulates the transport of radionuclides through the geosphere from an underground disposal vault containing nuclear waste is considered as a benchmark. The new approach is tested against a very efficient sensitivity analysis method based on state-dependent parameter meta-modelling.

Introduction

The explicit acknowledgement of uncertainties when trying to understand, predict and control the behaviour of natural and industrial systems is now gaining acceptance and becoming affordable in practice thanks to the tremendous advances in computing capabilities. In the standard probabilistic framework, the uncertain model inputs X = (X1, X2, …, Xk) and the resulting model outputs Y = (Y1, Y2, …, Yr) are treated as random variables characterised by probability distribution functions [1]. Random or quasi-random sampling strategies are adopted in order to select the model inputs and multiple model evaluations (i.e. Monte Carlo simulation) are used for the propagation of this uncertainty. Subsequently, a detailed analysis of the mapping can be carried out using the input samples and related model realisations.
Sensitivity analysis (SA) is the study of how uncertainty in the output of the model can be apportioned to different sources of uncertainty in the model inputs [2]. Ideally, uncertainty and sensitivity analysis should be run in tandem (iterative strategy). Graphical methods are important tools to support, guide and interpret the results provided by sensitivity and uncertainty analysis. While bar charts, tornado graphs or radar charts can be particularly useful to communicate importance measures, box-and-whisker plots are more suitable for the representation of uncertainty analysis results. Valuable information can also be presented in condensed form by the so-called cobweb plots [3], which are able to represent multi-dimensional distributions graphically in a two-dimensional plot. Flexible conditioning capabilities give extensive insight into particular regions of the mapping, and a careful analysis of cobweb plots facilitates the characterisation of dependence and conditional dependence between inputs and outputs. However, for the visualisation of the input–output mapping, the simplest and most widely used plots are the so-called scatterplots. For a given model input Xi and a single-valued model output Y, a scatterplot corresponds to a projection onto the (Xi, Y) plane of the sample points defining the (X, Y) hyper-surface. Among the possible extensions, model inputs can be plotted against each other with an intensity ramp corresponding to the values of the model response (matrix of scatterplots), and different colours corresponding to different subsets can be used on a single graph (overlaid scatterplots).
Although visual inspection of a classical scatterplot is an empirical and somewhat subjective appraisal of pattern randomness, scatterplots provide rich information on the mapping, which the other global SA techniques tend to condense into a few sensitivity indices. It is possible to visualise the values taken by the model response Y across the range of Xi. When a pattern can be observed in the scatterplot, the stronger the pattern, the more important the influence of the corresponding input on the model output. Some techniques referred to as grid-based methods can be used to assess the randomness of the distribution of points across the range divided into bins. Various statistical tests have been developed in order to assess common means (CMNs), common distributions or locations (CLs) [4], common medians (CMDs) or statistical independence (SI) (see [5], [6] and [7] for recent reviews and comparisons). However, as emphasised by [7], violations of the assumptions of these statistical tests may lead to misrankings of input importance. In addition, there is no universal rule for the determination of an appropriate division of the range (i.e. the definition of the grid).
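As a rough illustration of a grid-based check, the sketch below splits the range of one input into equal-count bins and computes the one-way ANOVA F statistic, a standard statistic for comparing means across groups. The function name, the number of bins and the toy model are assumptions made here for illustration; they are not taken from the cited reviews, and the choice of grid remains arbitrary, as noted above.

```python
import random


def common_means_f_statistic(x, y, n_bins=5):
    """One-way ANOVA F statistic of y across equal-count bins of x.

    Hypothetical helper: a large value suggests that the mean of y
    changes across the range of x.
    """
    n = len(x)
    order = sorted(range(n), key=lambda j: x[j])
    # Split the x-sorted sample into n_bins groups of (almost) equal size.
    groups = [
        [y[j] for j in order[b * n // n_bins:(b + 1) * n // n_bins]]
        for b in range(n_bins)
    ]
    grand_mean = sum(y) / n
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (between / (n_bins - 1)) / (within / (n - n_bins))


# Toy model (an assumption, not the paper's case study):
# x1 dominates the response, x2 has a very weak effect.
rng = random.Random(1)
n = 1000
x1 = [rng.random() for _ in range(n)]
x2 = [rng.random() for _ in range(n)]
y = [1.0 + 4.0 * a + 0.1 * b for a, b in zip(x1, x2)]

f1 = common_means_f_statistic(x1, y)
f2 = common_means_f_statistic(x2, y)
print(f"F statistic for x1: {f1:.1f}, for x2: {f2:.2f}")
```

Changing `n_bins` changes the statistic, which is the practical difficulty raised in the text.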
Within the framework of the Probabilistic System Assessment Group, a research group established by the Organisation for Economic Co-operation and Development (OECD) Nuclear Energy Agency (NEA), Sinclair [8] investigated changes in the mean and in the variance of various output quantities resulting from finite changes in the inputs' uncertainties (e.g., shifting or shrinking their distributions). An approach was proposed in order to estimate the derivative of the expectation of the analysed model response with respect to the parametrised change of shape. In order to circumvent the difficulties related to discontinuities in the model inputs' probability distribution functions, the author suggested fitting a smooth curve to the marginal dependence of the mean of the output on the selected inputs. Although it is not necessary to portray this relation graphically for the adopted approach, the contribution to the sample mean (CSM) plot was recognised as a general tool for SA.
Even before Sinclair, the same type of curve was used by social economists as a measure of inequality [9]. In that field, such curves are known as Lorenz curves, are associated with the concept of the concentration curve, and are frequently used to compare the situation in different countries or to assess the evolution of the concentration of wealth over time in a given country. In this paper, the CSM plot is revived in the context of SA of computer models. Rather than aggregated data from official statistics, random samples characterising the input–output mapping of mathematical models are analysed. In Section 3, the scope and potential of this generalised approach are discussed; the outcomes are illustrated using the application example presented in Section 2. In Section 4, a permutation-based statistical test is proposed in order to determine whether the behaviour characterised by the CSM plot significantly departs from randomness. Results from numerical experiments are reported and discussed in Section 5; finally, conclusions are drawn in Section 6.
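To make the Lorenz-type construction concrete, the following minimal sketch builds a CSM curve for one input: the sample is sorted by that input, and the cumulative share of the output sum is recorded against the fraction of the sample covered. This is one plausible reading of the construction, written under the assumption that the output sum is non-zero; the function names and the toy model are hypothetical and are not the paper's case study.

```python
import random


def csm_curve(x, y):
    """Return (fractions, contributions) for the CSM curve of one input.

    After sorting by x, point q gives the share of the sample mean of y
    contributed by the q smallest values of x. Assumes sum(y) != 0.
    """
    n = len(x)
    order = sorted(range(n), key=lambda j: x[j])
    total = sum(y)
    fractions, contributions = [0.0], [0.0]
    running = 0.0
    for rank, j in enumerate(order, start=1):
        running += y[j]
        fractions.append(rank / n)
        contributions.append(running / total)
    return fractions, contributions


def max_distance_to_diagonal(x, y):
    """Largest vertical gap between the CSM curve and the diagonal."""
    fractions, contributions = csm_curve(x, y)
    return max(abs(c - f) for f, c in zip(fractions, contributions))


# Toy model (an assumption for illustration): x1 dominates, x2 is weak.
rng = random.Random(0)
n = 2000
x1 = [rng.random() for _ in range(n)]
x2 = [rng.random() for _ in range(n)]
y = [1.0 + 4.0 * a + 0.1 * b for a, b in zip(x1, x2)]

fractions, contributions = csm_curve(x1, y)
d1 = max_distance_to_diagonal(x1, y)
d2 = max_distance_to_diagonal(x2, y)
print(f"max distance to diagonal: x1 = {d1:.3f}, x2 = {d2:.3f}")
```

An input with no influence should give a curve close to the diagonal, whereas the curve for an influential input bends away from it, as a Lorenz curve does for an unequal distribution of wealth.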

Conclusions

The contribution to the sample mean plot has shown interesting potential for the analysis of the relation between the uncertain model inputs and the resulting model response. The visualisation enables the analysis of the evolution of the contribution to the mean across the range, simultaneously for all model inputs. Therefore, a single plot provides a valuable analysis of the input–output mapping. This graphical tool could provide guidelines to improve the sample design or even form the building block of a variance reduction strategy. Considerable changes in the CSM curve indicate the presence of features in the input–output mapping that can be better explored by intensifying the sampling in that region.
For the prioritisation of model inputs, global importance measures can be derived from the CSM plot and provide the same ranking as first-order variance-based sensitivity indices. Although the CSM plot does not provide variance-based sensitivity indices, the significance of the ranking is assessed using a permutation test which does not require any additional model runs. In practice, only a small fraction of the total number of possible permutations can be performed. As long as this number yields a reliable description of the cumulative probability distribution for the maximum distances to the diagonal, the number of permutations does not have a significant influence on the outcomes. Apart from the numerical problems to be solved (see discussion in Section 5.2) for some non-monotonic mappings (leading to diagonal crossings in the CSM plot), the main limitation of the approach lies in the fact that inputs are ranked with respect to the first-order effects but no information is available concerning the remaining part of the variance. By contrast, in variance-based techniques, the analyst can also assess the importance of interaction effects by summing the first-order effects. For the characterisation of second-order interactions, an extension of the methodology could be developed using the equivalent of the diagonal for a three-dimensional surface (i.e. a plane).
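A permutation test of this kind can be sketched as follows, reusing the existing sample with no new model runs. The sketch is a generic permutation test on the maximum distance to the diagonal, not a reproduction of the exact procedure of Section 4; the function names, the number of permutations and the toy model are assumptions made for illustration.

```python
import random


def max_distance(y_ordered):
    """Maximum gap between the cumulative share of sum(y) and the diagonal.

    y_ordered holds the outputs already arranged by increasing input value.
    Assumes sum(y_ordered) != 0.
    """
    n = len(y_ordered)
    total = sum(y_ordered)
    running, worst = 0.0, 0.0
    for rank, value in enumerate(y_ordered, start=1):
        running += value
        worst = max(worst, abs(running / total - rank / n))
    return worst


def csm_permutation_pvalue(x, y, n_perm=200, seed=0):
    """Estimate how often a random pairing of x and y matches or exceeds
    the observed maximum distance to the diagonal."""
    rng = random.Random(seed)
    order = sorted(range(len(x)), key=lambda j: x[j])
    observed = max_distance([y[j] for j in order])
    shuffled = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between input and output
        if max_distance(shuffled) >= observed:
            exceed += 1
    # The +1 terms count the observed arrangement as one permutation.
    return (exceed + 1) / (n_perm + 1)


# Toy model (an assumption): x1 drives the output, x_dummy plays no role.
rng = random.Random(42)
n = 500
x1 = [rng.random() for _ in range(n)]
x_dummy = [rng.random() for _ in range(n)]
y = [1.0 + 4.0 * a for a in x1]

p_x1 = csm_permutation_pvalue(x1, y)
p_dummy = csm_permutation_pvalue(x_dummy, y)
print(f"p-value for x1: {p_x1:.3f}, for the dummy input: {p_dummy:.3f}")
```

Only a few hundred of the n! possible permutations are drawn here, in line with the remark that a small fraction suffices once the distribution of maximum distances is adequately described.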
In summary, the graphical tool can be used for numerous purposes, including the assessment of the direction of change when modifying the inputs' probability distribution functions. Within a more classical SA framework, since no particular sampling design is required, the CSM plot and the proposed statistical test can be used in combination with other SA methods for input prioritisation. The approach can be reliable and efficient at low sample sizes if the input importance follows a Pareto law (few dominant inputs), but it should not be used for fixing non-influential model inputs. Since the construction procedure is straightforward, exploiting the information that could be derived from the contribution to the sample variance plot might also lead to interesting outcomes.