Propagation of interval and probabilistic uncertainty in cyberinfrastructure-related data processing and data fusion

Abstract

Data uncertainty affects the results of data processing, so it is necessary to determine how the uncertainty of the data propagates into the uncertainty of the results. This problem is especially important when cyberinfrastructure enables us to process large amounts of heterogeneous data. In an ideal world, we would have an accurate description of data uncertainty and well-justified, efficient algorithms for propagating this uncertainty. In practice, we are often not yet in this ideal situation: the description of uncertainty is often only approximate, and the algorithms for uncertainty propagation are often neither well justified nor computationally efficient. It is therefore desirable to address all of these deficiencies. In this thesis:

• in Chapter 2, we explain in what sense the existing approach to uncertainty, as a combination of random and systematic (interval) components, is only an approximation; we describe and justify a more adequate three-component model (with an additional periodic error component), and we extend the existing uncertainty propagation techniques to this model;
• in Chapter 3, we provide a justification for a practically efficient heuristic technique, namely a technique based on fuzzy decision-making; and
• in Chapter 4, we explain how the computational complexity of processing uncertainty can be reduced.

All these methods are based on the idealized assumption that we have a good description of the uncertainty of the original data. In practice, we often do not have this information and need to extract it from the data. In Chapter 5, which describes future work, we present ideas on how this uncertainty information can be extracted from the data.