Description

EAdet starts an epidemic at a center of the data.
The epidemic spreads outwards and infects neighbouring points (probabilistically or deterministically).
The points infected last are declared outliers. After running EAdet, an imputation can be carried out with EAimp.
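The spreading process can be illustrated with a small Python simulation. It is a sketch, not the package's R implementation. The start index is passed in directly, and the hypothetical callback prob(d) stands in for the transmission function. Distances are Euclidean. A point escapes infection in a step only if it independently escapes every infected point. All of these are assumptions. The loop stops when every point is infected or after maxl consecutive steps without a new infection. Points that are never infected keep an infinite infection time.

```python
import math
import random
from typing import Callable, List, Sequence


def simulate_epidemic(
    points: Sequence[Sequence[float]],
    start: int,
    prob: Callable[[float], float],
    maxl: int = 5,
    seed: int = 1,
) -> List[float]:
    """Return the infection time of each point (math.inf if never infected).

    prob(d) is a hypothetical per-step infection probability for two points
    at Euclidean distance d; it stands in for the transmission function.
    """
    rng = random.Random(seed)
    n = len(points)
    times = [math.inf] * n
    times[start] = 0.0
    infected = {start}
    step = 0
    idle = 0  # consecutive steps without any new infection
    while len(infected) < n and idle < maxl:
        step += 1
        newly = []
        for j in range(n):
            if j in infected:
                continue
            # Product of counterprobabilities: j must escape every infected point.
            escape = 1.0
            for i in infected:
                escape *= 1.0 - prob(math.dist(points[i], points[j]))
            if rng.random() < 1.0 - escape:
                newly.append(j)
        # Infections found in this step all take effect at the end of the step.
        for j in newly:
            times[j] = float(step)
            infected.add(j)
        idle = 0 if newly else idle + 1
    return times


pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
step_prob = lambda d: 1.0 if d <= 1.5 else 0.0
print(simulate_epidemic(pts, start=0, prob=step_prob))  # [0.0, 1.0, 1.0, 1.0, inf]
```

The isolated point (10, 10) is never reached, so it ends with the largest infection time and would be flagged as an outlier.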

Arguments

reach

If reach="max", the maximal nearest neighbour distance is used as the basis for the
transmission function;
otherwise the weighted (1-(p+1)/n) quantile of the nearest neighbour distances is used, where n is the number of observations and p the number of variables.
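As a Python sketch (not the package's R code), the reach could be computed from Euclidean nearest neighbour distances as follows. The function names are hypothetical. The string "quantile" merely stands for any value other than "max". The quantile here is unweighted, using R's default linear interpolation (type 7), whereas EAdet uses a weighted quantile.

```python
import math
from typing import List, Sequence


def nn_distances(points: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distance from each point to its nearest other point."""
    return [
        min(math.dist(a, b) for j, b in enumerate(points) if j != i)
        for i, a in enumerate(points)
    ]


def reach_distance(points: Sequence[Sequence[float]], reach: str = "max") -> float:
    nn = nn_distances(points)
    if reach == "max":
        return max(nn)
    n, p = len(points), len(points[0])
    if n <= p + 1:
        raise ValueError("need more than p + 1 observations")
    q = 1 - (p + 1) / n
    s = sorted(nn)
    # Unweighted quantile with linear interpolation (R's default type 7);
    # EAdet itself uses a weighted quantile.
    h = (n - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


pts = [(0, 0), (1, 0), (3, 0), (6, 0), (10, 0), (15, 0)]
print(reach_distance(pts))              # 5.0
print(reach_distance(pts, "quantile"))  # 2.5
```

With n=6 and p=2 the quantile level is 1-3/6=0.5, so the second call returns the median nearest neighbour distance. The "max" rule is driven entirely by the most isolated point.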

transmission.function

Form of the transmission function of the distance d:
"step" is a Heaviside function which jumps to 1 at d0,
"linear" is linear between 0 and d0,
"power" is (beta*d+1)^(-p), with p=ncol(data) by default,
"root" is the function 1-(1-d/d0)^(1/maxl).

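The formulas above can be evaluated directly. This Python sketch reproduces them as written, under a hypothetical function name. Because the value of beta is not given here, it is left as a free parameter with an assumed default of 1. "linear" is omitted because only its shape is stated. "root" is held at 1 beyond d0, which is also an assumption. Read literally, "step" and "root" rise from 0 to 1 while "power" decays from 1, and this sketch does not decide which of them describe infection probabilities and which describe counterprobabilities.

```python
def transmission(d: float, kind: str, d0: float, p: int,
                 beta: float = 1.0, maxl: int = 5) -> float:
    """Evaluate the transmission-function formulas exactly as written.

    p defaults to ncol(data) in EAdet; here it must be passed explicitly.
    beta = 1.0 is an assumed default, not taken from the package.
    """
    if kind == "step":
        # Heaviside function jumping to 1 at d0.
        return 1.0 if d >= d0 else 0.0
    if kind == "power":
        return (beta * d + 1.0) ** (-p)
    if kind == "root":
        # Held at 1 from d0 on, so the fractional power never sees a negative base.
        return 1.0 - (1.0 - d / d0) ** (1.0 / maxl) if d < d0 else 1.0
    raise ValueError(f"unsupported transmission function: {kind!r}")


print(transmission(1.0, "power", d0=1.0, p=2))  # 0.25
```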
deterministic

If TRUE, the number of infections is the expected number and
the infected observations are the ones with the largest infection probabilities.
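One deterministic step can be sketched in Python under two assumptions. First, a point's infection probability is taken as 1 minus the product of its counterprobabilities to the infected points. Second, the expected number of new infections is rounded with Python's round(). Both rules and the function names are illustrative and are not taken from the package.

```python
import math
from typing import Dict, List, Sequence


def infection_probability(counterprobs: Sequence[float]) -> float:
    """1 minus the product of the counterprobabilities to all infected points."""
    return 1.0 - math.prod(counterprobs)


def deterministic_step(probs: Dict[int, float]) -> List[int]:
    """Infect the expected number of points, choosing the most probable ones."""
    k = round(sum(probs.values()))  # assumed rounding rule (ties go to even)
    ranked = sorted(probs, key=probs.get, reverse=True)
    return sorted(ranked[:k])


probs = {0: 0.9, 1: 0.2, 2: 0.7, 3: 0.4}
print(deterministic_step(probs))          # [0, 2]
print(infection_probability([0.5, 0.5]))  # 0.75
```

The probabilities sum to 2.2, so two points are infected: the two with the largest probabilities, 0 and 2.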

rm.missobs

Set rm.missobs=TRUE if completely missing observations should be discarded. This must be requested explicitly, as a safeguard against mismatches when imputing.

verbose

More output with verbose=TRUE.

Details

The form and parameters of the transmission function should be chosen such that the infection times have
a range of at least 10. The default cutpoint for deciding on outliers is the median infection time plus three times
the MAD of the infection times. A better cutpoint may be chosen by visual inspection of the cdf of infection times.
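The default cutpoint rule is easy to reproduce. This Python sketch rests on three assumptions. The MAD is scaled by 1.4826 as in R's mad(). Never-infected points (infinite time) are left out of the median and MAD but are still counted as outliers. Outliers are the points strictly above the cutpoint. The function names are hypothetical.

```python
import statistics
from typing import List, Sequence

MAD_CONSTANT = 1.4826  # assumed: the default scaling of R's mad()


def default_cutpoint(times: Sequence[float]) -> float:
    """Median infection time plus three times the MAD of the infection times."""
    # Assumption: never-infected points (infinite time) are excluded here.
    finite = [t for t in times if t != float("inf")]
    med = statistics.median(finite)
    mad = MAD_CONSTANT * statistics.median([abs(t - med) for t in finite])
    return med + 3 * mad


def outliers(times: Sequence[float]) -> List[int]:
    """Indices whose infection time exceeds the cutpoint (never-infected included)."""
    cut = default_cutpoint(times)
    return [i for i, t in enumerate(times) if t > cut]


times = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]
print(max(times) - min(times) >= 10)  # True: range of at least 10
print(outliers(times))                # [9]
```

Here the median is 3 and the unscaled MAD is 1, giving a cutpoint of about 7.45, so only the observation infected at time 12 is flagged.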

EAdet calls the function EA.dist, which passes the counterprobabilities of infection (a vector of length n*(n-1)/2, which can be very large) and three parameters (the index of the sample spatial median, the maximal nearest neighbour distance and the transmission distance, i.e. the reach) as arguments to EA.det. The distances vector may be too large to be passed as an argument; in that case the memory size must be increased. Former versions of the code used a global variable to store the distances in order to save memory.
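A short Python sketch can illustrate the size of the counterprobability vector and how a pair of observations maps into it. It assumes the column-wise lower-triangle packing used by R's dist objects, with 8-byte doubles. Whether EA.dist uses exactly this order is an assumption, and the helper names are hypothetical.

```python
from itertools import combinations


def packed_index(i: int, j: int, n: int) -> int:
    """0-based position of pair (i, j), i != j, in a packed n*(n-1)/2 vector.

    Uses the column-wise lower-triangle order of R's dist objects.
    """
    if i == j:
        raise ValueError("the diagonal is not stored")
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)


def packed_bytes(n: int, bytes_per_value: int = 8) -> int:
    """Memory needed for n*(n-1)/2 double-precision values."""
    return n * (n - 1) // 2 * bytes_per_value


print([packed_index(i, j, 4) for i, j in combinations(range(4), 2)])  # [0, 1, 2, 3, 4, 5]
print(packed_bytes(10_000))  # 399960000, roughly 400 MB
```

The quadratic growth is why memory becomes the bottleneck. Ten thousand observations already need about 400 MB for a single copy of the vector.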

Value

EAdet returns a list whose first component, output, is a sub-list with the following components:

sample.size

Number of observations

discarded.observations

Indices of discarded observations

missing.observations

Indices of completely missing observations

number.of.variables

Number of variables

n.complete.records

Number of records without missing values

n.usable.records

Number of records with fewer than half of their values missing (unusable observations are discarded)

medians

Componentwise medians

mads

Componentwise MADs

prob.quantile

Quantile to use if the MADs fail, i.e. if one of the MADs is 0.

quantile.deviations

Quantile of absolute deviations.

start

Starting observation

transmission.function

Input parameter

power

Input parameter

maxl

Maximum number of steps without infection

min.nn.dist

Maximal nearest neighbour distance

transmission.distance

Transmission distance d0

threshold

Input parameter

distance.type

Input parameter

deterministic

Input parameter

number.infected

Number of infected observations

cutpoint

Cutpoint of infection times for outlier definition

number.outliers

Number of outliers

outliers

Indices of outliers

duration

Duration of the epidemic

computation.time

Elapsed computation time

initialisation.computation.time

Elapsed computation time for standardisation and calculation of the distance matrix
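The record counts above follow directly from their definitions. This Python sketch uses None to mark a missing value. It returns 0-based indices, whereas R indices start at 1. The function name is hypothetical, and the sketch leaves out the other output components.

```python
from typing import Dict, List, Optional, Sequence, Union

Row = Sequence[Optional[float]]  # None marks a missing value (assumption)


def record_summary(data: Sequence[Row]) -> Dict[str, Union[int, List[int]]]:
    """Completely missing, complete and usable records as defined above."""
    p = len(data[0])
    n_missing = [sum(v is None for v in row) for row in data]
    return {
        "missing.observations": [i for i, m in enumerate(n_missing) if m == p],
        "n.complete.records": sum(m == 0 for m in n_missing),
        "n.usable.records": sum(m < p / 2 for m in n_missing),
    }


data = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, None, 3.0, 4.0],
    [None, None, 3.0, 4.0],
    [None, None, None, None],
]
print(record_summary(data))
# {'missing.observations': [3], 'n.complete.records': 1, 'n.usable.records': 2}
```

The third record has exactly half of its values missing, so it is neither complete nor usable.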