The jitter {base} R function is useful for smoothed bootstrapping, but manual tuning of the jitter amount is often necessary. Should we change the jitter factor, switch to the alternative jitter amount estimator, or specify the jitter amount directly? I thought of using order statistics, specifically the interquartile range, to make the range-based estimator of the jitter amount robust. This robust automatic jitter is demonstrated for two smoothed bootstrap examples.

Robust automatic jitter

From the R documentation, if amount = NULL (the default), then amount = factor * d/5, where d is about the smallest difference between adjacent unique x values. This jitter amount is typically smaller than the one given by the alternative estimator examined in this post.

If amount = 0, then amount = factor * (max(x) - min(x)) / 50. When the range (max(x) - min(x)) is large, the jitter amount is large. Because the sample range is strongly affected by sample size and outliers, it is not a robust estimator of scale, so I thought to use the interquartile range IQR(x) instead.
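As a rough illustration of the two built-in rules, the sketch below computes both jitter amounts by hand for a small made-up sample whose last value is an outlier. The data are hypothetical, and the calculation of d is simplified: jitter() itself also rounds x before taking differences, which makes no difference for these values.

```r
# Hypothetical sample; the last value is an outlier.
x <- c(1.0, 1.5, 2.5, 4.0, 50.0)
factor <- 1

# amount = NULL rule: factor * d / 5, with d the smallest gap between
# adjacent unique values (simplified; jitter() also rounds x first).
d <- min(diff(sort(unique(x))))
amount_default <- factor * d / 5

# amount = 0 rule: factor * range / 50.
amount_range <- factor * (max(x) - min(x)) / 50

# The same range-based rule with the outlier removed, for comparison.
x_clean <- x[-length(x)]
amount_range_clean <- factor * (max(x_clean) - min(x_clean)) / 50

print(c(default = amount_default,
        range = amount_range,
        range_no_outlier = amount_range_clean))
```

For this sample the single outlier raises the range-based amount from 0.06 to 0.98, while the default amount stays at 0.1.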

To investigate the relationship between the range and IQR(x), I assumed a normal distribution for x and simulated some data; the results are summarized below. The sample range increases with the sample size, and for smaller sample sizes the range is about three times IQR(x). Accordingly, with factor = 1, the proposed robust jitter amount = 1 * 3 * IQR(x) / 50 ≈ IQR(x)/17.

| n | IQR | Range | Range/IQR |
|---|---|---|---|
| 10 | 1.2 | 2.9 | 2.8 |
| 20 | 1.2 | 3.7 | 3.1 |
| 30 | 1.2 | 4.1 | 3.4 |
| 40 | 1.3 | 4.3 | 3.4 |
| 50 | 1.3 | 4.6 | 3.5 |
| 100 | 1.3 | 4.9 | 3.8 |

Interquartile range (IQR) and range for the normal distribution. For smaller sample sizes, the range is about three times the IQR.
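A simulation along these lines can be sketched as follows. The seed, the number of replicates, the use of standard normal draws, and averaging over replicates are all assumptions made for illustration, so the output will not match the table exactly.

```r
set.seed(42)                        # arbitrary seed, for reproducibility only
sizes <- c(10, 20, 30, 40, 50, 100)
n_rep <- 2000                       # assumed number of replicates per sample size

# For each sample size, average the IQR and the range over many normal samples.
sim <- t(sapply(sizes, function(n) {
  draws <- replicate(n_rep, {
    x <- rnorm(n)
    c(iqr = IQR(x), range = max(x) - min(x))
  })
  rowMeans(draws)
}))

result <- data.frame(n = sizes, IQR = sim[, "iqr"], Range = sim[, "range"])
result$Ratio <- result$Range / result$IQR
print(round(result, 1))
```

The ratio column should grow slowly with n, which is the reason for fixing the multiplier at about 3 for small samples.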

Two examples

In the two examples below, the objective was to estimate the shift in location for some paired data that were strongly skewed and included outliers. I used the median paired difference as a robust estimator for the shift in location and bootstrapping to estimate the confidence interval for the median paired difference.

For small samples, the bootstrap distribution for the median often appears with “holes” in it because the median of a bootstrap resample is usually one of the few observations in the middle of the original sample. Adding a small amount of jitter to each resample helps to fill the “holes”. Adding too much jitter results in wide bootstrap distributions. The examples below show that the proposed automatic jitter is robust against outliers and offers a reasonable compromise between smoothing and detail.
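A minimal sketch of the idea, assuming made-up paired differences with one outlier: the robust amount 3 * IQR / 50 is computed once from the original sample (an assumption; it could also be recomputed for each resample) and passed to jitter() through its amount argument. The data, seed, number of resamples, and the percentile interval are illustrative choices, not the examples analysed in this post.

```r
set.seed(1)

# Hypothetical paired differences, skewed with one outlier.
d <- c(-0.4, 0.1, 0.3, 0.5, 0.6, 0.8, 1.1, 1.4, 2.0, 9.5)

# Proposed robust jitter amount with factor = 1 (about IQR / 17).
# Note: if IQR(d) were 0, jitter() would treat amount = 0 as a request
# for its range-based rule, so that case would need separate handling.
amount <- 1 * 3 * IQR(d) / 50

B <- 2000  # assumed number of bootstrap resamples

# Ordinary bootstrap of the median: only a few distinct values occur.
boot_plain <- replicate(B, median(sample(d, replace = TRUE)))

# Smoothed bootstrap: jitter each resample before taking the median.
boot_smooth <- replicate(B, {
  resample <- sample(d, replace = TRUE)
  median(jitter(resample, amount = amount))
})

# Percentile confidence interval from the smoothed bootstrap distribution.
ci <- quantile(boot_smooth, c(0.025, 0.975))

print(c(distinct_plain = length(unique(boot_plain)),
        distinct_smooth = length(unique(boot_smooth))))
print(ci)
```

Comparing histograms of boot_plain and boot_smooth, for instance with hist(), is one way to see the "holes" being filled.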