Statistical Fingertrees

I'm a bad person. I often use this method to compute the mean and variance of some samples. It's not robust. It performs badly if the variance is small compared to the size of the samples. I shouldn't use it. As penance, this is a quick article on dynamically computing variances robustly and efficiently for datasets that are frequently manipulated.

For convenience I'll talk about the unscaled variance: the sum, not the average of the square deviation from the mean. You can easily compute the variance from this if you know the dataset size.

If we know the size, mean and unscaled variance of two sets (more properly, multisets) we can find the mean and unscaled variance of the their union using the formula here. This method is much more robust that the naive algorithm.

We now have a data structure that allows us to freely split, join, delete elements from and add elements to sequences of samples, all the while robustly keeping track of their mean and unscaled variance (and hence their variance).