The purpose of this XS module is to calculate the median (or, in principle, other statistics) of a sample together with confidence intervals. To do that, it uses a technique called bootstrapping. In a nutshell, it resamples the sample many times (drawing with replacement) and calculates the median of each resample. From the resulting distribution of medians, it then calculates the confidence limits.
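The procedure can be sketched in pure Perl. This is only an illustration of the idea, not the module's implementation: the helper names (median_by_sort, resample, bootstrap_medians) are hypothetical, and reading the limits off as tail percentiles of the medians is an assumption made to keep the sketch short.

```perl
use strict;
use warnings;

# Hypothetical helper: median via a full sort, averaging the two middle
# values for an even number of elements. Simple, but O(n*log(n)).
sub median_by_sort {
    my ($data) = @_;
    my @sorted = sort { $a <=> $b } @$data;
    my $n = @sorted;
    return $n % 2
        ? $sorted[ ($n - 1) / 2 ]
        : ($sorted[ $n / 2 - 1 ] + $sorted[ $n / 2 ]) / 2;
}

# Hypothetical helper: one resample of the same size, drawn with replacement.
sub resample {
    my ($data) = @_;
    my $n = @$data;
    return [ map { $data->[ int(rand($n)) ] } 1 .. $n ];
}

# Median of each of $n_resamples resamples.
sub bootstrap_medians {
    my ($data, $n_resamples) = @_;
    return [ map { median_by_sort(resample($data)) } 1 .. $n_resamples ];
}

srand(42);    # fixed seed so repeated runs draw the same resamples
my @sample  = map { 10 + rand(5) } 1 .. 200;
my $medians = bootstrap_medians(\@sample, 1000);

# Assumed way of reading off 90% limits: cut 5% of the medians off each tail.
my @sorted = sort { $a <=> $b } @$medians;
my $count  = @sorted;
my $lower  = $sorted[ int($count * 0.05) ];
my $upper  = $sorted[ int($count * 0.95) ];

printf "median = %.3f, 90%% limits = [%.3f, %.3f]\n",
    median_by_sort(\@sample), $lower, $upper;
```

Removing the srand call shows how the limits move slightly from run to run.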

In order to implement the confidence limit calculation, various other functions had to be implemented efficiently (both algorithmically efficient and written in C). These functions may be useful in their own right and are thus exposed to Perl. Most notably, this exposes a median (and general selection) algorithm that works in linear time, as opposed to the trivial sort-based implementation that requires O(n*log(n)).
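To illustrate why selection can beat sorting, here is a minimal quickselect sketch in pure Perl. It is an assumption-laden stand-in, not the module's C code: the function name quickselect is hypothetical, and this variant only achieves linear time on average (via a random pivot), whereas the algorithm used by the module may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: return the element that would sit at zero-based
# position $k if @$data were sorted in ascending order. Works on a copy
# and discards the part of the data that cannot contain the answer.
sub quickselect {
    my ($data, $k) = @_;
    my @work = @$data;
    die "k out of range" if $k < 0 || $k > $#work;

    while (1) {
        return $work[0] if @work == 1;

        my $pivot   = $work[ int(rand(@work)) ];
        my @less    = grep { $_ <  $pivot } @work;
        my @equal   = grep { $_ == $pivot } @work;
        my @greater = grep { $_ >  $pivot } @work;

        if ($k < @less) {
            @work = @less;                 # answer lies below the pivot
        }
        elsif ($k < @less + @equal) {
            return $pivot;                 # answer is the pivot itself
        }
        else {
            $k -= @less + @equal;          # skip everything up to the pivot
            @work = @greater;
        }
    }
}

my @values = (7, 1, 9, 3, 5, 8, 2);
print quickselect(\@values, 3), "\n";      # prints 5, the middle element
```

Because each pass keeps only one side of the pivot, the total work is proportional to n on average instead of n*log(n).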

This list of functions is loosely sorted from basic to comprehensive because the more complicated functions are usually (under the hood, in C) implemented using the basic building blocks. Unfortunately, that means you may want to read the documentation backwards. :)

Additionally, there is a whole set of general purpose, fast (XS) functions for calculating statistical metrics. They're useful without the bootstrapping related stuff, so they're listed in the "OTHER FUNCTIONS" section below.

Calculates the confidence limits for STATISTIC. Returns the lower confidence limit, the statistic, and the upper confidence limit. STATISTIC_SAMPLES is the output of, for example, resample_means(\@sample). CONFIDENCE indicates the fraction of data within the confidence limits. The limits are "simple" in the sense that they assume a normal, symmetric distribution of the statistic.

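For example, to get the 90% confidence interval (roughly +/-1.64 sigma for a normal distribution; ~2 sigma would correspond to about 95%) for the mean of your sample, pass the sample mean as STATISTIC, the output of resample_means(\@sample) as STATISTIC_SAMPLES, and 0.90 as CONFIDENCE.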
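The call sequence for the mean can be sketched in pure Perl as follows. Both resample_means_sketch and simple_limits_sketch are hypothetical stand-ins written for this illustration, not the module's functions. The way the limits are derived here (mirroring the tail quantiles of the resampled means around the observed mean) is likewise an assumption about what "simple" limits could look like.

```perl
use strict;
use warnings;
use List::Util qw(sum);

sub mean {
    my ($data) = @_;
    return sum(@$data) / @$data;
}

# Hypothetical stand-in for a resample_means-style function: the mean of
# each of $n_resamples resamples drawn with replacement.
sub resample_means_sketch {
    my ($data, $n_resamples) = @_;
    my $n = @$data;
    return [
        map { mean([ map { $data->[ int(rand($n)) ] } 1 .. $n ]) }
            1 .. $n_resamples
    ];
}

# Assumed definition of the limits: take the alpha/2 and 1-alpha/2
# quantiles of the resampled statistic and reflect them around the
# observed statistic.
sub simple_limits_sketch {
    my ($statistic, $statistic_samples, $confidence) = @_;
    my @sorted = sort { $a <=> $b } @$statistic_samples;
    my $n      = @sorted;
    my $alpha  = 1 - $confidence;
    my $lo_idx = int($n * $alpha / 2);
    my $hi_idx = int($n * (1 - $alpha / 2));
    $hi_idx = $n - 1 if $hi_idx > $n - 1;
    return (
        2 * $statistic - $sorted[$hi_idx],    # lower limit
        $statistic,
        2 * $statistic - $sorted[$lo_idx],    # upper limit
    );
}

srand(1);
my @sample = map { 10 + rand(5) } 1 .. 200;
my $means  = resample_means_sketch(\@sample, 1000);
my ($lower_cl, $mean, $upper_cl) =
    simple_limits_sketch(mean(\@sample), $means, 0.90);

printf "mean = %.3f, 90%% limits = [%.3f, %.3f]\n",
    $mean, $lower_cl, $upper_cl;
```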

Calculates the confidence limits for the CONFIDENCE level for the median of SAMPLE. Returns the lower confidence limit, the median, and the upper confidence limit.

In order to calculate the limits, a lot of resampling has to be done. NSAMPLES, the number of resamples, defaults to 1000. Because the resampling is random, the limits still vary a little between runs at this setting. Try running this a couple of times on your data interactively to see the effect.

Calculates the median (second quartile) of a sample. Works in linear time by using a selection algorithm instead of a sort.

Unfortunately, the way this is implemented, the median of an even number of elements is, here, defined as the (n/2-1)th largest number and not as the average of the (n/2-1)th and the (n/2)th number. This shouldn't matter for nontrivial sample sizes!
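The difference between the two definitions is easiest to see on a tiny sample. The sketch below is not the module's code: both function names are hypothetical, a sort is used instead of a selection to keep it short, and it assumes that "(n/2-1)th" counts from zero in ascending order.

```perl
use strict;
use warnings;

# Single-element median as described above. For even n this picks the
# element at zero-based position n/2-1 (assumed: ascending order).
sub median_single_element {
    my ($data) = @_;
    my @sorted = sort { $a <=> $b } @$data;
    my $n = @sorted;
    return $sorted[ $n % 2 ? ($n - 1) / 2 : $n / 2 - 1 ];
}

# Textbook median: average the two middle elements for even n.
sub median_averaged {
    my ($data) = @_;
    my @sorted = sort { $a <=> $b } @$data;
    my $n = @sorted;
    return $n % 2
        ? $sorted[ ($n - 1) / 2 ]
        : ($sorted[ $n / 2 - 1 ] + $sorted[ $n / 2 ]) / 2;
}

my @even = (1, 2, 3, 10);
printf "single element: %s, averaged: %s\n",
    median_single_element(\@even), median_averaged(\@even);
# prints "single element: 2, averaged: 2.5"
```

For an odd number of elements both definitions agree, and for large samples the two middle values are usually close together.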

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.0 or, at your option, any later version of Perl 5 you may have available.