api for statistics

TODO: need to design an api for allocating noise using an optimisation method
that aaron created. that method takes an action bound and an estimated value as
inputs. the estimated value is not a security parameter; the action bound is.
we'll need measurements from an actual client implementation to discover an
appropriate action bound for our desired anonymity set size and security bounds.
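
to make those two inputs concrete, here is a minimal sketch of where each one
enters. this is not aaron's optimisation method (which isn't specified here),
just the plain laplace mechanism, and the function names are hypothetical: the
action bound caps how much one user can change a count, so it sets the noise
scale and hence the privacy guarantee, while the estimated value only affects
the expected relative error.

    import random

    def noisy_count(true_count, action_bound, epsilon):
        # laplace mechanism: a single user changes the count by at most
        # action_bound, so scale = action_bound / epsilon yields
        # epsilon-differential privacy for this one statistic.
        scale = action_bound / epsilon
        # the difference of two iid exponentials is laplace-distributed.
        return true_count + random.expovariate(1 / scale) - random.expovariate(1 / scale)

    def expected_relative_error(estimated_value, action_bound, epsilon):
        # laplace noise has mean absolute deviation equal to its scale.
        # dividing by the estimated value says whether the signal survives
        # the noise; it has no effect on the privacy guarantee.
        return (action_bound / epsilon) / estimated_value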

TODO: if we ran privcount on all our current statistics, how many of them would
we no longer be able to collect because sufficient noise cannot be added?

TODO: should we size the noise to protect the average user's behaviour, or some
factor of the average?

epsilon bounds the factor by which the probability of any observed outcome can
change depending on whether a given user was active on the network on a given
day.

let ϵ be a positive real number and A be a randomized algorithm that takes a
dataset as input (representing the actions of the trusted party holding the
data). let imA denote the image of A. the algorithm A is ϵ-differentially
private if for all datasets D1 and D2 that differ on a single element (i.e., the
data of one person), and all subsets S of imA,

Pr[A(D1) ∈ S] ≤ e^ϵ × Pr[A(D2) ∈ S],

where the probability is taken over the randomness used by the algorithm.

apple used epsilon=43 at one point in time. now they use epsilon=11. (lower
values of epsilon mean stronger privacy.) we're aiming for epsilon=0.3.
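
as a rough way to read these numbers, here is a sketch under the simplest
possible adversary model (an assumption for illustration, not part of the
definition above): a binary guess, with a 50/50 prior, about whether a given
user was active. epsilon-differential privacy caps the adversary's posterior
odds at e^ϵ times the prior odds, so the posterior probability is at most
e^ϵ / (1 + e^ϵ).

    import math

    for eps in (43, 11, 0.3):
        # with a 50/50 prior on "was this user active today?", the
        # posterior after seeing the output is at most e^eps / (1 + e^eps).
        posterior = math.exp(eps) / (1 + math.exp(eps))
        print(f"epsilon={eps}: posterior <= {posterior:.6f}")

at epsilon=43 or epsilon=11 this bound is vacuous (the posterior can reach
essentially 1.0); epsilon=0.3 caps it at about 0.57, barely better than
guessing.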

TODO: need a detailed spec of which statistics we collect and their noise
levels, plus versioning for statistics so that we can change and/or tweak their
noisiness. if a statistic's version is too old, or we believe its noise to be
insufficient to maintain privacy, we should have a mechanism for telling those
clients to simply not report that data. a hypothetical shape for this is
sketched below.
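
one hypothetical shape for the versioning part (all names and fields below are
made up for illustration, not a settled design): each statistic carries a
version next to its noise parameters, and a published minimum acceptable
version tells out-of-date or under-noised clients to skip reporting.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class StatSpec:
        name: str            # e.g. "circuits-per-day" (hypothetical)
        version: int         # bumped whenever the noise parameters change
        epsilon: float       # this statistic's share of the privacy budget
        action_bound: float  # maximum contribution of a single user

    # hypothetical published parameter: the minimum version that may still
    # be reported; clients holding anything older silently drop the stat.
    MIN_ACCEPTED_VERSION = {"circuits-per-day": 3}

    def should_report(spec: StatSpec) -> bool:
        return spec.version >= MIN_ACCEPTED_VERSION.get(spec.name, 0)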

TODO: need threat modelling and decisions on potentially bad relays that decide
to stop adding noise to their collected statistics. the proposed attack is that
a relay could withhold its own noise in order to learn more from the data
collected by other relays. we could decide not to care, since any relay that
wanted to be malicious could expose its own users more effectively; we could
add additional noise based on consensus weight; or we could allocate noise
based on the number N of relays in the network such that each relay adds 1/Nth
of the noise, as sketched below.
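
a sketch of that 1/N idea, assuming gaussian noise (the assumption matters:
gaussian variances add across relays, so the noise splits cleanly, whereas
laplace noise does not decompose this way):

    import random

    def relay_noise_share(sigma, n_relays):
        # each relay adds gaussian noise with variance sigma^2 / N, so the
        # sum over N honest relays has the full target variance sigma^2.
        return random.gauss(0, sigma / n_relays ** 0.5)

    def residual_sigma(sigma, n_relays, n_cheaters):
        # if k relays silently skip their share, the aggregate noise still
        # has standard deviation sigma * sqrt((N - k) / N), so privacy
        # degrades gradually rather than collapsing.
        return sigma * ((n_relays - n_cheaters) / n_relays) ** 0.5

for example, with N = 7000 relays (an illustrative number), 700 cheating relays
shrink the noise standard deviation by only about 5%.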

TODO: optimise this for "simplest possible decisions at first" so that we can
deploy it.