Poisson Statistics

Experimentalists: Nikolai Joseph and Bradley Knockel

SJK 00:51, 19 November 2007 (CST)

Great job on this lab! Sorry I didn't remember that you were finished. Because it's been so long, I'm not going to make a bunch of comments. A lot of the comments I made on Bradley's page would also be relevant for you: Bradley's page.

I liked your links to the hyperphysics stuff, that will be really useful. Reading through your report, I wasn't 100% sure that you were rock solid on where Poisson distribution comes from, that it converges to Gaussian, etc. With more careful writing, this would probably come across more easily. Same with discussion about your graphs, etc...it wasn't entirely clear what your conclusions were (Poisson is indeed necessary for low-counts). Very nice graphs, and I was impressed with your curiosity to take the long-time data.

Objective

By recording events that arrive at seemingly random times, we hope to show that under certain circumstances the counts follow a Poisson distribution. We also want a fair analysis of how the counts are distributed when the Poisson distribution is not the best description. We record events that we believe to be of cosmic origin over time windows of various lengths; for instance, 256 bins of 2 seconds each. Recording a large number of events with bins of different sizes gives us a wide variety of distributions.

Theory

When collecting large amounts of data it is wise to look at the probability distribution of that data. Both the Gaussian and Poisson distributions can be derived from the binomial distribution as limiting cases.

The Binomial Distribution

Any situation built from independent trials that each either succeed or fail is described by the binomial distribution:

$$P(x) = \frac{N!}{x!\,(N-x)!}\, p^x q^{N-x}$$

with a standard deviation of

$$\sigma = \sqrt{Npq}$$

and a mean of

$$a = pN.$$

Here N = the number of trials, x = the number of counts observed, p = the probability of a count occurring in a single trial, and q = the probability of a count not occurring. Since a count either happens or it doesn't, p + q = 1 in all instances. In the context of our experiment, we have a very large N with a very small p. In that limit, with the mean a = pN held fixed, the binomial distribution is well approximated by the Poisson distribution. More information can be found here
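The large-N, small-p limit can be checked numerically. The sketch below is only an illustration: the values N = 10,000 and p = 0.0002 are made up, not taken from our detector, and the function names are our own.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x counts in n trials, each with probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, a):
    """Probability of exactly x counts when the mean count is a."""
    return math.exp(-a) * a**x / math.factorial(x)

# Hypothetical numbers: many trials, tiny probability, mean a = pN = 2.
N, p = 10_000, 2e-4
a = p * N

# Print x, the binomial probability, and the Poisson probability side by side.
for x in range(6):
    print(x, round(binomial_pmf(x, N, p), 5), round(poisson_pmf(x, a), 5))
```

The two columns should agree closely, which is the sense in which the Poisson distribution stands in for the binomial when N is large and p is small.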

The Gaussian Distribution

When the expected number of counts is large (large N with p not too small, so that the mean a = pN is large), we use the Gaussian (or normal) distribution. The Gaussian distribution is given by

$$P(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-a)^2}{2\sigma^2}\right),$$

with a = the mean, σ = the standard deviation.
The Gaussian distribution is often used to model probabilities and is useful because it is fully determined by its mean and standard deviation: once those two values are taken from the data, the theoretical curve is fixed and can be compared directly with the measured distribution. A very good tool for understanding the Gaussian distribution can be found here

The Poisson Distribution

When analyzing a random situation in which there is a very low probability of occurrence (large N and small p), we use the Poisson distribution. The standard form is given by:

$$P(x) = \frac{e^{-a} a^x}{x!}$$

with a standard deviation of

$$\sigma = \sqrt{a},$$

with a = the mean. The Poisson distribution is most distinctive when the mean is near zero and, unlike the Gaussian distribution, is defined only for non-negative integers. For a small mean it looks like a lopsided Gaussian that has been cut off at zero; as the mean grows it becomes more symmetric and approaches a Gaussian with σ = √a. A good tool for getting comfortable with Poisson distributions can be found here
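To see how the Poisson distribution approaches the Gaussian as the mean grows, one can compare the two curves at a few means. This is a minimal sketch: the means 1, 10, and 100 are arbitrary choices, and `max_relative_gap` is a made-up measure (the largest difference between the two curves, divided by the Poisson peak), not a standard statistic.

```python
import math

def poisson_pmf(x, a):
    # Work in logs so that large x does not overflow a**x or x!.
    return math.exp(x * math.log(a) - a - math.lgamma(x + 1))

def gaussian_pdf(x, a, sigma):
    return math.exp(-((x - a) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def max_relative_gap(a):
    """Largest |Poisson - Gaussian| over integer x, relative to the Poisson peak."""
    sigma = math.sqrt(a)  # Poisson standard deviation
    xs = range(0, int(a + 6 * sigma) + 1)
    peak = max(poisson_pmf(x, a) for x in xs)
    gap = max(abs(poisson_pmf(x, a) - gaussian_pdf(x, a, sigma)) for x in xs)
    return gap / peak

for a in (1, 10, 100):
    print(a, max_relative_gap(a))
```

The gap should shrink as the mean increases, so the two distributions only need to be told apart at low counts.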

Experiment

Setup and Equipment

Our setup consists of a photomultiplier tube attached to a NaI scintillator, both housed in a structure of lead bricks. The photomultiplier tube and scintillator are connected by coaxial cables to a high voltage power supply (1000 volts), which is connected to some sort of bridge, and from the bridge to a data acquisition board in a computer. The computer runs a program called PCAIII, which handles the data acquisition process. There were some extraneous cables coming from the data acquisition board that we had no need to touch.

Procedure

Once all cables were secured and the power supply and bridge had warmed up, we were able to start taking data. Using PCAIII we simply configured how many bins of data to take and how much time each bin would get (the "dwell time"). We used 256 bins for dwell times of 1 s, 2 s, and 10 s, and 4096 bins for dwell times of 10 ms, 100 ms, and 100 s. We experimented with the number of bins and found that it acts as a 'resolution' of sorts: the more bins, the smoother the resulting distribution. Whether we used the high or the low number for a given dwell time was mostly preference.
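To get a feel for how the dwell time changes the distribution, the binning can be imitated with simulated events. This is a rough sketch and not the PCAIII acquisition: `simulate_bins`, the rate of 5 events per second, and the seed are all made-up assumptions, and events are assumed to arrive independently at a constant average rate.

```python
import random

def simulate_bins(rate_hz, dwell_s, n_bins, seed=0):
    """Counts per bin for events arriving at random with a constant average rate."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    total_time = dwell_s * n_bins
    t = rng.expovariate(rate_hz)  # time of the first event
    while t < total_time:
        # Guard the index against floating-point rounding at the last bin edge.
        index = min(int(t // dwell_s), n_bins - 1)
        counts[index] += 1
        t += rng.expovariate(rate_hz)  # exponential wait until the next event
    return counts

# Hypothetical rate; the dwell times and bin numbers follow the procedure above
# (the 100 s run is left out only to keep the example quick).
for dwell_s, n_bins in [(0.01, 4096), (0.1, 4096), (1, 256), (2, 256), (10, 256)]:
    counts = simulate_bins(5.0, dwell_s, n_bins)
    print(dwell_s, n_bins, sum(counts) / n_bins)
```

With a fixed rate, short dwell times give mostly empty bins and long dwell times give many counts per bin, which is why the same source can look Poisson-like in one run and Gaussian-like in another.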

Data and Discussion

I couldn't figure out a good way to show uncertainty in the mean, so I borrowed from Bradley the idea of doing it as follows:

where

We concluded that the uncertainty says less about the data than about the process that produces it, and should therefore still be considered. Since this data was gathered over several days, from a completely uncontrollable source, there will be some fluctuation that is not necessarily error.
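For reference, one common way to quote an uncertainty on a mean count is the standard error of the mean, s/√n, where s is the sample standard deviation and n is the number of bins. This may differ from the expression used above; the sketch below, including the function name and the list of counts, is purely illustrative.

```python
import math
import statistics

def mean_with_standard_error(counts):
    """Return the mean count per bin and its standard error, s / sqrt(n)."""
    n = len(counts)
    mean = statistics.fmean(counts)
    s = statistics.stdev(counts)  # sample standard deviation (n - 1 in the denominator)
    return mean, s / math.sqrt(n)

# Made-up counts per bin, not our measured data.
example_counts = [3, 1, 4, 1, 5, 9, 2, 6]
mean, err = mean_with_standard_error(example_counts)
print(f"mean = {mean:.3f} +/- {err:.3f}")
```

If the counts really are Poisson distributed, s should be close to the square root of the mean, so √(mean/n) would give a similar number.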

The data are as follows.

10 ms

100 ms

1 second

2 seconds

10 seconds

100 seconds

Notice the odd spike.

Error

This lab is really based on the notion of random error. The whole objective is to record events and to notice that they are distributed randomly. We presume the source(s) of our data to be of cosmic origin, but we cannot identify any specific origin. We can't say, with any measure of confidence, exactly what is producing our data! Look at the graph for the 100 seconds data and notice the large, sharp spike. We have no explanation for it. Using our lead shielding and minimizing disturbances is really the best that can be done to control this experiment.
With Bradley's help I was able to plot the Gaussian and Poisson distributions against our data, and you can see that even though we have a good system for making predictions, the data does not always fit. On the longer time scales the data fits fine, but those are not the cases we are most concerned about. Beside each distribution I give the error between the data and the Gaussian or Poisson prediction.

10 ms

ErrorPoisson = 0.0172

ErrorGaussian = 0.0592

100 ms

ErrorPoisson = 0.0321

ErrorGaussian = 0.0589

1 second

ErrorPoisson = 0.0123

ErrorGaussian = 0.0082

2 seconds

ErrorPoisson = 0.0097

ErrorGaussian = 0.007

10 seconds

ErrorPoisson = 0.0032

ErrorGaussian = 0.0028

100 seconds

ErrorPoisson = notrealnumber!

ErrorGaussian = 0.002

Notice how quickly the Gaussian error drops off after 100 ms, and also notice how the Poisson error is highest on the most Poisson-looking distribution!
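A comparison of this kind can be sketched in a few lines. The error measure here is an assumption, not necessarily the one behind the numbers above: `fit_error` is a hypothetical name for the mean absolute difference between the observed fraction of bins at each count and the model's prediction. The counts are made up to resemble a low-count run.

```python
import math
import statistics
from collections import Counter

def poisson_pmf(x, a):
    # Evaluated in log space so that a**x and x! cannot overflow at large counts.
    return math.exp(x * math.log(a) - a - math.lgamma(x + 1))

def gaussian_pdf(x, a, sigma):
    return math.exp(-((x - a) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def fit_error(counts, model):
    """Mean absolute gap between observed frequencies and model(x) over the observed range."""
    n = len(counts)
    freq = Counter(counts)
    xs = range(min(counts), max(counts) + 1)
    return sum(abs(freq[x] / n - model(x)) for x in xs) / len(xs)

# Made-up low-count data: 1000 bins with a mean near 0.5 counts per bin.
counts = [0] * 607 + [1] * 303 + [2] * 76 + [3] * 13 + [4] * 1
a = statistics.fmean(counts)
sigma = statistics.pstdev(counts)

error_poisson = fit_error(counts, lambda x: poisson_pmf(x, a))
error_gaussian = fit_error(counts, lambda x: gaussian_pdf(x, a, sigma))
print(error_poisson, error_gaussian)
```

For low-count data like this the Poisson error should come out well below the Gaussian error. Writing the Poisson formula in log space also keeps it finite at the large counts of a long dwell time; whether overflow is what produced the non-numeric Poisson error at 100 seconds is only a guess.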

Conclusions

In situations where a Poisson distribution fits the data well, look for a low standard deviation (and, since σ = √a, a low mean count per bin). As the mean number of counts per bin climbs, the Poisson becomes less necessary and the Gaussian takes over. Unless it is your goal and you have a question to answer with it, avoid taking data from space.