Let there be $n$ points on a unit circle. It is known that they come from a "normal" distribution around a particular unknown direction (i.e. a mixture of two "normal" distributions on the circle, one centered at a point $p$ and the other at its antipode $-p$).
What is the best way to estimate this direction? By best I mean an algorithm that is (a) analytical, (b) efficient, and (c) simple.

That's a good answer, Niels, and I believe it will work quite well in this case. One trouble I see is that by squaring you have multiplied the 'noise' (if you like) by 2. So, I think that this estimator will be consistent, but not efficient.
–
Robby McKilliam Oct 15 '10 at 1:13

Actually, I lie. In this case, I think this is exactly what you want to do! I recommend this answer be marked as correct. My answers can just be considered as advertising for the important and interesting field of circular statistics.
–
Robby McKilliam Oct 15 '10 at 2:00

In principle the stretching of the squaring is undone by the shrinking of the square root. I do see a potential problem with the arithmetic mean getting smaller in absolute value, especially if the distribution is rather flat, but I don't think that's a linear effect, is it?
–
Niels Diepeveen Oct 15 '10 at 2:31

I think, due to the symmetry, it essentially does act in a linear way. I'm pretty sure I can describe very accurately how well your estimator will work. I'll post this as an(other) answer a bit later.
–
Robby McKilliam Oct 15 '10 at 3:24

Niels, thanks a lot! I didn't know the RMS concept. If I had, it might have rung a bell. It's indeed THE best answer in the sense I asked for.
–
Andrei Kolin Oct 16 '10 at 10:42

The standard way to solve this is to just consider each of your data points as a unit vector, then take the average of those unit vectors. The direction of this averaged vector is the estimated direction.
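As a minimal sketch of this averaging approach (the function name and example angles are my own for illustration):

```python
import cmath

def mean_direction(angles):
    """Estimate the mean direction of circular data by averaging
    the unit vectors e^{i*theta} and taking the argument of the mean."""
    c = sum(cmath.exp(1j * t) for t in angles) / len(angles)
    return cmath.phase(c)  # result lies in (-pi, pi]

# Example: angles clustered around 0.8 radians
angles = [0.7, 0.8, 0.85, 0.75, 0.9]
print(mean_direction(angles))  # close to 0.8
```

Note that averaging the unit vectors, rather than the raw angles, is what makes this well behaved near the wrap-around at $2\pi$.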

There is a large literature on this topic, which generally goes by the name of directional statistics. The seminal text is Mardia and Jupp's book Directional Statistics. This field has a huge number of applications in astronomy, biology, meteorology, engineering, etc.

The summation works in the case where the direction is "signed", i.e. not symmetrical around 0. In my case this won't work: e.g. if there are 100 points at $p$ and 100 points at $-p$, their sum will be 0... Thanks for the useful link, though.
–
Andrei Kolin Oct 14 '10 at 15:38

What would happen if you make a new circle by identifying opposite points on the original circle, and then take the average as suggested in this comment? (I haven't checked, so this may have an obvious reason for not working.)
–
gowers Oct 14 '10 at 17:06

Ok, so now I will describe why Niels's estimator works so well. Take a bimodal and symmetric circular density function $f$ with modes $p$ and $-p$ (we will assume that $p$ is positive) such as the one plotted in my previous answer. Let $\Theta_1, \Theta_2, \dots, \Theta_N$ be $N$ observations drawn from $f$.

Niels's estimator first computes the complex numbers $e^{i 2 \Theta_n}$ and takes their average
$$ \bar{C} = \frac{1}{N}\sum_{n=1}^{N} e^{i 2 \Theta_n} .$$
The estimate, denoted $\hat{p}$, is given by taking the complex argument of $\bar{C}$ and dividing by 2, that is
$$ \hat{p} = \frac{\angle{\bar{C}}}{2}$$
where $\angle{\bar{C}} \in [0,2\pi)$ denotes the complex argument. The next theorem describes the asymptotic properties of this estimator. I use the notation $\langle x \rangle_{\pi}$ to denote $x$ taken to its representative inside $[-\pi, \pi)$. So, for example, $\langle 2\pi \rangle_{\pi} = 0$ and $\langle \pi + 0.1 \rangle_{\pi} = -\pi + 0.1$.
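The estimator above can be sketched in a few lines; this is my own rendering of the doubled-angle procedure, not code from the answer:

```python
import cmath
import math

def axial_estimate(angles):
    """Niels's estimator: double each angle, average the resulting unit
    vectors e^{i*2*theta}, then halve the complex argument of the mean.
    Returns an estimate of the mode direction in [0, pi)."""
    c = sum(cmath.exp(2j * t) for t in angles) / len(angles)
    arg = cmath.phase(c) % (2 * math.pi)  # complex argument in [0, 2*pi)
    return arg / 2

# Points split evenly between p ~ 0.51 and its antipode:
data = [0.5, 0.5 + math.pi, 0.52, 0.52 + math.pi]
print(axial_estimate(data))  # 0.51: the antipodal points reinforce, not cancel
```

Doubling the angles maps $p$ and $-p$ to the same point on the circle, which is why the two modes no longer cancel each other out.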

The definition of the difference $\lambda$ might seem a little strange at first, but it is actually very natural. To see why, note that $p$ and the estimate $\hat{p}$ are both in $[0,\pi)$ but, for example, if $p = 0$ and $\hat{p} = \pi - 0.01$ then the difference between these is not $\pi - 0.01$, because the two modes are actually very close to aligned in this case. The correct difference is $\lambda = \tfrac{1}{2}\langle 2(\pi-0.01) - 2 \times 0 \rangle_{\pi} = -0.01$, which has the small magnitude we expect.
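A small sketch of this wrapped difference (the helper names are mine), reproducing the worked example:

```python
import math

def wrap_pi(x):
    """Map x to its representative in [-pi, pi)."""
    return (x + math.pi) % (2 * math.pi) - math.pi

def axial_error(p_hat, p):
    """The difference lambda between two axial directions in [0, pi):
    compare the doubled angles on the circle, then halve."""
    return 0.5 * wrap_pi(2 * p_hat - 2 * p)

# The example from the text: p = 0, p_hat = pi - 0.01
print(axial_error(math.pi - 0.01, 0.0))  # -0.01, not pi - 0.01
```

The naive difference $\pi - 0.01$ would wrongly treat two nearly aligned axes as maximally far apart.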

The proof of this theorem follows from a very similar argument to Theorem 6.1 (page 87) from my thesis. The original argument is due to Barry Quinn. Rather than restate the proof I'll just give you some convincing numerical evidence.

I've run some simulations for the case when the noise is a sum of two weighted von Mises circular distributions with concentration parameter $\kappa$. So, when $\kappa$ is large the distribution is concentrated and looks something like the picture on the left below ($\kappa = 20$ in this case), and when $\kappa$ is small the distribution is quite spread out and looks something like the picture on the right below ($\kappa = 0.5$). We obviously expect the estimator to perform better when the distribution is quite concentrated ($\kappa$ is large).
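A simulation along these lines can be sketched as follows; this assumes NumPy, and the function names and seed are my own choices rather than the code used for the plots:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_bimodal_vonmises(p, kappa, n):
    """Draw n samples from an equal-weight mixture of von Mises
    distributions centred at p and p + pi, both with concentration kappa."""
    centres = rng.choice([p, p + np.pi], size=n)
    return (centres + rng.vonmises(0.0, kappa, size=n)) % (2 * np.pi)

def axial_estimate(angles):
    """Double the angles, average on the unit circle, halve the argument."""
    c = np.mean(np.exp(2j * np.asarray(angles)))
    return (np.angle(c) % (2 * np.pi)) / 2

samples = sample_bimodal_vonmises(p=1.0, kappa=20.0, n=5000)
print(axial_estimate(samples))  # close to 1.0 when kappa is large
```

Repeating this over many trials and values of $\kappa$ gives the simulated variance of $\lambda$ referred to below.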

Here are the results. The plot below shows the simulated variance of $\lambda$ after 5000 trials (the dots) versus the variance predicted in the theorem above, for a range of values of $\kappa$ and number of observations $N$. You can see that the theorem does a very good job of accurately predicting the performance if $\kappa$ isn't too small.

There is still an open question as to whether this is the best estimator (in the sense of minimising the variance of $\lambda$). It would be possible to derive a Cramér-Rao bound for this estimation problem to give an idea of the best possible performance of an unbiased estimator. I suspect that this estimator performs very near the Cramér-Rao bound. So, in that sense, it is close to the best possible.

Oh yes! I much prefer Herman Wouk's rhyming version ''When in danger or in doubt, run in circles, scream and shout'' anyway. Thanks!
–
Robby McKilliam Oct 16 '10 at 2:51

Eternal Vigilance is the Price of Liberty.
–
Will Jagy Oct 16 '10 at 3:28

Robby, thanks for your exceptional and insightful comments! I'm lucky to have stumbled on a real expert in this field! However, let me point out that this last answer doesn't deliver what's promised in one of the comments above (i.e. explain WHY Niels's estimator is good). Rather, it provides a rigorous definition of it plus simulation results. For the real explanation (the proof of the theorem) the reader is referred to your thesis :) Of course it is unreasonable for me to expect the full answer in a page or two. If I need to address more similar questions, I'll surely read all the links.
–
Andrei Kolin Oct 16 '10 at 10:50

I see now that Andrei would like to know what to do when the distribution has two modes and is symmetric about these modes. It seems better to just give a second (more detailed) answer rather than complicate the simple answer I gave above (basically I think the idea in gowers's comment above is sound, but it's a bit tricky to actually implement).

So, how do we deal with estimating the 'mean direction' of a distribution that looks something like:

Good questions at this point are ''what is mean direction anyway?'' and specifically for the distribution above ''does a mean direction even exist?''

This is a question I have been looking at for a few months now. I'm wary of blowing my own horn a bit here, but I am going to attach a part of my thesis which I think gives satisfactory answers to these questions (I would love to give you the whole thesis, but it's not quite ready for the public to see). I suggest that there are (at least) two different, but equally reasonable and intuitive, definitions of mean direction. I argue that the distribution above has no mean, in a rigorously definable sense, for both of these definitions.

Given $N$ data points $\Theta_1,\dots, \Theta_N$ on a circle there exist very accurate and efficient O(N)-time algorithms to estimate both of these means if they exist. Neither algorithm will converge if used on circular data drawn from the bimodal distribution above as (according to my definition) the means do not exist.

Still, given $N$ data points $\Theta_1,\dots, \Theta_N$ drawn from the bimodal distribution above, if what you want is to estimate one of the ''modes'' rather than the mean direction, then my gut tells me that there probably are efficient and accurate algorithms to do this, although I don't know if they exist in the literature. You could try Fisher's book The Statistical Analysis of Circular Data.

Robby, mostly I want to know how to post a drawing of a peanut.
–
Will Jagy Oct 14 '10 at 23:17

Hi Will, I use metapost. The actual data for the pdf (it's a weighted sum of von Mises distributions, by the way) comes from a Java library (unfortunately), and then the plotting gets done by metapost. At some point I plan on releasing all of the code I have for simulations and plotting under the CRAPL matt.might.net/articles/crapl. However, at the moment I feel the code is even too crap for the CRAPL :(
–
Robby McKilliam Oct 14 '10 at 23:26

@Andrei: If you have further questions, feel free to send me an email.
–
Robby McKilliam Oct 14 '10 at 23:32