Suppose we are given a set of points $(x_i, y_i)\in\mathbb{R}^2$ and are told that they are drawn from a normal (Gaussian) distribution. It is a simple matter in that case to find the mean $(\mu_x,\mu_y)$ of the distribution: $(\langle x \rangle, \langle y \rangle)$, where $\langle x \rangle=\frac{1}{N}\sum_i x_i$ and similarly for $\langle y \rangle$, is an unbiased estimator.

Now suppose, instead, that each point is randomly presented as either $(x_i,y_i)$ or $(y_i, x_i)$, with equal probability. Is there still a comparably simple way to estimate the mean of the original distribution? (Assume $\mu_x \le \mu_y$.) This is equivalent to the case where the points are drawn from a mixture of two Gaussians, where one is constrained to be the reflection of the other across the line $y=x$. However, one might hope that the symmetry of the problem leads to some simplification from the general case of two Gaussians.

Note, by the way, that taking $\mu_x=\langle \min(x, y)\rangle$ and $\mu_y = \langle \max (x,y) \rangle$ does not work: this is adequate only when the individual Gaussian distributions are well-separated by the line $y=x$. Otherwise, it introduces a systematic bias away from that line.
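A quick simulation makes this bias visible. The means, variance, and sample size below are arbitrary choices for illustration, picked so that the two component Gaussians overlap substantially across $y=x$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (assumed values): true means 0 and 1, unit variances,
# so the two reflected Gaussians overlap heavily across y = x.
mu_x, mu_y, N = 0.0, 1.0, 100_000
x = rng.normal(mu_x, 1.0, N)
y = rng.normal(mu_y, 1.0, N)

# The random swapping is irrelevant here: min and max are symmetric in
# their arguments, so we can apply them to the unswapped pairs directly.
est_x = np.minimum(x, y).mean()
est_y = np.maximum(x, y).mean()

print(est_x, est_y)  # roughly -0.2 and 1.2: pushed away from y = x
```

Note that the sum $est_x + est_y$ remains an unbiased estimate of $\mu_x + \mu_y$; it is only the split between the two coordinates that is biased.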

I know a way, but it is not as simple as the sample mean. Actually, there are two ways.
– Seyhmus Güngören, Aug 17 '12 at 22:44

Does EM algorithm apply to this situation?
– Tunococ, Aug 17 '12 at 23:04

@Tunococ: I'd be happy with an expectation-maximization solution, provided it's explicit. But any kind of iterative search for an EM solution isn't really any simpler than the general case of two arbitrary Gaussians.
– mjqxxxx, Aug 17 '12 at 23:14

Of course, the $m_k$ are estimated by the standard estimators (though high-order moments might require many samples to get good estimates), and then, by computing $\alpha$ and $\delta$, we obtain the estimated $\mu_x$ and $\mu_y$.

The problem can be solved by EM. Let the first Gaussian have mean $\mu_1 = [\mu_x;\mu_y]$ and covariance matrix $\Sigma_1 = \left( \begin{array}{cc}
\sigma_{xx}^2 & \sigma_{xy} \\
\sigma_{yx} & \sigma_{yy}^2 \\
\end{array} \right)$
where $\sigma_{xx}^2$ is the variance of $x$, $\sigma_{yy}^2$ is the variance of $y$ and $\sigma_{xy}$ is the covariance between $x$ and $y$. It is easy to see that the second Gaussian will have mean $\mu_2 = [\mu_y;\mu_x]$ and covariance $\Sigma_2 = \left( \begin{array}{cc}
\sigma_{yy}^2 & \sigma_{xy} \\
\sigma_{yx} & \sigma_{xx}^2 \\
\end{array} \right)$.

If you define
$R = \left(\begin{array}{cc}
0 & 1 \\
1 & 0 \\
\end{array} \right)$
, we can write $\mu_2 = R \mu_1$ and $\Sigma_2 = R\Sigma_1R$.
The mixture weights are fixed at $0.5$ each since you mentioned that the dimensions are flipped with equal probability. So you only have to estimate $\mu_1$ and $\Sigma_1$, which is a simplification from the generic case of a mixture model with two Gaussians. An EM algorithm can now be formulated for this problem using the standard recipe.

EM does apply to this situation. It is, of course, much more complex than the sample-mean estimator, as are other iterative algorithms. The second way is to make the estimation in well-separated regions. I have also thought about using the sample-mean estimator together with another, somewhat more complex estimator for the bias term; the overall estimator would then simply be the difference between these two. I think that if the variances of $x$ and $y$ are known, one can split the estimation into two distinct regions. Another way that came to mind is to use EM partially, just to get some idea of the regions, and then to apply the sample-mean estimators only on those regions.