For the distribution $$P(X=x) = \{(1, 0.25),(2,0.25),(3,0.25),(4,0.25)\},$$ we know that every number in $[2,3]$ is a median if we define a median as a number $c$ such that $P(X \geq c) \geq 1/2$ and $P(X \leq c) \geq 1/2$, so this distribution has infinitely many medians. I want to prove that the set of all medians is exactly the set of numbers $a$ that minimize $E[|X-a|]$.

Attempt: Distributions can have multiple medians, and these medians are the only numbers that minimize the absolute distance to the mean. That is, the set of numbers that minimize the absolute distance to the mean is exactly the set of medians. I am stuck because I can't find a rigorous proof of this.
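A minimal sketch that checks the claim numerically on the example distribution: it tests the median definition and evaluates $E[|X-a|]$ at the points of a grid with step $1/4$, using exact fractions so that ties are detected exactly. The names `pmf`, `is_median` and `mean_abs_dev` are illustrative, and a finite grid only illustrates the statement; it does not prove it.

```python
from fractions import Fraction

# The example distribution: X uniform on {1, 2, 3, 4}.
pmf = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4), 4: Fraction(1, 4)}

def is_median(c):
    """c is a median iff P(X >= c) >= 1/2 and P(X <= c) >= 1/2."""
    p_ge = sum(p for x, p in pmf.items() if x >= c)
    p_le = sum(p for x, p in pmf.items() if x <= c)
    return p_ge >= Fraction(1, 2) and p_le >= Fraction(1, 2)

def mean_abs_dev(a):
    """E[|X - a|] for the discrete distribution above."""
    return sum(p * abs(x - a) for x, p in pmf.items())

# Exact grid 0, 1/4, 1/2, ..., 5.
grid = [Fraction(k, 4) for k in range(0, 21)]
medians = [c for c in grid if is_median(c)]
best = min(mean_abs_dev(a) for a in grid)
minimizers = [a for a in grid if mean_abs_dev(a) == best]

print("medians on grid:   ", [str(c) for c in medians])
print("minimizers on grid:", [str(a) for a in minimizers])
print("minimum value:", best)
```

On this grid both lists consist of the grid points of $[2,3]$, and the minimum value is $1$.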

Another one to keep in mind is the log-normal distribution: look at the box on the right for the formulas for its mean and median. Depending on the standard deviation, the mean and median can be as far apart as you please. But you are right in the sense that for most garden-variety distributions the mean and median will be the same.
– Peter Sheldrick, Sep 20 '11 at 20:09
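For reference, if $\ln X\sim N(\mu,\sigma^2)$, the standard formulas are $\text{median}=e^{\mu}$ and $\text{mean}=e^{\mu+\sigma^2/2}$, so the ratio mean/median is $e^{\sigma^2/2}$ and grows without bound with $\sigma$ (here $\sigma$ is the standard deviation of $\ln X$, not of $X$). A minimal sketch; the function name `lognormal_mean_median` is illustrative.

```python
import math

def lognormal_mean_median(mu, sigma):
    """Mean and median of X when ln X ~ Normal(mu, sigma^2)."""
    mean = math.exp(mu + sigma ** 2 / 2)
    median = math.exp(mu)
    return mean, median

# With mu = 0 the median stays at 1 while the mean grows like exp(sigma^2 / 2).
for sigma in (0.5, 1.0, 2.0, 3.0):
    mean, median = lognormal_mean_median(0.0, sigma)
    print(f"sigma={sigma}: mean={mean:.3f}, median={median:.3f}")
```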

The median is not always the closest number to the mean. The mean is very sensitive to outliers, but the median much less so. For example, take the set $\{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 1000 \}$. The mean is $1045/10 = 104.5$. What do you think the median is?
– Srivatsan, Sep 20 '11 at 20:09
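A quick check of the arithmetic in this comment with Python's standard `statistics` module: the single outlier drags the mean up to $104.5$, while the median is the average of the two middle values, $(5+6)/2=5.5$.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]

mean = statistics.mean(data)      # the sum is 1045, so 1045 / 10
median = statistics.median(data)  # even length: average of the 5th and 6th values

print(mean, median)
```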

I think you want $E[|X-a|]$, not $E[X-a]$. This is not at all the same as the "closest number to the mean". Of course you do want to assume that the mean exists in order for this to make sense.
– Robert Israel, Sep 20 '11 at 20:14

@PeterSheldrick: the mean and median are hardly ever equal, except for distributions that are symmetric. But that has nothing to do with what lord12 is trying to do.
– Robert Israel, Sep 20 '11 at 20:17

1 Answer

Starting from $|x|=2x^+-x$ where $x^+=\max\{x,0\}$ is the positive part of $x$, one sees that $M(a)=\mathrm E(|X-a|)$ is also $M(a)=2\mathrm E((X-a)^+)-\mathrm E(X)+a$, that is,
$$
M(a)=2\int\limits_a^{+\infty}u_a(x)v'(x)\mathrm dx-\mathrm E(X)+a,
$$
with $u_a(x)=a-x$ and $v(x)=\mathrm P(X\ge x)$. An integration by parts (the boundary term vanishes because $\mathrm E(|X|)$ is finite) yields
$$
M(a)=2\int\limits_a^{+\infty}\mathrm P(X\ge x)\mathrm dx-\mathrm E(X)+a,
$$
hence $M$ is differentiable at every point where $X$ has no atom, and there its derivative is
$$
M'(a)=1-2\mathrm P(X\ge a)=1-2\mathrm P(X>a).
$$
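One sees that $M'(a)<0$ where $\mathrm P(X>a)>1/2$, $M'(a)=0$ where $\mathrm P(X\ge a)=\mathrm P(X\le a)=1/2$, and $M'(a)>0$ where $\mathrm P(X\ge a)<1/2$. Since $M$ is $1$-Lipschitz (because $\big||x-a|-|x-b|\big|\le|a-b|$), it is the integral of its derivative, so $M$ is decreasing on the left of the median(s), constant on the interval formed by the median(s) and increasing on the right of the median(s), which proves the claim.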
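As a worked illustration of the formulas above (the distribution is chosen here for the example), let $X$ be exponential with rate $1$, so that $\mathrm P(X\ge x)=e^{-x}$ for $x\ge0$ and $\mathrm E(X)=1$. For $a\ge0$ the integral formula gives
$$
\begin{aligned}
M(a)&=2\int\limits_a^{+\infty}e^{-x}\,\mathrm dx-1+a=2e^{-a}+a-1,\\
M'(a)&=1-2e^{-a}=1-2\,\mathrm P(X\ge a),
\end{aligned}
$$
while $M(a)=1-a$ for $a<0$. Thus $M$ decreases up to $a=\ln2$ and increases afterwards; $\ln2$ is the unique median because $\mathrm P(X\ge\ln2)=1/2$, and the minimal value is $M(\ln2)=\ln2$.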

Edit: When $X$ is discrete, the proof above can be rewritten as follows. Assume the distribution of $X$ is $(p_x)$, so that
$$
M(a)=\sum_xp_x|x-a|.
$$
Each function $a\mapsto|x-a|$ is differentiable at every $a\ne x$, with derivative $[x<a]-[x>a]$ (Iverson brackets); hence, at every $a$ not in the support of $X$,
$$
M'(a)=\sum_xp_x\left([x<a]-[x>a]\right)=\mathrm P(X<a)-\mathrm P(X>a)=\mathrm P(X\le a)-\mathrm P(X\ge a).
$$
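One sees that $M$ is decreasing at every $a$ off the support such that $\mathrm P(X\le a)<\mathrm P(X\ge a)$, and increasing at every such $a$ with $\mathrm P(X\le a)>\mathrm P(X\ge a)$. At every $a$, including the atoms where $M$ is not differentiable, $M$ has one-sided derivatives $M'_-(a)=\mathrm P(X<a)-\mathrm P(X\ge a)$ and $M'_+(a)=\mathrm P(X\le a)-\mathrm P(X>a)$. Since $M$ is convex (a weighted sum of the convex functions $a\mapsto|x-a|$), a point $a$ minimizes $M$ if and only if $M'_-(a)\le0\le M'_+(a)$, that is, if and only if $\mathrm P(X\ge a)\ge1/2$ and $\mathrm P(X\le a)\ge1/2$, which is exactly the definition of a median. This proves the result.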
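A minimal sketch of the discrete argument for a hypothetical finitely supported distribution (the pmf and the helper names below are made up for illustration). It uses exact fractions, checks $M'(a)=\mathrm P(X<a)-\mathrm P(X>a)$ against a centered difference quotient at a few points off the support, and compares the minimizers of $M$ on a grid with the medians. Unlike the example in the question, this distribution has a single median, $5/2$.

```python
from fractions import Fraction

# Hypothetical finitely supported distribution: support point -> probability.
pmf = {
    Fraction(0): Fraction(1, 10),
    Fraction(1): Fraction(3, 10),
    Fraction(5, 2): Fraction(2, 5),
    Fraction(4): Fraction(1, 5),
}

def M(a):
    """M(a) = sum over the support of p_x * |x - a|."""
    return sum(p * abs(x - a) for x, p in pmf.items())

def slope(a):
    """P(X < a) - P(X > a): the derivative of M claimed at points off the support."""
    below = sum(p for x, p in pmf.items() if x < a)
    above = sum(p for x, p in pmf.items() if x > a)
    return below - above

def is_median(c):
    """P(X >= c) >= 1/2 and P(X <= c) >= 1/2."""
    half = Fraction(1, 2)
    return (sum(p for x, p in pmf.items() if x >= c) >= half
            and sum(p for x, p in pmf.items() if x <= c) >= half)

# Off the support M is affine between atoms, so a centered difference with a
# step h that keeps (a - h, a + h) free of atoms equals the slope exactly.
h = Fraction(1, 100)
for a in [Fraction(-1), Fraction(1, 2), Fraction(2), Fraction(3), Fraction(5)]:
    assert (M(a + h) - M(a - h)) / (2 * h) == slope(a)
    print(f"a = {a}: M'(a) = {slope(a)}")

# Compare grid minimizers of M with the medians (grid step 1/4 on [-1, 5]).
grid = [Fraction(k, 4) for k in range(-4, 21)]
best = min(M(a) for a in grid)
minimizers = [a for a in grid if M(a) == best]
medians = [a for a in grid if is_median(a)]
print("minimizers:", [str(a) for a in minimizers], "medians:", [str(a) for a in medians])
```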