The Shannon entropy contribution of an observation of probability $x$ is given by $-x\log_2(x)$. Why is the maximum achieved at $x = \frac{1}{e}$, and not at $x = 0$? Could someone provide a logical explanation that justifies the mathematics?

There is no entropy for "an observation". And I wonder how you got that it's maximized at $x=1/e$ (?)
– leonbloy, Apr 3 '14 at 15:52

It is true that the maximum of $-x\log_2(x)$ is at $x=1/e$, as you can see by differentiating or graphing. But I don't see any relevance of that to entropy.
– Martin Leslie, Apr 3 '14 at 16:19
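To see where $x = 1/e$ comes from, differentiate:

$$\frac{d}{dx}\bigl(-x\log_2 x\bigr) \;=\; -\frac{\ln x + 1}{\ln 2} \;=\; 0 \quad\Longrightarrow\quad \ln x = -1 \quad\Longrightarrow\quad x = \frac{1}{e},$$

with maximum value $-\frac{1}{e}\log_2\frac{1}{e} = \frac{1}{e\ln 2} \approx 0.531$ bits.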

The term $-\log_2(p_i)$ is the length of the optimal encoding of symbol $i$. So a string of $n$ symbols will have an expected encoded length of $nH$ bits, where $H = -\sum_i p_i\log_2(p_i)$, and $-np_i\log_2(p_i)$ is the expected number of bits used in encoding symbol $i$. The fact that $-x\log_2(x)$ is maximized at $x = 1/e$ means that more bits are spent encoding a symbol of probability $1/e$ than one of any other probability. That's fairly interesting.
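As a sanity check, here is a minimal sketch of this bookkeeping in Python (the distribution and $n$ are toy choices of mine, not from the question):

```python
import math

# Toy distribution and string length, chosen for illustration.
p = [0.5, 0.25, 0.125, 0.125]
n = 1000

H = -sum(pi * math.log2(pi) for pi in p)            # entropy, bits/symbol
per_symbol = [-n * pi * math.log2(pi) for pi in p]  # expected bits spent on symbol i

print(f"H = {H} bits/symbol; expected total = {n * H} bits")
print("expected bits per symbol:", per_symbol)      # these sum to n*H
```

With this (dyadic) distribution the numbers come out exactly: $H = 1.75$ bits/symbol, and the per-symbol bit counts $500, 500, 375, 375$ sum to $nH = 1750$.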

As to why the maximum is not at $0$: $-x\log_2 x$ is, again, proportional to the total number of bits spent encoding a symbol of probability $x$. While the code length of a symbol of probability very close to $0$ is very large, the number of times that symbol appears is very small, so it's at least not *clear* that $0$ should be the winner. And indeed, when you do the math, it's not.
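A quick numerical illustration of this tradeoff (the probabilities below are arbitrary picks for the table):

```python
import math

def code_length(p):
    """Optimal code length for a symbol of probability p: -log2(p) bits."""
    return -math.log2(p)

def contribution(p):
    """Expected bits per source symbol spent on this symbol: -p*log2(p)."""
    return p * code_length(p)

for p in (0.001, 0.01, 0.1, 1 / math.e, 0.5, 0.9):
    print(f"p = {p:.4f}: length = {code_length(p):6.3f} bits, "
          f"contribution = {contribution(p):.4f} bits")
# The contribution peaks at p = 1/e (about 0.5307 bits), not as p -> 0:
# a symbol with p = 0.001 needs ~9.97 bits each time, but contributes
# only ~0.01 bits on average because it almost never appears.
```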