For a given mean $\mu$, what is the entropy maximizing probability distribution on the nonnegative integers?
Different sources indicated either the geometric or the Poisson distribution for this. As ...
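
The geometric distribution is the standard answer here: with a single mean constraint, the maximum-entropy family on the nonnegative integers is $p(k) \propto e^{-\lambda k}$, i.e. geometric. A quick numerical sanity check in Python (a sketch; the mean $\mu = 3$ and the truncation at 200 terms are arbitrary illustrative choices):

```python
import math

def entropy(pmf):
    """Shannon entropy (in nats) of a pmf given as an iterable of probabilities."""
    return -sum(p * math.log(p) for p in pmf if p > 0)

mu = 3.0  # target mean (illustrative choice)

# Geometric on {0, 1, 2, ...} with mean mu: P(k) = (1-q) q^k, q = mu/(1+mu)
q = mu / (1.0 + mu)
geom = [(1 - q) * q**k for k in range(200)]

# Poisson with mean mu: P(k) = exp(-mu) mu^k / k!
pois = [math.exp(-mu) * mu**k / math.factorial(k) for k in range(200)]

print("geometric:", entropy(geom))  # ~2.249 nats, matches (1+mu)log(1+mu) - mu*log(mu)
print("poisson:  ", entropy(pois))  # ~1.931 nats, strictly smaller
```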

For a given discrete probability distribution, Shannon entropy can be thought of as an expectation value $\langle - \log p \rangle$ (see also: What is entropy, really?, What is the role of the logarithm ...
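
That expectation reading can be checked directly: the formula $-\sum_x p(x)\log p(x)$ and the sample average of $-\log p(X)$ over i.i.d. draws agree. A minimal sketch (the three-point distribution, seed, and sample size are arbitrary):

```python
import math, random

p = {"a": 0.5, "b": 0.25, "c": 0.25}  # toy distribution (illustrative)

# Direct formula: H = -sum_x p(x) log2 p(x)
H_formula = -sum(px * math.log2(px) for px in p.values())

# Expectation reading: average -log2 p(X) over i.i.d. samples of X
random.seed(0)
xs = random.choices(list(p), weights=list(p.values()), k=100_000)
H_sample = sum(-math.log2(p[x]) for x in xs) / len(xs)

print(H_formula, H_sample)  # both close to 1.5 bits
```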

Suppose I am attempting to calculate the entropy of a continuous, normally distributed random variable $X$ with distribution $\mathcal{N}(\mu, \sigma)$. This is easy to do: I just calculate
...
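
Presumably the quantity being computed is the closed form $h(X) = \tfrac{1}{2}\log(2\pi e \sigma^2)$. A sketch comparing it against a Monte Carlo estimate of $\mathbb{E}[-\log f(X)]$ (the parameters, seed, and sample size are illustrative):

```python
import math, random

mu, sigma = 0.0, 2.0  # illustrative parameters

# Closed form: h(X) = 0.5 * log(2*pi*e*sigma^2), in nats
h_exact = 0.5 * math.log(2 * math.pi * math.e * sigma**2)

# Monte Carlo: h(X) = E[-log f(X)], with f the N(mu, sigma^2) density
def log_pdf(x):
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

random.seed(1)
n = 100_000
h_mc = sum(-log_pdf(random.gauss(mu, sigma)) for _ in range(n)) / n

print(h_exact, h_mc)  # ~2.112 nats for sigma = 2
```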

The following optimization problem is related to relative entropy and to the limit of the iterative proportional fitting procedure.
For $1 \leq i,j \leq n$ and fixed $w_{ij} \geq 0$, and fixed $a_i, ...
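
For reference, the IPF iteration alternately rescales the rows and columns of $w$ to match the target marginals; its limit, when it exists, is the minimizer of the relative entropy $D(p\,\|\,w)$ subject to those marginal constraints (Csiszár's I-projection). A minimal sketch, assuming a feasible problem (e.g. strictly positive $w$ and $\sum_i a_i = \sum_j b_j$; the matrix and marginals below are toy values):

```python
import numpy as np

def ipf(w, a, b, iters=500):
    """Iterative proportional fitting: alternately rescale the rows and
    columns of w >= 0 until the row sums match a and the column sums
    match b. A sketch; assumes feasibility (e.g. w > 0, sum(a) == sum(b))."""
    p = np.array(w, dtype=float)
    for _ in range(iters):
        p *= (a / p.sum(axis=1))[:, None]  # match row sums
        p *= (b / p.sum(axis=0))[None, :]  # match column sums
    return p

w = np.array([[1.0, 2.0], [3.0, 4.0]])
a = np.array([0.4, 0.6])  # target row sums
b = np.array([0.5, 0.5])  # target column sums
p = ipf(w, a, b)
print(p.sum(axis=1), p.sum(axis=0))  # ~[0.4, 0.6], ~[0.5, 0.5]
```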

Let $I(X,Y):=H(X)+H(Y)-H(X,Y)$ be the mutual information of the joint probability distribution $p_{XY}$ (here $H(\cdot)$ is the Shannon entropy of its argument). I know that the mutual information is ...
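
Computing $I(X,Y)$ straight from this definition on a toy joint distribution (the numbers are illustrative; the result is nonnegative, as it must be, since $I(X,Y) = D(p_{XY}\,\|\,p_X p_Y)$):

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# Toy joint distribution p_XY (rows: X, columns: Y); illustrative numbers
p_xy = np.array([[0.30, 0.10],
                 [0.10, 0.50]])

H_X  = entropy(p_xy.sum(axis=1))  # marginal of X
H_Y  = entropy(p_xy.sum(axis=0))  # marginal of Y
H_XY = entropy(p_xy.ravel())      # joint entropy

I = H_X + H_Y - H_XY  # mutual information, always >= 0
print(I)
```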

I am wondering why stochastic encoders and decoders cannot help with Shannon source coding. I know the achievability scheme of source coding, which is based on typicality, is deterministic, and hence we ...

Let $G$ be a locally compact group. A measure $\mu$ is the right Haar measure on $G$ if for every $g\in G$ and every Borel set $E\subseteq G$, $\mu(Eg)=\mu(E)$. It is known that every locally compact group ...

Given a probability distribution $(X,p)$, its entropy is defined as $H=-\sum_{x\in X} p(x)\log p(x)$.
Given a sample of observations $x_n$, $n=1,\dots,N$, one can estimate $p(x)=\frac{\#\{i:x_i=x\}}{N}$ and ...
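
This plug-in estimator is straightforward to implement, but it is biased downward for finite $N$; a standard first-order correction (Miller-Madow) adds $(K-1)/(2N)$ nats, where $K$ is the number of observed symbols. A sketch of the plug-in estimate (the sample is a toy):

```python
import math
from collections import Counter

def plugin_entropy(samples):
    """Plug-in (maximum-likelihood) entropy estimate in nats: substitute
    empirical frequencies into H = -sum p log p. Biased low for small N."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

data = list("aabbbbccdd")  # toy sample
print(plugin_entropy(data))  # ~1.33 nats
```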

I am not so expert in theoretical computer science, so sorry if the question is trivial; I just could not find it in the literature.
Suppose we have a source $X$ with min-entropy $\ell$, the randomness ...
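
For concreteness, min-entropy is $H_\infty(X) = -\log_2 \max_x \Pr[X = x]$, the usual measure of extractable randomness in the extractor literature; a one-line sketch (the pmf is a toy):

```python
import math

def min_entropy(pmf):
    """Min-entropy H_inf(X) = -log2 max_x p(x)."""
    return -math.log2(max(pmf))

print(min_entropy([0.5, 0.25, 0.25]))  # 1.0 bit: dominated by the 0.5 mass
```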

In the introduction to their book "Discriminants, Resultants, and Multidimensional Determinants", the authors state a very intriguing observation concerning the coefficients of monomials appearing in ...

Let $H$ be a function from finite sequences of probabilities (non-negative numbers summing to 1) to the real numbers, such that:
$H$ is continuous,
$H$ is symmetric w.r.t. the order of its arguments,
...

Suppose we are given two random variables $X$ and $Y$ with fixed marginal and joint distributions. What is the maximum randomness that we can extract from $Y$ that is independent of $X$? That is, if ...
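
One easy upper bound: if $f(Y)$ is independent of $X$, then $H(f(Y)) = H(f(Y)\mid X) \le H(Y\mid X)$, so no deterministic extraction can beat the conditional entropy $H(Y\mid X)$; whether it is achievable is the substance of the question. Computing the bound on a toy joint pmf (illustrative numbers):

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# Toy joint pmf (rows: X, columns: Y); illustrative numbers
p_xy = np.array([[0.25, 0.25],
                 [0.40, 0.10]])

# H(Y|X) = H(X,Y) - H(X) bounds the extractable rate (see above)
H_cond = entropy(p_xy.ravel()) - entropy(p_xy.sum(axis=1))
print(H_cond)  # ~0.861 bits
```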

The channel under consideration is $T = A + B$, where $A$ and $B$ take on values in $\{0, 1\}$ according to a probability mass function. Let (joint) random vector $(A_1, A_2,\ldots, A_n)$ be denoted ...

Suppose $(X,d)$ is a metric space of unit diameter and let $F$ be the collection of all $1$-Lipschitz functions mapping $X$ to $[-1,1]$, equipped with the sup-norm $\|\cdot\|_\infty$.
I am interested ...

I am reading the paper "Chain Independence and Common Information" (http://ttic.uchicago.edu/~yury/papers/independ.pdf). In this paper, an inequality is used several times (without proof) which looks ...

I'm sorry for having opened two questions which have been solved by elementary counter-examples provided by @AnthonyQuas. Actually I'm not an expert in information theory and I expected that a positive ...

I came across the following unusual generalization of the Schmidt decomposition in my work, which I describe below. I would like to know if this structure has been studied before so I can read more ...
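
For comparison, the ordinary Schmidt decomposition of a bipartite vector is exactly the singular value decomposition of its coefficient matrix; a sketch (random state and dimensions are illustrative):

```python
import numpy as np

# Schmidt decomposition of a bipartite state |psi> in C^m (x) C^n:
# reshape the amplitude vector into an m x n matrix and take its SVD.
m, n = 3, 4
rng = np.random.default_rng(0)
psi = rng.normal(size=m * n) + 1j * rng.normal(size=m * n)
psi /= np.linalg.norm(psi)

C = psi.reshape(m, n)          # coefficient matrix (row-major)
U, s, Vh = np.linalg.svd(C)    # s holds the Schmidt coefficients

# Reconstruct: psi = sum_k s[k] * kron(U[:, k], Vh[k, :])
recon = sum(s[k] * np.kron(U[:, k], Vh[k, :]) for k in range(len(s)))
print(np.allclose(recon, psi))  # True, up to numerical error
```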