Information theory questions

Some Qs on information -- something rather new to me. I'm not quite sure what to do with these:

1. The problem statement, all variables and given/known data

a) Lack of information is defined as

[tex]S_{info} = - \sum_{i} p_{i}\log_{2}p_{i} [/tex]

([tex]p_{i}[/tex] is the probability of the outcome i). Calculate the entropy associated with the reception of N bits.

b) The human brain has approx. [tex]10^{10}[/tex] neurons, each connected with 1000 neighbours. If one assumes that memory consists of activating or deactivating each bond, what is the maximum amount of information which may be stored in the brain?

3. The attempt at a solution

a) Well, a bit is either on or off (1 or 0), so I suppose that each bit received has a probability of 0.5 attached to it (there are two states, and there is no mathematical reason to think one more probable than the other). So I would say all the [tex]p_{i}[/tex] are 0.5. If there are N bits, then that's N 'outcomes', each with probability 0.5.

b) I guess it's a bit to a bond -- that is, each activation/deactivation corresponds to a bit. But do you think the 'neighbours' it refers to are supposed to be in addition to the [tex]10^{10}[/tex] collection? In which case, is it merely [tex]10^{10} \times 1000[/tex] bits of information? Too easy, I suspect... Hmm.
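A back-of-envelope check of that count (a sketch only, assuming one bit per bond and that the 1000 neighbours are drawn from the same [tex]10^{10}[/tex] collection -- both assumptions, not given in the problem):

```python
# Rough count of storable bits, assuming each activated/deactivated
# bond stores exactly one bit (an assumption, not from the problem).
neurons = 10**10
neighbours_per_neuron = 1000

# Counting every connection once per neuron gives 10^13 bonds;
# if each bond is shared between two neurons, halve it.
bonds_directed = neurons * neighbours_per_neuron
bonds_shared = bonds_directed // 2

print(bonds_directed)  # 10^13 bits
print(bonds_shared)    # 5 * 10^12 bits
```

Whether to halve depends on whether a bond between neurons X and Y is the same bond seen from both ends, which the problem statement leaves open.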

I'm afraid the solution is more complicated than your attempt. Start with part a.

The sum is not over bits, but over every string of 1000 bits that gives the same result.
You chose to start with equal numbers of 1's and 0's. Let's call this case A. You'll find out that case A is the most likely of all cases, and the one that has maximum entropy. [BTW, in the language of statistical mechanics a particular string of 1000 bits is a "microstate", and the case of so many 1's and 0's is a "macrostate." Each macrostate can have a huge number of corresponding microstates.]

To calculate the information entropy of A, call it S_A, you need to figure out the number n of different strings that have 500 1's and 500 0's, out of the total number N of possible strings. This is a problem in combinatorics, properly considering permutations. With these numbers in hand, the macrostate A as a whole has probability n/N, and, lacking information to the contrary, each of the n strings within it is equally likely, with probability p_i = 1/n. The sum over the n (identical) p_i's is now simple, and you have the entropy.
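A quick sketch of these counts in Python (using `math.comb` for the binomial coefficient; the specific numbers printed are illustrations of the counting argument, not part of the assigned answer):

```python
import math

# Macrostate A: 500 ones and 500 zeros out of 1000 bits.
n = math.comb(1000, 500)   # number of microstates in A
N = 2**1000                # total number of 1000-bit strings

# The macrostate A as a whole has probability n/N.
p_A = n / N

print(f"n   = {n:.3e}")     # a huge number of microstates
print(f"p_A = {p_A:.4f}")   # about 0.025
```

Even the most likely macrostate has probability only about 2.5%, because there are 1001 macrostates sharing the [tex]2^{1000}[/tex] strings between them.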

Realize there is a value of entropy for every other case S_B, S_C, etc., where these cases consist of all possible outcomes of 1000 bits (1000 1's; 999 1's;.... one 1; zero 1's). You can reason from the binomial distribution that case A is the most likely, so the entropy of all other cases will be smaller. You could calculate one of them for comparison. (Take an extreme case like all 1's, for instance).
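To see that the balanced macrostate dominates, you can tabulate the microstate count m_k = C(N, k) for k 1's and the entropy log2(m_k) of each macrostate. A small illustration for N = 10 rather than 1000 (the shape of the argument is identical):

```python
import math

N = 10  # small example; the reasoning carries over to N = 1000
for k in range(N + 1):
    m_k = math.comb(N, k)   # microstates with exactly k ones
    S_k = math.log2(m_k)    # entropy of that macrostate
    print(f"k={k:2d}  m={m_k:3d}  S={S_k:.3f}")

# The counts peak at k = N/2, so the balanced macrostate is the
# most likely and has the largest entropy; k = 0 or N gives S = 0.
```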

Unfortunately I have done very little stats. My answer to (a) is evidently wrong. I wonder if someone could exhibit at least part of the correct solution (or perhaps do it for a specific case, say, of 4 bits of information), and then perhaps it will be clearer in my mind, and I'll be able to finish it off (or do it for the general case).

That's pretty irresponsible of your professor! I don't have time to address this now (I'm at work, and later I'll be preparing for tomorrow's Thanksgiving party). I can steer you to this article: http://en.wikipedia.org/wiki/Information_entropy
Following Eq. (5) is a short discussion of a coin toss and a curve showing maximum entropy for a fair coin. (This curve is for a single toss).
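The coin-toss curve mentioned there is the binary entropy function, which is easy to plot or evaluate yourself (a small sketch; `binary_entropy` is a name I've chosen for illustration):

```python
import math

def binary_entropy(p):
    """Shannon entropy of a single biased coin toss, in bits."""
    if p in (0.0, 1.0):
        return 0.0  # outcome certain: no information received
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Maximum entropy occurs for a fair coin:
print(binary_entropy(0.5))  # 1.0
print(binary_entropy(0.9))  # about 0.469
```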

I can get back to you on Friday. Can anyone else help in the meantime?

Enjoy the Thanksgiving party.

I shall re-launch part of the question in a new thread, along with my rather rushed musings on it so far...

For four bits, the number of strings with two 0's and two 1's is [tex]\binom{4}{2} = 6[/tex]. In other words, there are 6 microstates that belong to the macrostate of two 0's and two 1's.

Let's turn now to the entropy,
[tex]S = - \sum_{i=1}^{m} p_{i}\log_{2}p_{i} .[/tex]
For random bits (or coin tosses or dice throws), the probability of each microstate making up a macrostate is identical and equal to 1/m. The entropy simplifies to
[tex]S = \log_{2} m [/tex].
For case A,
[tex]S_{A} = \log_{2} 6 \approx 2.585 [/tex].
This case has the maximal information entropy for N=4 random bits.
Compare to the extreme case B of zero 0's. The number of ways to take N objects from N is m=1, so
[tex]S_{B} = \log_{2} 1 = 0 [/tex].
This case (and the case of all 0's) has minimal entropy.
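Both extremes can be checked by brute force for N = 4 (a small sketch that simply enumerates all 16 strings):

```python
import itertools
import math

strings = list(itertools.product([0, 1], repeat=4))  # all 16 microstates

# Group microstates by macrostate, i.e. by the number of 1's.
m_A = sum(1 for s in strings if sum(s) == 2)  # two 1's, two 0's
m_B = sum(1 for s in strings if sum(s) == 4)  # all 1's

print(m_A, math.log2(m_A))  # 6 microstates, S_A = log2(6) ~ 2.585
print(m_B, math.log2(m_B))  # 1 microstate,  S_B = 0
```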

In words, case A makes the fewest assumptions about the bits (equal probability of 0 or 1), which is justified if we have no information to the contrary. This is the case which conveys the maximal amount of information. If that seems confusing, consider case B, where every message consists of all 1's. All messages are the same, and we knew what they'd be in advance, so we receive no information at all!

Hope this helps, and gives you a start towards understanding Shannon's information entropy.