Simple XOR theory

Hey guys I have always wondered this but no one's been able to answer me. I know how to crack a simple XOR (not a one-time pad, but one where the key length is smaller than the plain text).
According to Applied Cryptography Second Edition you are supposed to...

1. Discover the length of the key by a procedure known as counting coincidences. XOR the cipher text against itself shifted carious numbers of bytes and count those bytes that are equal. If the displacement is a multiple of the keylength, then something over 6 percent of the bytes will be equal. If it is not, then less than 0.4 percent will be equal (assuming a random key encrypting normal ASCII text; other plaintext will have different numbers).This is called the index of coincidence. The smallest displacement that indicates a multiple of the key length is the length of the key.

2. Shift the ciphertext by that length and XOR it with itself. This removes the key and leaves you with plaintext XORed with the plaintext shifted the length of the key. Since english has 1.3 bits of real information per byte, there is plenty of redundancy for determining a unique decryption.

How does this actually work? What's the mathematical theory behind it?

Hey guys I have always wondered this but no one's been able to answer me. I know how to crack a simple XOR (not a one-time pad, but one where the key length is smaller than the plain text).
According to Applied Cryptography Second Edition you are supposed to...

How does this actually work? What's the mathematical theory behind it?

Take two random characters from a text. Compare them. What is the probability that they will be the same character? One divided by the number of characters in the alphabet.

If, for example, your text was WWI-era English telegraph cipher interceptions, you'd have an alphabet of 26 characters (upper-case A-Z with no spaces), so with random text you'd expect a match - a coincidence - 1/26, or 3.85% of the time.

Now, suppose your text is standard English in the same alphabet. What would be the odds? Much higher. The precise value, like the value for random text, depends upon the alphabet, and it also depends on the language. Ordinary English, written as a telegraphic cipher (upper-case A-Z with no spaces), yields a coincidence 6.65% of the time.

These values are referred to as the Indices of Coincidence of random text and of plain text.
http://en.wikipedia.org/wiki/Index_of_coincidence

Now the interesting thing about monoalphabetic ciphers is that they don't change the index of coincidence (IC). Substituting 'X' for 'A' throughout a text doesn't change the likelihood that the two characters will be the same. So a monoalphabetic cipher will have the same IC as plaintext.

But polyalphabetic ciphers are very different. Each of the individual alphabets will have the same IC as plaintext, but because the different alphabets are all muddled together, their combined IC will be much closer to that of random text than of plain text. In other words, the IC of the cipher text will be close to 0.0385.

Now, what happens when you subtract the text from itself, shifted by one or more places. If the amount of the shift is unrelated to the length of the keyword, each letter is being subtracted from a letter in a different alphabet, and the result is still pretty much random, so you should see 0 about 3.85% of the time. But when the shift is a multiple of the length of the keyword, each letter is being subtracted from a letter that is in the same alphabet it is. So you'd expect to see 0 about as often as two random characters from plaintext would match. That is, 6.65% of the time.

Think about it. You have two ciphertext letters, each of which is the sum of a key letter and a plaintext letter:

C1 = P1 + K1
C2 = P2 + K2

When you subtract the two ciphertext letters, you get:

C2-C1 = P2-P1 + K2-K1

But when the two letters are separated by a multiple of the keylength, K1 and K2 are the same, and you get:

C2-C1 = P2-P1 + K1-K1
C2-C1 = P2-P1

A value that is independent of the key.

When you're done with the process, you have determined the length of the key. With that, you can split the message into its separate alphabets, and attack each as an ordinary monoalphabetic substitution. Or, you might recognize that the result of your subtract is the plaintext subtracted from itself, shifted by the length of the key. In other words, it's an autokey cipher, which has known methods of attack.