In a Caesar cipher, each character in the plaintext is shifted by a fixed amount. For instance, if the shift is 3, "a" becomes "d", "b" becomes "e", and so on. In the Vigenère cipher, multiple shifts are used. For instance, if the key is (2, 4, 9), the first character is shifted by 2, the second character is shifted by 4, the third character is shifted by 9. The pattern then repeats with shifts of 2, 4, 9, 2, 4, 9, and so on until the entire plaintext is encrypted. Note that the length of the key is arbitrary.
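To make the repeating-shift idea concrete, here is a minimal Python sketch of Vigenère encryption. It assumes a plain 26-letter lowercase alphabet and a wrap-around shift; the name `vigenere_encrypt` is just for illustration.

```python
import string

# Assumption: a 26-letter lowercase alphabet, purely for illustration.
ALPHABET = string.ascii_lowercase

def vigenere_encrypt(plaintext, key):
    """Shift the i-th character by key[i % len(key)], wrapping around the alphabet."""
    out = []
    for i, ch in enumerate(plaintext):
        shift = key[i % len(key)]
        out.append(ALPHABET[(ALPHABET.index(ch) + shift) % len(ALPHABET)])
    return "".join(out)

print(vigenere_encrypt("abc", (3,)))          # a one-element key is a Caesar cipher: def
print(vigenere_encrypt("aaaaaa", (2, 4, 9)))  # the shifts repeat: cejcej
```

A key of length 1 reduces to the Caesar cipher, which is why breaking Vigenère comes down to finding the key length first.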

The first thing we can do is determine how long the key is. Since the key repeats, it is likely that characters in the ciphertext will also repeat with the same period. We can apply a simple autocorrelation function, counting for each offset how many characters match the character that many positions later.

Note that comparing each character with the character 6 characters later gives 16 matches, far more than the other offsets give. So it's pretty likely that the key is length 6. (There are also spikes at multiples of 6, as you'd expect.)
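For readers following along in another language, the match counting can be sketched in a few lines of Python. This is not the Arc code used for the counts above; `match_counts` is a hypothetical name, and the only assumption is that the ciphertext is an ordinary string.

```python
def match_counts(ciphertext, max_offset):
    """For each offset d, count positions i where ciphertext[i] == ciphertext[i + d]."""
    counts = {}
    for d in range(1, max_offset + 1):
        # zip stops at the shorter sequence, so this pairs position i with i + d.
        counts[d] = sum(1 for a, b in zip(ciphertext, ciphertext[d:]) if a == b)
    return counts

# Toy input with an obvious period of 3.
for offset, count in match_counts("abcabcabc", 4).items():
    print(offset, count)
```

On real ciphertext you would look for the offset whose count stands out from the rest, along with its multiples.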

Now let's make a function to decode an encrypted character given an offset. We need the algorithm's character set mapping m used to shift the characters. The typical Vigenère uses a character set of 26 letters, but the challenge algorithm uses a much larger character set. To decode a character, we look it up in the mapping to get an integer index, shift by the offset, and convert back to a character.

For example, if our characters were shifted by 3, "D" should decode as "A":

(decodechar #\D 3)
#\A
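For comparison, here is a Python sketch of the same lookup, shift, and convert-back idea. The challenge's real character set isn't listed here, so a 26-letter uppercase alphabet stands in for the mapping m, and the wrap-around with a modulus is an assumption too.

```python
import string

# Assumption: a 26-letter uppercase alphabet stands in for the mapping m.
ALPHABET = string.ascii_uppercase

def decode_char(ch, offset, alphabet=ALPHABET):
    """Look up ch in the alphabet, shift back by offset, and convert back to a character."""
    index = alphabet.index(ch)  # raises ValueError for characters outside the alphabet
    return alphabet[(index - offset) % len(alphabet)]

print(decode_char("D", 3))  # A
```

Passing a larger `alphabet` string is all it takes to handle a bigger character set.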

At this point, there are multiple approaches to break the cipher. If you have any known plaintext, you can immediately get the key. Without a known plaintext, you can take an expected plaintext such as " the " and drag it through the ciphertext, generating a presumptive key at each point. If the same pattern pops out in several places, you may have the key.
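The dragging approach can be sketched in Python as follows. The 27-symbol alphabet (lowercase letters plus space) and the name `drag_crib` are assumptions for illustration only; at each position the function records the shifts that would turn the crib into the ciphertext found there.

```python
import string

# Assumption: lowercase letters plus space, purely for illustration.
ALPHABET = string.ascii_lowercase + " "

def drag_crib(ciphertext, crib, alphabet=ALPHABET):
    """Return (position, shifts) pairs: the key fragment implied by the crib at each position."""
    n = len(alphabet)
    results = []
    for start in range(len(ciphertext) - len(crib) + 1):
        shifts = tuple(
            (alphabet.index(c) - alphabet.index(p)) % n
            for c, p in zip(ciphertext[start:], crib)
        )
        results.append((start, shifts))
    return results

# Hypothetical usage, with `ciphertext` being whatever string you are attacking:
# for position, shifts in drag_crib(ciphertext, " the "):
#     print(position, shifts)
```

Wherever the crib really occurs in the plaintext, the shifts are a slice of the repeating key, so the same fragment (possibly rotated) turning up at several positions is the signal to look for.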

Alternatively, if we take every 6th character in the ciphertext, we have 6 Caesar ciphers, and we can break each one separately. Breaking a Caesar cipher is easy. One approach is frequency analysis; assuming there are more e's and t's than q's and z's in the plaintext, you can compute the likelihood of each key. Summing log-likelihoods is a typical way to score each key, but Arc doesn't have a log function, annoyingly enough.
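Python does have a log function, so the log-likelihood scoring can be sketched there. This is an illustration of the general technique, not something used in this post: it assumes a 26-letter lowercase alphabet with no spaces or punctuation, and the frequency table holds rounded, approximate values for English.

```python
import math
import string

ALPHABET = string.ascii_lowercase

# Approximate English letter frequencies in percent, a through z (rounded values).
# Using percentages instead of probabilities only adds a constant per character,
# so the ranking of shifts is unaffected.
FREQ = dict(zip(ALPHABET, [
    8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.8, 4.0, 2.4,
    6.7, 7.5, 1.9, 0.1, 6.0, 6.3, 9.1, 2.8, 1.0, 2.4, 0.15, 2.0, 0.07,
]))

def caesar_decrypt(ciphertext, shift):
    """Shift every character back by the same amount."""
    return "".join(ALPHABET[(ALPHABET.index(ch) - shift) % 26] for ch in ciphertext)

def log_likelihood(text):
    """Sum of log frequencies: higher means the text looks more like English."""
    return sum(math.log(FREQ[ch]) for ch in text)

def best_shift(ciphertext):
    """Try all 26 shifts and keep the one whose decryption scores highest."""
    return max(range(26), key=lambda s: log_likelihood(caesar_decrypt(ciphertext, s)))
```

With short or unusual text the top-scoring shift can be wrong, so it is worth eyeballing the top few candidates rather than trusting the maximum blindly.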

I decided to take a more brute force approach. I made a function that prints out the decoding of one of the Caesar substrings given an offset value.

Looking this over, it's pretty obvious that offset 19 yields normal English letters, punctuation, and spacing, and everything else doesn't. (Of course, since we're only looking at every sixth letter, the result isn't readable.) We can repeat this for the other positions in the string.
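The brute-force listing is easy to reproduce in other languages. Below is a Python sketch under stated assumptions: `caesar_candidates` is a hypothetical name, the alphabet is passed in because the challenge's character set isn't given here, and shifts wrap around with a modulus.

```python
def caesar_candidates(ciphertext, key_length, position, alphabet):
    """Decode every key_length-th character, starting at position, once per possible offset."""
    column = ciphertext[position::key_length]  # one Caesar-enciphered substring
    n = len(alphabet)
    return {
        offset: "".join(alphabet[(alphabet.index(c) - offset) % n] for c in column)
        for offset in range(n)
    }

# Hypothetical usage: print every candidate for the first key position and
# pick out the line that looks like English letters, spaces, and punctuation.
# for offset, text in caesar_candidates(ciphertext, 6, 0, alphabet).items():
#     print(offset, text)
```

Returning a dictionary rather than printing directly makes it easy to filter the candidates automatically later, for instance by rejecting any that contain unlikely characters.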

The same technique finds that the key length is 9, that the offsets are (57 20 20 56 13 16 20 56 4), and that the plaintext and key are:

(decode s2 '(57 20 20 56 13 16 20 56 4))
This is basically just a bunch of jibberish text, though certainly able to be read, so that chneukirchen may test out this encryption method. If he succeeds well done! I sincerely congratulate him. If he does not I ask that at least he not continue to make fun of this cipher which is so easy a "7 year old" could crack it, or "it can be cracked by hand" according to others. Let me repeat once more I know it CAN be cracked, but I bet MOST people (reddit is not "most people") won't do it.
(each x '(57 20 20 56 13 16 20 56 4) (pr (m x)))
4tt3mpt3dnil

This shows how Arc can be used as a tool in simple cryptanalysis, and also shows that homegrown cryptography can be very weak. If you're interested in cryptography, I highly recommend that you read Applied Cryptography. If it's the history of cryptography that interests you, I recommend The Codebreakers.

Basically, statistics helps you out. As long as the sub-texts are long enough to gather statistically significant samples, you can perform letter frequency analysis. If you select every 6th letter from an English text, e should still be the most common letter in the text. With punctuation, space may beat it.

In English text, it's pretty likely that there will be some spaces 3 characters apart, 4 characters apart, 5 characters apart, 6 characters apart, and so on. (For instance, every 5-letter word will be surrounded by spaces 6 characters apart.) Similarly, there are likely to be e's and t's spaced a variety of distances apart.

If the key is 6 characters long, each pair of spaces 6 characters apart will turn into a pair of something 6 characters apart. One pair might turn into a pair of Q's, another might turn into a pair of @'s. Likewise with pairs of any other plaintext character. Thus, you'll end up with as many 6-character-offset pairs as in the plaintext. Pairs at other distances, say, 5, will turn into different characters when the key is applied. The result is that the ciphertext will have a bunch of cases with the same character 6 apart, and not very many for other distances. You'll get some matches at other distances just by chance, but these will be fairly few.
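This argument can be checked with a small Python experiment. The alphabet, key, and sample sentence below are made up for illustration: pairs of equal characters a multiple of the key length apart survive encryption exactly, because both members of the pair get the same shift.

```python
import string

# Assumption: lowercase letters plus space, purely for illustration.
ALPHABET = string.ascii_lowercase + " "

def encrypt(plaintext, key):
    """Vigenère-style encryption over ALPHABET with wrap-around."""
    n = len(ALPHABET)
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + key[i % len(key)]) % n]
        for i, ch in enumerate(plaintext)
    )

def same_char_pairs(text, distance):
    """Count positions where the character equals the one `distance` later."""
    return sum(1 for a, b in zip(text, text[distance:]) if a == b)

plaintext = "the cat and the dog and the bird all sat on the mat by the door"
key = (5, 11, 2, 7, 19, 3)  # made-up key of length 6
ciphertext = encrypt(plaintext, key)

for d in (6, 12):
    # At multiples of the key length the counts are identical by construction.
    print(d, same_char_pairs(plaintext, d), same_char_pairs(ciphertext, d))
```

At distances that are not multiples of 6, the two counts generally differ, since the two characters of a plaintext pair are shifted by different amounts.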

As bitshifter mentions, the statistics are important. English text tends to have a fair number of repeats. If the plaintext were random data, this technique wouldn't work.