TDD Kata 10 – Anagrams

Last week: Print Diamond

My first implementation suffered from the anticipated problem: starting with tests for the whole output for ‘A’, then ‘B’, then ‘C’, I ended up implementing the full algorithm in one big step after the second or third test.

In the following days, I exposed some internal properties per row so that I could test them one after another before tying them together: the letter, the left offset and the gap between the left and right letter. This is what the test looked like in the end in PHP:

What did I learn? Divide and conquer! If you have to implement too much at once to make the next test pass, write a different test and start testing parts instead of the whole result, even if that means making previously internal properties public. If this is a problem, you can still make them private later and throw away the tests.
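
To illustrate testing the parts before the whole, here is a minimal sketch in Python (not the PHP code from the kata). The function names row_letter, left_offset, gap, render_row and diamond are made up for this example, and the output format (no trailing spaces, rows joined by newlines) is an assumption.

```python
import string


def row_letter(row: int) -> str:
    """Letter printed on the given 0-based row of the top half."""
    return string.ascii_uppercase[row]


def left_offset(row: int, size: int) -> int:
    """Spaces before the first letter; size is the index of the widest letter."""
    return size - row


def gap(row: int) -> int:
    """Spaces between the left and right letter; row 0 has only one letter."""
    return 0 if row == 0 else 2 * row - 1


def render_row(row: int, size: int) -> str:
    """Tie the three per-row properties together into one output line."""
    letter = row_letter(row)
    line = " " * left_offset(row, size) + letter
    if row > 0:
        line += " " * gap(row) + letter
    return line


def diamond(widest: str) -> str:
    """Build the top half row by row, then mirror it without the middle row."""
    size = string.ascii_uppercase.index(widest)
    top = [render_row(row, size) for row in range(size + 1)]
    return "\n".join(top + top[-2::-1])
```

Each of the three small functions can get its own tiny test (for example, the gap on the third row is 3) before a single test for the whole diamond is written.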

Tenth Kata: Anagrams

Given a file containing one word per line, print out all the combinations of words that are anagrams; each line in the output contains all the words from the input that are anagrams of each other. For example, your program might include in its output:

If you run this on the word list here (http://codekata.com/data/wordlist.txt), you should find 20683 sets of anagrams (a total of 48162 words), including all-time favorites such as:

crepitus cuprites pictures piecrust
paste pates peats septa spate tapes tepas
punctilio unpolitic
sunders undress

For added programming pleasure, find the longest words that are anagrams, and find the set of anagrams containing the most words (so “parsley players replays sparely” would not win, having only four words in the set).
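
One possible approach, sketched here in Python, is to group words under a key made of their sorted letters, since anagrams share the same key. The function names anagram_sets, longest_anagrams and largest_set are made up for this example. The sketch compares words exactly as given (case-sensitive, duplicates kept), which is an assumption; whether that handling reproduces the counts quoted above for the word list has not been checked here.

```python
from collections import defaultdict


def anagram_sets(words):
    """Group words by their sorted letters; keep groups with 2+ words."""
    groups = defaultdict(list)
    for word in words:
        key = "".join(sorted(word))  # anagrams produce the same key
        groups[key].append(word)
    return [group for group in groups.values() if len(group) > 1]


def longest_anagrams(sets):
    """Set whose words are longest (all words in a set have equal length)."""
    return max(sets, key=lambda group: len(group[0]), default=[])


def largest_set(sets):
    """Set containing the most words."""
    return max(sets, key=len, default=[])
```

With the input sunders, undress, paste, pates, peats and cat, this yields two sets; the longest words are “sunders undress” and the largest set is “paste pates peats”. Starting from such a small in-memory list keeps the first tests independent of any file.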

If working with files in tests turns out to be difficult for whatever reason, feel free to use other forms of input/output. Bonus points if you still manage to optimize for big word lists after making it just work.
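
One way to keep files out of the tests, sketched in Python under the assumption that input is any iterable of lines and output is any object with a write method: the same code then works with io.StringIO in tests and with an open file handle and sys.stdout in real use. The names read_words and print_anagrams are made up for this example.

```python
import io
from collections import defaultdict


def read_words(lines):
    """Yield non-empty words lazily from any iterable of lines."""
    for line in lines:
        word = line.strip()
        if word:
            yield word


def print_anagrams(lines, out):
    """Write one line per anagram set to out, words separated by spaces."""
    groups = defaultdict(list)
    for word in read_words(lines):
        groups["".join(sorted(word))].append(word)
    for group in groups.values():
        if len(group) > 1:
            out.write(" ".join(group) + "\n")


# In a test, in-memory streams stand in for the word list and the console.
source = io.StringIO("sinks\nskins\nrots\nsort\nalone\n")
result = io.StringIO()
print_anagrams(source, result)
```

The input is read line by line rather than loaded at once, but the groups themselves are still held in memory, so this is only a first step towards handling big word lists.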