Sample solutions and discussion
Perl Quiz of The Week #4 (20021106)
Two words are said to be 'anagrams' if the letters of one word can be
rearranged to form the other word. For example, in English, 'ascot'
and 'tacos' are anagrams; so are 'tacos' and 'coats'. A set of words
that are all anagrams of one another is an 'anagram set'. For
example,
        ascot tacos coast coats
is an anagram set.
Letter case doesn't matter, so, for example, 'liberating' and
'Gilbertian' are considered to be anagrams.
Write a program, make_anagrams, which reads a list of words, one per
line, and finds all the anagrams in the word list. It should
output an anagram listing, as follows:
* 'Words' that contain digits or punctuation characters should be
  ignored.
* Anagram sets that contain only one word should be omitted from the output.
* If an anagram set contains two words, say 'halls' and 'shall', the
  output should contain two lines:
        halls shall
        shall halls
* If an anagram set contains more than two words, the entire set
  should be listed under the alphabetically first word; the others
  should cross-reference it. For example:
        headskin nakedish sinkhead
        nakedish (See 'headskin')
        sinkhead (See 'headskin')
* Finally, the output lines should be in alphabetic order.
For example, if the input was
        5th
        ascot
        ate
        carrot
        coast
        coats
        cots
        Dorian
        eat
        halls
        headskin
        inroad
        nakedish
        ordain
        Ronald's
        shall
        sinkhead
        tacos
        tea
then the output should be:
        Dorian inroad ordain
        ascot coast coats tacos
        ate eat tea
        coast (See 'ascot')
        coats (See 'ascot')
        eat (See 'ate')
        halls shall
        headskin nakedish sinkhead
        inroad (See 'Dorian')
        nakedish (See 'headskin')
        ordain (See 'Dorian')
        shall halls
        sinkhead (See 'headskin')
        tacos (See 'ascot')
        tea (See 'ate')
If you need a sample input, you may obtain English word lists from
http://perl.plover.com/qotw/words/
If you prefer to do this quiz in a language other than English, please
substitute whatever conventions are appropriate for that language.
----------------------------------------------------------------
Here's some sample code:
        #!/usr/bin/perl

        # Read the words and group them by their sorted-letter key.
        while (<>) {
            chomp;
            next if /[^A-Za-z]/;        # ignore digits and punctuation
            my $key = join "", sort(split //, lc $_);
            $sets{$key} .= "$_:";
        }

        # Build the output lines for each anagram set.
        for my $wl (values %sets) {
            my @wl = split /:/, $wl;
            if (@wl == 2) {
                push @output, "$wl[0] $wl[1]";
                push @output, "$wl[1] $wl[0]";
            } elsif (@wl > 2) {
                my ($first, @rest) = sort insensitive @wl;
                push @output, "$first @rest";
                for (@rest) {
                    push @output, "$_ (See '$first')";
                }
            }
        }

        # Print all the lines in case-insensitive alphabetical order.
        for my $line (sort insensitive @output) {
            print $line, "\n";
        }

        # Compare two strings without regard to case.
        sub insensitive {lc($a) cmp lc($b)}
The key to solving this problem was figuring out how to decide if two
words were anagrams. Two words are anagrams if they have the same
letters, possibly in a different order. The easy way to find out if
this is the case is to take the letters and put them in some canonical
order, for example by sorting them; if the sorted letter lists are the
same, then the words are anagrams.
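As a small illustration of the canonical-order idea, here is a sketch that
uses the same key computation as the sample code. The names 'anagram_key'
and 'are_anagrams' are made up for this example and do not appear in the
sample code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Canonical form of a word: its letters, lowercased and sorted.
# (anagram_key and are_anagrams are illustrative names only.)
sub anagram_key {
    my ($word) = @_;
    return join "", sort(split //, lc $word);
}

# Two words are anagrams exactly when their canonical forms match.
sub are_anagrams {
    my ($x, $y) = @_;
    return anagram_key($x) eq anagram_key($y);
}

for my $pair (['ascot', 'tacos'], ['liberating', 'Gilbertian'], ['carrot', 'cots']) {
    my ($x, $y) = @$pair;
    printf "%s / %s: %s\n", $x, $y,
        are_anagrams($x, $y) ? "anagrams" : "not anagrams";
}
```

The first two pairs are reported as anagrams and the third is not.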
Nearly everyone who solved this problem did so by splitting the
incoming words into letters, sorting the letters, and joining them
back together to form a hash key. For example, the hash key for the
word 'Ethiopian' would be 'aehiinopt'. Then you just list each word
under its own hash key, and you have a hash where the hash values are
lists of anagrammatic words.
The first 'while' loop reads the input, computes the key for each
word, and installs it into the %sets hash under the appropriate key.
At the end of the loop, the %sets values are lists of anagrams,
separated by ':' characters. It would have been preferable to use a
hash of arrays, but _Learning Perl_ doesn't cover that. A typical
value: $sets{'einrs'} is "resin:rinse:risen:siren:".
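For comparison, here is a rough sketch of what the hash-of-arrays variant
might look like. It is not part of the sample solution; the function name
'group_anagrams' and the word list are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group words by canonical key, storing each group as an array
# reference instead of a ':'-separated string.
# (group_anagrams is an illustrative name only.)
sub group_anagrams {
    my @words = @_;
    my %sets;
    for my $word (@words) {
        next if $word =~ /[^A-Za-z]/;       # skip digits and punctuation
        my $key = join "", sort(split //, lc $word);
        push @{ $sets{$key} }, $word;       # the array springs into existence
    }
    return \%sets;
}

my $sets = group_anagrams(qw(resin rinse 5th risen siren carrot));
for my $key (sort keys %$sets) {
    print "$key => @{ $sets->{$key} }\n";
}
```

With this layout there is no need to split the value apart again later;
@{ $sets->{'einrs'} } is already the list of words.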
The second loop prepares the output. It scans over all the anagram
lists ('$wl' stands for 'word list') and decides what the output will
look like for each list. It appends lines of output to @output; later
these lines will be sorted and printed.
If there are exactly two words in $wl, then two output lines are
appended. If there are more than two, then they're sorted into order,
the entire word list is listed under the first word, and the rest have
cross-references.
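The per-set decision can also be pulled out into a function, which makes it
easy to try on one set at a time. This is only a sketch; 'lines_for_set' is a
hypothetical name. Unlike the sample code, it sorts two-word sets as well,
but since both orderings are emitted the resulting pair of lines is the same.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the output lines for one anagram set.
# (lines_for_set is an illustrative name only.)
sub lines_for_set {
    my @words = sort { lc($a) cmp lc($b) } @_;
    return () if @words < 2;                # singletons are omitted
    if (@words == 2) {
        return ("$words[0] $words[1]", "$words[1] $words[0]");
    }
    my ($first, @rest) = @words;
    # Full listing under the first word, cross-references for the rest.
    return ("$first @rest", map { "$_ (See '$first')" } @rest);
}

print "$_\n" for lines_for_set(qw(sinkhead headskin nakedish));
print "$_\n" for lines_for_set(qw(shall halls));
```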
Finally, the output lines are printed in alphabetical order. The
utility function 'insensitive', which compares two strings without
regard to case, is used in two places to produce alphabetically sorted
lists.
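To see what 'insensitive' changes, compare it with Perl's default sort,
which compares strings character by character and so puts capitalized words
ahead of lowercase ones in ASCII. The word list here is just an
illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same comparison function as in the sample code.
sub insensitive { lc($a) cmp lc($b) }

my @words  = qw(inroad Dorian ascot ordain);
my @plain  = sort @words;               # default string comparison
my @folded = sort insensitive @words;   # case ignored

print "plain:       @plain\n";          # Dorian ascot inroad ordain
print "insensitive: @folded\n";         # ascot Dorian inroad ordain
```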
There was some discussion about what to do when the input contained
two words with the same spelling but different case, as with 'Polish'
(pertaining to Poland) and 'polish' (to make something shiny). The
version above includes such words, which means that there are some
silly outputs that inform you that 'polish' is an anagram of
'Polish'. If you don't like this, add
        next if $seen{lc $_}++;
in the top loop; this discards any word that has been seen before with
any capitalization. (This is another application of the 'canonical
form' idea; this time the canonical form of a string is the
all-lowercase version.)
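The same %seen trick can be tried in isolation. In this sketch the function
name 'drop_case_duplicates' and the word list are invented for the example;
only the first spelling of each word survives.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keep only the first occurrence of each word, ignoring case.
# (drop_case_duplicates is an illustrative name only.)
sub drop_case_duplicates {
    my %seen;
    return grep { !$seen{lc $_}++ } @_;
}

my @kept = drop_case_duplicates(qw(Polish polish shall halls POLISH));
print "@kept\n";    # Polish shall halls
```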
The 'expert' quiz postmortem is delayed because the results are so
very interesting. There were many solutions posted to the -discuss
list and much discussion, and I don't want to leave out anything
interesting. Expect it tomorrow, along with a pair of new quizzes.
Thanks to everyone who participated, whether or not they sent mail
about it.