Learning Python via a little word frequency program

I'm learning Python by reading David Beazley's "Python Essential Reference"
book and writing a few toy programs. To get a feel for hashes and sorting,
I set myself this little problem today (not homework, BTW):

Given a string containing a space-separated list of names:

names = "freddy fred bill jock kevin andrew kevin kevin jock"

produce a frequency table of names, sorted descending by frequency,
then ascending by name. For the above data, the output should be:

I'm interested to learn how more experienced Python folks would solve
this little problem. Though I've read about the DSU Python sorting idiom,
I'm not sure I've strictly applied it above ... and the -x hack above to
achieve a descending sort feels a bit odd to me, though I couldn't think
of a better way to do it.

reversed(sorted(pairs)) avoids the little -v hack and makes it more
obvious what you are doing. Of course this could also be achieved by
doing pairs.sort() and pairs.reverse() before iterating over the pairs
list.
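A minimal sketch of that approach, written for Python 3 and assuming pairs is a list of (count, name) tuples built from a freq dict (both names are assumptions here, since the original code is not quoted in this thread). One thing to watch: reversing the whole sorted list also reverses the tie-break, so names with equal counts come out in descending rather than ascending order.

```python
names = "freddy fred bill jock kevin andrew kevin kevin jock"

# Count occurrences of each name.
freq = {}
for name in names.split():
    freq[name] = 1 + freq.get(name, 0)

# Assumed layout: (count, name) tuples, so tuples compare by count first.
pairs = [(count, name) for name, count in freq.items()]

# Sort ascending, then walk the result backwards: highest count first.
table = list(reversed(sorted(pairs)))

for count, name in table:
    print("%-10s: %d" % (name, count))
```

If the "ascending by name" part of the spec matters, the negated-count approach keeps the tie-break in the right direction.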

Now I might choose a very different solution for a more serious
application, depending on detailed specs and intended use of the
"frequency table".

> Though I've read about the DSU Python sorting idiom,
> I'm not sure I've strictly applied it above ...

Perhaps not "strictly", since you don't really "undecorate", but that's
another application of the same principle: given an appropriate
data structure, sort() (or sorted()) will do the right thing.
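For reference, a rough sketch of the full decorate-sort-undecorate pattern applied to this problem, written for Python 3. The variable names (decorated, table) are illustrative only and are not taken from the original post.

```python
names = "freddy fred bill jock kevin andrew kevin kevin jock"

freq = {}
for name in names.split():
    freq[name] = freq.get(name, 0) + 1

# Decorate: build (-count, name) so a plain ascending sort gives
# descending counts, with ties broken by ascending name.
decorated = [(-count, name) for name, count in freq.items()]

# Sort: tuples compare element by element, no custom comparison needed.
decorated.sort()

# Undecorate: restore the real count and put the name first.
table = [(name, -neg_count) for neg_count, name in decorated]

for name, count in table:
    print("%-10s: %d" % (name, count))
```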

> and the -x hack above to
> achieve a descending sort feels a bit odd to me, though I couldn't think
> of a better way to do it.

The "other" way would be to pass a custom comparison callback to sort,
which would be both slower and more complicated. Your solution is IMHO
the right thing to do here.
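To illustrate the comparison-callback alternative: in Python 3, list.sort() and sorted() no longer accept a comparison function directly, so this sketch wraps one with functools.cmp_to_key. The function name by_count_then_name and the hard-coded freq dict are made up for the example.

```python
from functools import cmp_to_key

freq = {"freddy": 1, "fred": 1, "bill": 1, "jock": 2, "kevin": 3, "andrew": 1}

def by_count_then_name(a, b):
    """Compare two (name, count) pairs: higher count first, then name A-Z."""
    name_a, count_a = a
    name_b, count_b = b
    if count_a != count_b:
        # A negative result means a sorts before b, so larger counts come first.
        return count_b - count_a
    # Equal counts: fall back to ascending name order.
    return (name_a > name_b) - (name_a < name_b)

by_callback = sorted(freq.items(), key=cmp_to_key(by_count_then_name))

# The negation trick expressed as a key function gives the same order.
by_key = sorted(freq.items(), key=lambda item: (-item[1], item[0]))

for name, count in by_key:
    print("%-10s: %d" % (name, count))
```

The callback runs once per comparison, while a key is computed once per item, which is the usual reason the key (or decorated) form tends to be faster.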

> I also have a few specific questions. Instead of:
>
> for name in names.split():
>     freq[name] = 1 + freq.get(name, 0)
>
> I might try:
>
> for name in names.split():
>     try:
>         freq[name] += 1
>     except KeyError:
>         freq[name] = 1

Or a couple of other solutions, including a defaultdict (Python >= 2.5).

> Which is preferred?

It's a FAQ - or it should be one. Broadly: the second one (try/except)
tends to be faster when no exception is raised (i.e. the key already
exists), but slower when exceptions are raised. So it mostly depends on
what you expect your dataset to look like.
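For comparison, here are the three counting approaches mentioned in this thread side by side, as a sketch (the variable names are made up). All three produce the same counts; which one is fastest would need measuring on data shaped like yours, for example with the timeit module.

```python
from collections import defaultdict

names = "freddy fred bill jock kevin andrew kevin kevin jock"

# 1. dict.get() with a default: same code path whether or not the key exists.
freq_get = {}
for name in names.split():
    freq_get[name] = 1 + freq_get.get(name, 0)

# 2. try/except: cheap when the key exists, costlier on a first occurrence.
freq_try = {}
for name in names.split():
    try:
        freq_try[name] += 1
    except KeyError:
        freq_try[name] = 1

# 3. defaultdict(int): missing keys start at int() == 0.
freq_default = defaultdict(int)
for name in names.split():
    freq_default[name] += 1
```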
