Chris Leary

One of my friends at work was fakeplaining [*] that he had been on the Python
programming mailing list at work for some time, yet still did not know Python.
Being hopelessly suggestible in the face of obvious sarcasm, I decided to
sacrifice a few hours of sleep to the god of blog. [†]

Note that this entry is aimed at people who already know how to program and
have been looking for a tidbit to try in Python.

There are a lot of side notes I've left out for simplicity of explanation;
however, I also attempted to make the experience interesting by introducing one
of Python's more advanced features, called "generator functions," into the mix.
Hopefully it strikes a balance. Please comment if you are utterly confused by
generators — I may add an alternate section that allows the reader to avoid
them altogether.

You kata wanna...

A number of leaders in the programming community are hot on this trend called
"code katas." I'm actually a big fan of this trend, mainly because I've
been writing code for no reason, hating on it, throwing it away, and
subsequently rewriting it for my entire life, but I now get to call it
something cool- and ninja-sounding. Doing such things in my spare time is no
longer considered "inexcusable nerdiness;" rather, it's my small endeavor to
bring professionalism to the field of software engineering. *cough*

One reason that I really enjoy this new trend is that programmers are posting
their own morsel-sized programming problems left and right, giving ample
opportunities to explore new languages (and dusty corners of ones you know
well) without accidentally becoming BDFL of a seminal framework or utility.
[‡]

RC4 Pseudocode

Case in point, I'll use the recent kata from Programming Praxis for this
Python exercise, as they provide straightforward pseudocode. Their target is
the RC4 encryption algorithm, which has two halves: initialize a 256-entry
state array from the key, then use that state to generate a pseudo-random
stream of bytes to XOR against the text.
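The key-scheduling half translates to Python like so (a sketch,
reconstructed to match the line-by-line notes below; save it in a file
called rc4.py):

def initialize(key):
    """Build the initial RC4 state: a key-scrambled permutation of [0, 256)."""
    s = range(256)
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s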

The simplicity of the translation demonstrates why Python is sometimes called
"executable pseudocode". Breaking it down line by line:

The def statement defines a function named initialize that takes a single
argument, key.

A documentation string ("docstring" for short). In Python, documentation is
associated with a function even at runtime, in contrast to traditional
JavaDoc or POD. [§] If the first statement in a function is a string
literal, it is used as the docstring for that function. [¶]

The built-in range function returns a list of values. [#] "Built-in"
is the terminology used for items that are "available all the time without
explicitly importing anything."

This function also has a two-argument form, range(start, stop);
however, in the single argument form, start has a default of 0, so the
function invocation returns a list of all the integers in the mathematical
interval [0, 256), for a total of 256 values.

There is only one for loop syntax: for [identifier] in [iterable].
Lists are iterable because they contain a sequence of objects.

Finite collections also support the built-in function len([sizable]).
Numerical arithmetic and sequence indexing via seq[idx] should work the way
you'd expect.

Python has an elegant swap capability — what's important to note is
that the entire right hand side is evaluated, then assigned to the left
hand side.

Python functions optionally return a value. If no return statement is
encountered, None is returned, which indicates the absence of a value
(docs).

Generators: functions that pause

Python has a convenient feature, called "generator functions," that allows you
to create a stream of values using function-definition syntax. [♠] You can
think of generator functions as special functions that can pause and resume,
returning a value each time they pause.

The canonical example illustrates the concept well — use the interactive
Python shell to explore how generator functions work, by running the python
command without arguments. Make sure the version is Python 2.3 or above.
Once you're in the interactive shell, type the following:

>>> def gen_counter():
...     i = 0
...     while True:
...         yield i
...         i += 1
...
>>>

Note the use of a yield statement, which tells Python that it is a
generator function. Calling a generator function creates an iterable generator
object, which can then produce a potentially infinite series of values:
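>>> counter = gen_counter()   # calling it builds a generator object
>>> counter.next()            # run until the first yield
0
>>> counter.next()            # resume where we paused; i has been incremented
1
>>> counter.next()
2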

Applying generators to RC4

This dovetails nicely with the second part of the algorithm, which requires a
stream of values to XOR against. The generator is nearly a direct translation
from the pseudocode, which you may also add to rc4.py:
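def stream(s):
    """Yield an endless sequence of RC4 pseudo-random bytes from state s."""
    # (A sketch: the name is mine, and s is the state list that
    # initialize returns.)
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]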

Each time .next() is called on the generator instance, the function
executes until the next yield statement is encountered, produces that
value, and saves the function's state for later.

Yes, we could create a big list of pseudo-random values the length of the
text, but creating them all at once adds O(len(text)) memory overhead,
whereas the generator has constant memory overhead (and is just as
computationally efficient).

Tying it together

Now we just need a function that does the XORing, which teaches us about
strings and characters.
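A sketch that matches the notes below (run_rc4 is the name the CLI example
uses later; s is the state list that initialize returns):

def run_rc4(s, text):
    """Encrypt (or decrypt) text by XORing it against the RC4 byte stream."""
    key_stream = stream(s)
    cipher_chars = []
    for char in text:
        cipher_byte = ord(char) ^ key_stream.next()
        cipher_chars.append(chr(cipher_byte))
    return ''.join(cipher_chars)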

The generator object is instantiated by calling the generator function.

As you can see from the for loop, Python strings are iterable as
sequences of characters. Characters in Python are just strings of length
one, so you can think of a string iterator as stepping over all of its
one-character substrings in order.

To convert a textual character into its character-code numerical value, the
built-in ord function is used (docs).

The meat of the algorithm: XOR the character's numerical value with the
next pseudo-random byte from the byte stream.

After obtaining the cipher-byte through the XOR, we want to convert back to
a textual (character) representation, which we do via the built-in chr
function (docs). We then place that character into a sequence of cipher
characters. [♥]

To join together characters to form a string, we use the
str.join([iterable]) method (docs). [♦] Note that, on some
platforms, this is much more efficient than using += (for string
concatenation) over and over again. It's a best practice to use this
sequence-joining idiom to avoid possible concatenation overhead. [♣]

Front-end fun

If you thought that the pseudo-code translation looked like a piece of cake,
you may feel up to a challenge: write a command line interface that:

Asks for an encryption key.

Turns the key into a sequence of integer values and initializes the RC4 state with it.

Continually asks for user-provided text to translate and spits out the
corresponding cipher text.

What you need to know

In Python 2.x print is a statement that is followed by comma-separated
values, where each comma turns into a space. The print statement puts a
newline at the end of its output by default:

>>> print 'a', 'b', 'c', 'd'
a b c d

To suppress the newline (for example, in a loop), leave a trailing
comma:

>>> for char in ['a', 'b', 'c', 'd']:
...     print char,
...
a b c d

If there's something that you can't do using the above, I'll refer you to
the Python tutorial's section on fancier output formatting.

The built-in function called raw_input (docs) displays a prompt message
and then reads a line of user input, returning it as a string. For
example:

name = raw_input('What is your name? ')
print 'Your name is', name

The built-in function called repr (docs) returns the Python
representation of a string (or any object) — this is useful for escaping
strings with funky, non-printable characters in them, which our cipher
algorithm is likely to produce. For example, you'll probably want to use
something along the lines of:

cipher_text = run_rc4(k, text)
print 'Cipher text:', repr(cipher_text)

The character-escaping performed by repr can be reversed using the
string decode method: str.decode('string_escape'). For example, in the
interactive shell (with arbitrary escaped text):
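>>> escaped = '\\x41\\x42\\x43'   # literal backslashes, as a user would type them
>>> escaped.decode('string_escape')
'ABC'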

So if you want to allow the user to enter ciphertext at the command prompt,
you can read it in and decode it from the escaped format.

If you just put the CLI front-end code to execute at the bottom of the
rc4.py file, it will work; however, it's a best practice to check that
rc4.py is the file being executed directly (as opposed to being imported
as a module). To do this, wrap the command line interface code in a test
like the following:
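if __name__ == '__main__':   # true only when rc4.py is run directly
    main()   # 'main' is a stand-in for whatever your CLI entry point is called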

This allows you to reflect on things and extract their documentation, which
comes in handy when you're running in an interactive Python session or
spitting out module-level documentation in a command line argument parser.

There's a way to use generators here as well, but the list of characters
makes things simpler to understand for the moment. If you're feeling
confident, convert this function to be a generator function at the end of
the exercise and make it work with the rest of the program.

I've been doing a bunch of work in JavaScript lately using Dojo's
asynchronous I/O object implementation, dojo.Deferred, based off Twisted
Python's deferred model. Deferreds represent a piece of asynchronous I/O (e.g.
data retrieval from a web service) and a chain of callbacks that
should be performed when it completes.

Deferreds are an excellent way to keep your head from exploding when dealing
with callback-oriented APIs, especially when you need to coordinate the point
at which several asynchronous operations occur.
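The example code in this post was Dojo JavaScript; here's a rough sketch of
its shape, rendered with Twisted's Deferred (the model Dojo's is based on),
with async_find standing in for a hypothetical remote query that returns a
deferred:

from twisted.internet.defer import Deferred

class RemoteAnimal(object):
    def __init__(self, animal_id):
        self.animal_id = animal_id
        self._children = None   # cache for the synchronous getter

    def get_children(self):
        # Synchronous getter: returns the cached children (or None) immediately.
        return self._children

    def fetch_children(self):
        # Asynchronous getter: returns a Deferred that fires with the children.
        d = async_find(parent=self.animal_id)   # hypothetical remote query
        def cache(children):
            self._children = children
            return children
        d.addCallback(cache)
        return d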

The above uses an obvious naming convention to differentiate between
synchronous and asynchronous getters — it's important to know when a method
will be returning a deferred as opposed to the desired value itself!

There is, however, a sneaky issue with RemoteAnimal.prototype.fetchChildren
-- can you tell what it is? You have to think pretty hard about what kinds of
execution asynchronous I/O allows from the perspective of a caller.

Caveat

The key is to realize that fetchChildren could be called a whole lot of
times in a row on a really slow connection (or if the caller is calling it in a
for loop, which is more likely to be a problem). This will create n
outstanding asyncFinds, triggering n asynchronous XMLHttpRequests,
where we really only needed one. To fix this, you need to patch your code so
that only one request can be outstanding at any given time, like so:
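# (Continuing the Python sketch from above: __init__ grows two fields,
#  self._pending = None and self._waiters = [].)
def fetch_children(self):
    if self._pending is not None:
        # A fetch is already in flight: queue a runner-up deferred
        # instead of firing another request.
        runner_up = Deferred()
        self._waiters.append(runner_up)
        return runner_up
    self._pending = async_find(parent=self.animal_id)   # hypothetical query
    def arrived(children):
        self._children = children
        self._pending = None
        # Call the runner-ups back directly with the found children.
        waiters, self._waiters = self._waiters, []
        for waiter in waiters:
            waiter.callback(children)
        return children
    self._pending.addCallback(arrived)
    return self._pending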

The annoying part is that you can't just tack the runner-up deferred triggers
onto the original as callbacks. (That would make the construction so much
easier!) Between the first and any later invocation of
fetchChildren, there may be callbacks added to the original
deferred that would arbitrarily change the outcome. As a result, we have to
call the other deferreds back directly with the found children from the
original fetch.

It's possible to factor out this runner-up-aggregating behavior in a highly
reusable way — this code was meant to demonstrate the caveat and how to fix
it. Encapsulating the runner-up triggering behavior is left as an exercise to
the reader. :-)

Non-Obvious Operations with Deferreds

For educational purposes I've enumerated a few non-obvious operations that you
can perform using deferreds. [†]
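Another Twisted-flavored sketch, mirroring the getSubtree example discussed
below (gatherResults returns a deferred that fires once every deferred in
the list has fired):

from twisted.internet.defer import gatherResults

def get_subtree(animal):
    """Fetch animal's children, then recursively fetch each child's subtree."""
    mega_deferred = animal.fetch_children()
    def recurse(children):
        # Returning a deferred from a callback chains it in (see below).
        return gatherResults([get_subtree(child) for child in children])
    mega_deferred.addCallback(recurse)
    return mega_deferred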

The key to understanding this example is that returning a deferred from a
callback chains that deferred in. In other words, if you return a deferred
from a callback, all that deferred's callbacks will be executed before you
come back to the original. That way, we can perform recursion without worrying
whether or not somebody has added more callbacks onto the initial
megaDeferred returned from the getSubtree function — the chained
callbacks execute right away, before any callbacks added by the caller.
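To see the chaining rule in isolation (a toy example, Twisted again):

from twisted.internet.defer import Deferred

outer = Deferred()
inner = Deferred()

def first(result):
    print 'first saw:', result
    return inner    # chaining: outer's remaining callbacks now wait on inner

def second(result):
    print 'second saw:', result

outer.addCallback(first)
outer.addCallback(second)
outer.callback('go')          # prints "first saw: go", then pauses
inner.callback('inner done')  # now prints "second saw: inner done"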

Footnotes

Note that dojo.DeferredList.gatherResults is actually mapped as
dojo.DeferredList.prototype.gatherResults in my version of Dojo (1.3.1)
-- not sure what's up with that. It's certainly a factory function, so I've
mapped it onto the dojo.DeferredList constructor as well.

I highly appreciate the presents that the Python 3.1 team (unwittingly) got me
for my birthday this year. This morning I wrote the following snippet to
determine the day-frequency of birthday occurrences: [*]
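# A sketch of such a snippet: the birthday itself is made up, and the fun
# part is collections.Counter, which is new in Python 3.1.
from collections import Counter
from datetime import date

def day_frequency(month, day, years):
    """Count which days of the week a birthday has landed on."""
    return Counter(date(year, month, day).strftime('%A') for year in years)

print(day_frequency(6, 20, range(1984, 2010)))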

What is magic?

Characteristic of something that works although no one really understands
why (this is especially called black magic).

Taken in the context of programming, magic refers to code that works without a
straightforward way of determining why it works.

Today's more flexible languages provide the programmer with a significant
amount of power at runtime, making the barrier to "accidental magic" much
lower. Programmers who work with dynamic languages have an important
responsibility to keep in mind: err on the side of caution with the Principle
of Least Surprise.

[T]o design usable interfaces, it's best when possible not to design an
entire new interface model. Novelty is a barrier to entry; it puts a
learning burden on the user, so minimize it.

This principle indicates that using well known design patterns and language
idioms is a "best practice" in library design. When you follow that
guideline, people will already have an understanding of the interface that
you're providing; therefore, they will have one less thing to worry about in
leveraging your library to write their code.

Discovery Mechanism Proposals

Patrick is solving a common category of problem: he wants to allow clients to
flexibly extend his parsing library's capabilities. For example, if his
module knows how to parse xml and yaml files out of the box,
programmers using his library should be able to add their own rst and
html parser capabilities with ease.

Patrick's proposal is this:

Have the programmer place all extension modules that might contain
parser classes in a known directory.

In a factory class constructor, take a directory listing of the known
directory.

Import every module present in that listing.

Inspect each module imported this way for class members.

For each class found, add it to an accumulator if it inherits from a
Parser abstract base class provided by the module.

If you were to do this, you would use the various utilities in the imp
module to load the modules dynamically, then determine the appropriate
classes via the inspect module. [‡]
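A sketch of that machinery (the plugin directory argument and the Parser
base class are assumptions):

import imp
import inspect
import os

def discover_parsers(plugin_dir):
    """Load each module in plugin_dir and accumulate its Parser subclasses."""
    found = []
    for filename in os.listdir(plugin_dir):
        if not filename.endswith('.py'):
            continue
        path = os.path.join(plugin_dir, filename)
        module = imp.load_source(filename[:-3], path)
        for name, cls in inspect.getmembers(module, inspect.isclass):
            # Parser: the abstract base class the library provides.
            if issubclass(cls, Parser) and cls is not Parser:
                found.append(cls)
    return found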

My counter-proposal is this, which is also known as the Registry
Pattern, a form of runtime configuration and behavior extension:

Have the programmer import a decorator from our module.

Let them decorate any class [§] that conforms to the implicit Parser
interface.

Drag and drop your Python code into my directory — I'll take care of it
from there.

That's right, that's all there is to it.
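In sketch form (the names are mine; the @-on-a-class syntax needs Python
2.6+, though calling register_parser(RstParser) directly works on older
versions). Inside parser_lib:

PARSER_REGISTRY = []

def register_parser(cls):
    """Class decorator: record the parser class and hand it back unchanged."""
    PARSER_REGISTRY.append(cls)
    return cls

And the client's whole job:

from parser_lib import register_parser

@register_parser
class RstParser(object):
    def parse(self, text):
        return text   # a stand-in; a real parser would do real work here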

Oh, I know what you're thinking — yes, I'm available — check out
parser_lib.PHONE_NUMBER and give me a call sometime.

But, as you envision phone calls from sexy Pythonistas, the left hemisphere of
your brain is screaming at the top of its lungs! [#]

Magic leaves the audience wondering how the trick is done, and the analytical
side of the programmer mind hates that. It implies that there's a non-trivial
abstraction somewhere that does reasonably complex things, but it's unclear
where it can be found or how to leverage it differently.

Coders need control and understanding of their code and, by extension, as much
control and understanding over third party code as is reasonably possible.
Because of this, concise, loosely coupled, and extensible abstractions are
always preferable to imposing elaborate usage designs on the
clients of your code. It's best to assume that people will want to leverage the
functionality your code provides, but that you can't foresee the use cases.

To Reiterate: Dynamic does not Imply Magical

Revisiting my opening point: anecdotal evidence suggests that some members of
the static typing camp see us programming-dynamism dynamos as anarchic
lovers of programming chaos. Shoot-from-the-hip cowboys, strolling into
lawless towns of code, type checking blowing past the vacant sheriff's station
like tumbleweeds in the wind. (Enough imagery for you?) With this outlook, it's
easy to see why you would start doing all sorts of fancy things when you cross
into dynamism town — little do you know, we don't take kindly to that 'round
these parts.

In other, more intelligible words, this is a serious misconception — dynamism
isn't a free pass to disregard the Principle of Least Surprise — dynamism
proponents still want order in the programming universe. Perhaps we value our
sanity even more! The key insight is that programming dynamism does allow
you additional flexibility when it's required or practical to use. More
rigid execution models require you to use workarounds, laboriously at times,
for a similar degree of flexibility.

Caveat

It's possible that Patrick was developing a closed-system application (e.g.
the Eclipse IDE) and not a library, as I was assuming.

In the application case, extensions are typically discovered (though not
necessarily activated) by enumerating a directory. When the user activates such
an extension, the modules found within it are loaded into the application.
This is the commonly found plugin model — it's typically more difficult to
wrap the application interface and do configurations at load time, so the
application developer must provide an extension hook.

However, the registration pattern should still be preferred to reflection in
this case! When the extension is activated and the extension modules load, the
registration decorator will be executed along with all the other top-level code
in the extension modules.

The extension has the capability to inform the application of the extension's
functionality instead of having the application query the plugin for its
capabilities. This is a form of loosely coupled cooperative configuration
that eases the burden on the application and eliminates the requirement to
foresee needs of the extensions. [♠]

As of the date of this publishing, Patrick's implementation seems to have
gone a bit astray with text processing of Python source files. Prefer
dynamic module loading and inspection to text processing source code!
Enumerating the reasons this is preferred is beyond the scope of this
article.

Of course, the plugin model always has security implications. Unless you go
out of your way to make a sandboxed Python environment for plugins, you
need to trust the plugins that you activate — they have the ability to
execute arbitrary code.

I queue up a few thousand things to do before I get on an airplane: synchronize
two-thousand Google Reader entries, load up a bunch of websites I've been
meaning to read, and make sure for-fun projects are pulled from their most
updated branches.

Polymorphism Recap

The word "polymorphic" comes from Greek roots meaning "many shaped." (Or they
lied to me in school — one of those.) From a worldly perspective I can see
this meaning two things:

A single object can take on many shapes, or

Requirements for a general "shape" can be satisfied by different categories
of objects.

As it turns out, both of these concepts apply to Object-Oriented
programming, but the canonical meaning is the latter. [*] As Yegge
says:

If you have a bunch of similar objects [...], and they're all supposed to
respond differently to some situation, then you add a virtual method to them
and implement it differently for each object.

(If you don't know what a virtual method is, the Wikipedia page has an
alternate explanation.)

Yegge's Example

Yegge demonstrates that strictly adhering to the principles of polymorphism
does not always produce the best design:

Let's say you've got a big installed base of monsters. [...] Now let's say one
of your users wants to come in and write a little OpinionatedElf monster. [...]
Let's say the OpinionatedElf's sole purpose in life is to proclaim whether it
likes other monsters or not. It sits on your shoulder, and whenever you run
into, say, an Orc, it screams bloodthirstily: "I hate Orcs!!! Aaaaaargh!!!"
(This, incidentally, is how I feel about C++.)

The polymorphic approach to this problem is simple: go through every one of
your 150 monsters and add a doesMrOpinionatedElfHateYou() method.

This is a great counterexample — it induces an instant recognition of absurdity.

He then touches on the fact that dynamic languages allow you to do neat things
consistent with polymorphism due to the flexibility of the object structure
(which is typically just a hash map from identifiers to arbitrary object
values):

I guess if you could somehow enumerate all the classes in the system, and
check if they derive from Monster, then you could do this whole thing in a few
lines of code. In Ruby, I bet you can... but only for the already-loaded
classes. It doesn't work for classes still sitting on disk! You could solve
that, but then there's the network...

This is clearly impractical, but I figured there was some exploratory value to
implementing this challenge in Python. This entry is a small walk-through for
code to detect interface conformity by inspection, enumerate the classes in the
environment, manipulate classes in place, and add an import hook to manipulate
classes loaded from future modules.

Determining which Classes are Monsters

First of all, Python doesn't require (nor does it encourage) a rigid type
hierarchy. Python's all about the interfaces, which are often implicit.
Step one is to create a way to recognize classes that implement the
monster interface:
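import inspect

# The interface is implicit; assume it boils down to a couple of
# well-known method names.
MONSTER_METHODS = ('attack', 'take_damage')

def is_monster(cls):
    """Duck-type check: does cls appear to implement the monster interface?"""
    return inspect.isclass(cls) and all(
        callable(getattr(cls, name, None)) for name in MONSTER_METHODS)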

Enumerating the Classes in the Environment

All of the modules that have been loaded into the Python environment are placed
into sys.modules. By inspecting each of these modules, we can
manipulate the classes contained inside if they conform to our monster
interface.

for name, module in sys.modules.iteritems():
    extend_monsters(module)

The extend_monsters function is a bit nuanced because immutable modules
also live in sys.modules. We skip those, along with abstract base classes,
which have trouble with inspect.getmembers():
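def extend_monsters(module):
    """Graft the elf's verdict onto every monster class in module's top level."""
    if module is None:
        return   # sys.modules contains None placeholders
    try:
        classes = inspect.getmembers(module, inspect.isclass)
    except Exception:
        return   # skip modules that make getmembers unhappy
    for name, cls in classes:
        if not is_monster(cls):
            continue
        try:
            # Method name borrowed from Yegge's example; the elf hates everyone.
            cls.doesMrOpinionatedElfHateYou = lambda self: True
        except TypeError:
            pass   # classes in immutable (extension) modules can't be patched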

If we were going to be thorough, we would recurse on the members of the class
to see if the class scope was enclosing any more IMonster classes, but
you're never really going to find them all: if a module defines a monster class
in a function-local scope, there's no good way to get the local class statement
and modify it through inspection.

In any case, we're at the point where we can modify all monsters in the
top-level namespace of already-loaded modules. What about modules that we have
yet to load?

Post-import Hook

There is no standard post-import hook (that I know of) in Python. PEP 369
looks promising, but I couldn't find any record of additional work being done
on it. The current import hooks, described in PEP 302, are all pre-import
hooks. As such, you have to decorate the __import__ builtin, wrapping the
original with your intended post-import functionality, like so: [†]
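import __builtin__   # Python 2; Python 3 spells it builtins

original_import = __builtin__.__import__

def extending_import(name, *args, **kwargs):
    """Wrap the builtin __import__ so each newly imported module gets extended."""
    module = original_import(name, *args, **kwargs)
    extend_monsters(module)   # the post-import step
    return module

__builtin__.__import__ = extending_import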

The Network

Yegge brings up the issue of dynamically generated classes by mentioning
network communications, calling to mind examples such as Java's RMI and
CORBA. This is a scary place to go, even just conceptualizing. If
metaclasses are used, I don't see any difficulty in decorating __new__ with
the same kind of inspection we employed above; however, code generation
presents potentially insurmountable problems.

Decorating the eval family of functions to modify newly created classes
seems possible, but it would be challenging and requires additional
research on my part. exec is a keyword/statement, which I would think makes
it a hopeless cause.