So what I didn't really talk about last time is that more than just letting you directly express module loading, Exocet also implements:

Parameterized Modules

In Python, classes and functions take parameters, but modules don't. When you load a module, you don't have any opportunity to tell it what you want, or what context it's being loaded in. A lot of times, this matters.

For example, some code has optional dependencies. This idiom can be seen a lot in some parts of Twisted:

try:fromOpenSSLimportSSLexceptImportError:SSL=None

Code following this checks if 'SSL' is None to decide whether to define methods and classes that provide support for SSL connections in Twisted.

Other code can depend on one of multiple providers of an interface. Here's the famous example from the docs for the magnificent lxml library:

try:fromlxmlimportetreeprint("running with lxml.etree")exceptImportError:try:# Python 2.5importxml.etree.cElementTreeasetreeprint("running with cElementTree on Python 2.5+")exceptImportError:try:# Python 2.5importxml.etree.ElementTreeasetreeprint("running with ElementTree on Python 2.5+")exceptImportError:try:# normal cElementTree installimportcElementTreeasetreeprint("running with cElementTree")exceptImportError:try:# normal ElementTree installimportelementtree.ElementTreeasetreeprint("running with ElementTree")exceptImportError:print("Failed to import ElementTree from any known place")

The effect of this code is to try to import, in order, one of:

lxml

cElementTree from the stdlib

ElementTree from the stdlib

ElementTree installed separately

cElementTree installed separately

I don't think it's a stretch to say this is rather silly. How would you feel if you saw a function that tried to access five different global variables in a row in order to decide what to do?

And though all of these modules implement the same interface, they're still different code, and you might hit some edge case where their behaviour differs. How do you test your code using each possible ElementTree implementation? As Glyph points out, it's always sunny in Python, and every piece of bad code can be worked around by writing worse code; you could have your unit tests fool around with sys.modules. But can't there be something better?

Here's a different way of thinking about it entirely.

In languages that aren't as good as Python, dependency injection is a technique that gets used to deal with this. Dependency injection has many forms and can be rather complicated, but the general idea is code declares the thing it needs, and something else (unfortunately, often an XML file!) describes what objects to provide to satisfy those dependencies.

With Exocet, we hijack the import statement to describe named parameters, indicating the dependencies our code has.

fromexocet.parametersimportetree

x=etree.parse(open("mydata.xml"))

Now, if you look in the Exocet tarball, you won't find a parameters.py file. This name doesn't correspond to anything on the filesystem.

So how does this help us? Well, it means you can load your ElementTree-using module like this:

This way, you can test your code without having to resort to sys.modules hackery, and you can better factor your applications by separating configuration and environment concerns from the rest of your code.

As you may recall from last time, pep302Mapper provides the default Python implementation of module loading. In this case, we've just added another name that can be imported, exocet.parameters.etree.

Being able to provide parameters to modules opens up the possibility to eliminate all use of global objects from your code, and pass objects only to the code that needs them. I believe that with experimentation and refinement of these tools, this technique is going to enable a lot of simpler methods for organizing complex applications and reduce a lot of complications people have to deal with in Python now.

What is Exocet?

Exocet is a new way to load Python modules. It separates the act of naming a dependency from the act of creating a module object. As a result, more than one instance of a module can be created from the same source file. Also, when creating a module object, precise control can be exerted over what it's allowed to import.

Quick breakdown of what's going on here: loadNamed takes a module name and a mapper object, and returns a new module object. Note that this is very different from what import does; each invocation of loadNamed returns a new module object, unrelated to any previous ones.

The mapper object is responsible for intercepting all import calls made by the module being loaded. The pep302Mapper object used here just invokes Python's normal importing behaviour, caching loaded modules in sys.modules. But its withOverrides method creates a new mapper, one that looks in a dict first. The urllib_plus module is almost like the urllib module in this example, with one difference: the latter got the real httplib module when it executed the statement import httplib; urllib_plus got the httplib_plus module when it ran that statement. The other mapper that Exocet includes by default is emptyMapper, in which all import statements will fail. You can construct your own, providing any object under any name to be available to import statements in modules you load.

So after constructing these two modules, they're usable as normal. The only difference is that urllib_plus invokes our wrapper function when it calls httplib.HTTP(), producing the printed line before proceeding with its work.

So, how does this behaviour answer the questions I mentioned earlier?

Module reloading

As we saw last time, reload isn't suitable for real-world use; it changes things around too much in some places, not enough in others. With Exocet, you don't have to reload modules because you can just load them. When you want a new module with updated code, you just load it and have a new module object. All the old objects still exist as long as you want them to; there's no orphaned-instance problem. If you need new versions of the module's dependencies you can load them first and put them in the mapper so that they're visible to your new module. Old instances, classes, and modules can be removed in the normal way — by Python's garbage collection, when nothing uses them any longer.

Plugin systems

The normal use of Python packages and modules involves importing specific ones by name. However, sometimes you want to load a bunch of modules and call something in each. Since Exocet borrows code from twisted.python.modules, it's possible to iterate over a package and load each module in it:

>>> importexocet>>> [exocet.load(x,exocet.pep302Mapper)forxin... exocet.getModule("twisted.plugins").iterModules()][<module 'twisted.plugins.cred_anonymous' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/cred_anonymous.py'>, <module 'twisted.plugins.cred_file' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/cred_file.py'>,<module 'twisted.plugins.cred_memory' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/cred_memory.py'>,<module 'twisted.plugins.cred_unix' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/cred_unix.py'>,<module 'twisted.plugins.twisted_conch' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_conch.py'>,<module 'twisted.plugins.twisted_ftp' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_ftp.py'>,<module 'twisted.plugins.twisted_inet' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_inet.py'>,<module 'twisted.plugins.twisted_lore' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_lore.py'>,<module 'twisted.plugins.twisted_mail' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_mail.py'>,<module 'twisted.plugins.twisted_manhole' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_manhole.py'>,<module 'twisted.plugins.twisted_names' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_names.py'>,<module 'twisted.plugins.twisted_news' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_news.py'>,<module 'twisted.plugins.twisted_portforward' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_portforward.py'>,<module 'twisted.plugins.twisted_qtstub' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_qtstub.py'>,<module 'twisted.plugins.twisted_reactors' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_reactors.py'>,<module 'twisted.plugins.twisted_runner' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_runner.py'>,<module 'twisted.plugins.twisted_socks' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_socks.py'>,<module 'twisted.plugins.twisted_telnet' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_telnet.py'>,<module 'twisted.plugins.twisted_trial' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_trial.py'>,<module 'twisted.plugins.twisted_web' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_web.py'>,<module 'twisted.plugins.twisted_web2' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_web2.py'>,<module 'twisted.plugins.twisted_words' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_words.py'>]

In this case we used the default pep302Mapper since we wanted plugin modules to be able to import anything they wanted. However, a different mapper could be provided which provided different modules, limited them to only importing certain ones, or none at all.

Unit testing

Hardly anybody asks about this on #python, but they should. By loading the modules you test with Exocet, you can replace their dependencies with stubs without monkeypatching. Just provide overrides to the mapper object when loading the module, and you have a stubbed or mocked module to test without altering the behaviour of other tests.

Next steps

Exocet lets you load most existing Python modules, but in a way that gives much better control over what happens. This makes room for radically different ways of thinking about how to organize Python programs. Since this approach is so different, practice is needed to come up with new idioms for how to use it effectively. I've tried carefully to provide mechanism here with very little policy. There are a few kinks to still work out in how Exocet works in more complicated scenarios, but give it a try! It might be just the thing for your next IRC bot. ;-)

If you want to help with Exocet development, it happens here, on Launchpad. I look forward to your feedback.

I've spent a lot of time living with Python's module system, both in my ownwork and in helping people on Freenode's #python channel. A lot of Python'spower comes from its module system; however, it could be better. It can behard to think about how modules could be done differently, since it's socentral to the design of Python software, but it's worth the effort. Here'ssome stuff I've been thinking about.

The Good

No global namespace for objects

This is the benefit of module systems in general and it's a big one. It's what I miss the most when writing in C, for example (And until very recently — Javascript!) Being able to treat a source file as an actual container for code instead of an arbitrary pile of functions and variables with no structure makes code much easier to comprehend and navigate.

Python modules are easy to write and to import

Particularly I'm comparing this to Scheme 48 and ML, which both have very well-designed and powerful module systems, but they're rather confusingto the newcomer because they require a good bit of up-front knowledge to construct a module that's useful to anyone else. In Python, you just stick some code in a file and then all the names in it are importable. My earliest Python memory was joining #python and asking "how do I import some code I wrote in one file into another"? I was told 'for a file named foo.py, use "import foo"'. My reaction was "Really? That's all?" Providing a low barrier to entry for creating and using modules is an extremely powerful advantage of Python.

The Bad

Modules are in a global namespace

Although modules contain classes, functions, etc., there's no containment hierarchy for module names themselves. Different modules can have functions and classes with the same name in them, but there's nothing that can contain multiple modules with the same name. This shows up as a problem when you want to write unit tests that use fake versions of some modules, for example. When faking a function or a class, one creates a new version. Modules generally have to be modified rather than replaced, since import looks up modules in the global module namespace.

PYTHONPATH is a rather inflexible way to organize modules

Organizing modules by location in the filesystem is a great way to get started, but it's not the only possible thing one might want. This deficiency has been addressed in recent Pythons via the PEP 302 import hooks. However...

PEP302 hooks help, but aren't enough by themselves

The canonical example of alternate module organization is putting them in a zip file, which Python supports via the standard import hooks now. Now you have extra problems, though. Python packages are a good way to organize modules, but they don't provide a way to enumerate their contents. To work around this, everybody looks at the filesystem layout to determine what's in a package. But if your modules aren't being loaded directly from the filesystem, this approach won't work.

The Really Bad

Modules are singletons (i.e., global mutable state)

This is the dark secret at the heart of any large-scale Python project. One can be very careful about organizing one's state into instances and so forth, but all modules are still visible and modifiable by any code at any time.

Still easy to write unreadable code via monkey-patching

It's easy and convenient to assign to module attributes any time you feel like it. The result is that any time you see "from foo import someObject", you can't every be sure about where that object was defined unless you read all the source code in the application. Even when it's desirable to change module contents (such as for tests), it's easy to fail to do so in a way that doesn't introduce dependencies or conflicts between tests. The classic example is calling some function that initializes module globals from a config file; if one test does it, it can cause tests run after it to fail or incorrectly succeed.

reload()

The reload function is a symptom of all the above problems. Its inspiration is obvious: loading code that's changed since the current Python process has started is an entirely sensible idea. However, Python's assumptions about how modules work makes this rather difficult to do in a sensible manner. It's common to create new lists rather than modify old ones when a new version of some data is wanted. This convention is reinforced by the ease by which list comprehensions can be used to do this job. The convention encouraged by the existence of reload is exactly opposite, though — instead of creating a new module object, the old one is emptied and refilled with fresh objects. The result is that instances of classes in that module are orphaned; the class they were instantiated from can't be reached by its name. Also, it only reloads a single module; no help is provided in updating modules that depend on it, or updating its own dependencies. Figuring out which modules to reload or not reload at any given time is often very tricky. Plenty of other corner cases exist, such as reinitialization of function default arguments, and so forth. Because of all this, the standard advice on #python is that "reload will not make you happy".

What Now?

So with these problems identified in how Python handles modules, can anything be done?Well, that's why I wrote Exocet. More about that next time.