Monthly archive for September, 2013

As part of a language, Python obviously has import statements. They allow us to divide the code into different modules and packages:

importcollections

importos

import thirdpartylib

import anotherlib

from myapp.utilimport do_stuff

What is lesser known fact is that it also has an __import__ function. This function retains all functionality of the import statement, but has some additional features and slightly different use cases. With it, for example, you can import a module whose name you only know at “runtime”:

def import_plugin(name):

plugins_package =__import__('myapp.plugins', fromlist=[name])

returngetattr(plugins_package, name)

This comes handy in various types of general dispatchers, factory functions, plugin systems, and so forth. Returned from __import__ function is always a module object (even in cases when fromlist argument is used), so often a getattr is needed to extract a specific symbol from it.

Quite surprisingly, I have discovered that __import__ function may very well be useful also when you do know the desired module name. Reason is that import statement is sometimes unwieldy. It has similar problem as global variables (i.e. global statements) and inner function definitions (as opposed to lambdas): it makes the code stretch unnecessarily in the vertical dimension.

Just for one thing

This can be considered a waste if you only need to access one specific thing from one specific module. Using __import__ function, you can golf the import and the usage into a single statement. Here’s an example, coming straight from my own project recursely:

__import__('recursely').install()

Incidentally, the other uses of literal __import__ described can be conveniently replaced thanks to that small library :)

Will it lint?

Another issue with import statement is that it introduces symbols into the global (or local) namespace. Most of the time, this is precisely what we want. Occasionally, though, a sole fact of loading the module is enough.

introduces an unused symbol – here, it is handlers. Many linting tools will be eager to point this fact out, which is not really that helpful. There is sometimes an option to disable the warning on per line basis, but some checkers (e.g. pep8.py) don’t offer this functionality.
Universal solution? Use __import__ function, of course:

__import__('handlers')

The module is still loaded just fine, but since we’re ignoring the return value, no stray variables are created. As a added bonus, the __import__ call also looks very different, signifying its special purpose.

Staying in place

Actually, there is one more benefit of this trick, also coming from fooling-the-tools department. Many Python IDEs, like Eclipse/Pydev, are able to automatically insert necessary imports and organize them in groups, effectively providing a neat, Java-like experience. What is not so neat is that they often insist on putting every import statement somewhere near the beginning of the file. preceding any other definition, variable, class or function.

In a scenario described above, this behavior may actually cause problems. When the handlers’ module gets imported, it may need to refer back to the application object; this is exactly the case in the Flask framework, for example. If that object happens to be defined in the module importing handlers, we’ll have a circular import error because the application object has not yet been defined. It would have been defined, however, if the statement:

import handlers

hasn’t been touched by the IDE when it wanted to be helpful and organize our imports. All imports, as it turns out.

Fortunately, mechanisms like that tend to be easy to fool. Per answers to StackOverflow question I’ve once asked, it is a matter breaking the textual pattern that the algorithm searches our code for. As you may have guessed by now, one of the ways of achieving that goal is to shed the import statement in favor of __import__ function.

Saying “API” nowadays is thought first and foremost to refer to a collection of HTTP request handlers that expose data from some Web service in a machine-readable format (usually JSON). This is not the meaning of API I have in mind right here, though. The classic one talks about any conglomerate of programming language constructs – functions, classes, packages – that are available for us to use.

You can interact with an API in numerous different ways, but one of them is somewhat less common. Occasionally, besides having functions to call and objects to create, you are also presented with base classes to inherit. This is symptomatic to more complex libraries, written mostly in statically typed languages. Among them, the one that takes the most advantage of this technique is probably Java.

Note that what I’m talking about here is substantially different from implementing an interface. That is a relatively common occurrence, required when working with listeners and callbacks:

as well as in many other situations and patterns. There is nothing terribly special about it, given that even non-object oriented languages have equivalent mechanisms of accomplishing the same objective: separating the “how” from “what” in code, possibly to exchange or expand the latter.

Inheriting from a base class is something else entirely. The often criticized, ideologically skewed interpretation of OOP would claim that inheritance is meant to introduce more specialized kinds of existing types of objects. In practice, that’s almost completely missing the real point.

What’s important in creating a derived class is that:

like with interface, it needs to override certain methods and fill them with code

but also, it may do so using a unique, additional API

Combined, these two qualities allow to introduce much more sophisticated ways of communication between the API and its client code. It’s a powerful tool and should be used sparingly, but certain types of libraries and (especially) frameworks can benefit greatly from it.

As an example, look at the Guice framework for dependency injection. If you use it in your project, you need to configure it by specifying bindings: mapping between the interfaces used by your code and their implementations. This is necessary for the framework can wire them together, possibly in more then one way (different in production code and in test code, for example).
Bindings are quite sophisticated constructs. For non-trivial applications – and these are the ones you generally want to apply dependency injection to – they cannot really be pinned down to a single function call. Other, similar frameworks would therefore use completely external configuration files (usually in XML), which has a lot of downsides.

Guice, however, has a sufficiently smart API that allows to realize all configuration in actual, compilable code. To achieve this, it uses inheritance, exactly as described above. Here’s a short sample:

publicclass BillingModule extends AbstractModule {

@Override

protectedvoid configure(){

bind(TransactionLog.class).to(DatabaseTransactionLog.class);

bind(CreditCardProcessor.class).to(PaypalCreditCardProcessor.class);

}

}

Overridden methods in the above class (well, one method) serve as “sections” in the “configuration file”. Inherited methods, on the other hand, provide an internal, limited namespace that can be used to compose configuration “entries”. Since everything is real code, we can have the compiler check everything for some basic sanity as well.

Among the many things that Python does right, there are also a few that could have been better thought out. No language is perfect, of course, but since Python is increasingly adopted as the language of choice for complete beginners in programming, the net effect will be many new coders being heavily influenced by ideas they encounter here.

And unfortunately, some of those ideas are dead wrong. A chief example is the relation between lists and tuples, specifically the mistaken notion of their mutual similarity. In reality – that is, in every other language – they are completely different concepts, and rightfully so.

What is list?…

The term ‘list’ in programming is sometimes meant to refer to ‘linked list’ – one of the simple data structures you’d typically encounter in the introductory course to algorithms. It consists of a sequence of separately allocated elements, stitched together using pointers. Depending on the density and direction of those pointers, the type is further subdivided into singly and doubly linked lists, cyclic lists, and so on.

But the broader meaning, which is also more common now, describes a general collection of elements that can be linearly traversed, regardless of the details of its underlying representation. The List interface in Java or .NET is intended to capture the idea, and it does so pretty accurately. Concrete implementations (like ArrayList or LinkedList) can still be decided between if those details turn out to be important, but most of the code can deal with lists in quite abstract, storage-agnostic manner.

…in Python

In Python, lists generally conform to the above description. Here, you cannot really decide how they are stored under the hood (array module notwithstanding), but the interface is what you would expect:

files = get_file_list('/home/xion')

if files[0]=='.':

files.pop(0)

if files[0]=='..':

files.pop(0)

print"Found ",len(files)," files."

forfilein files:

print"- ",file

You can insert, append and remove elements, as well as iterate over the list and access elements by indexes. There is also a handy bracket notation for list literals in the code:

a_list =[1,2,3]

About the only weird thing is the fact that you can freely put objects of different types into the same list:

diverse_list =[42,"foo",False]

In a way, though, this appears to be in sync with the general free-form nature of Python. For analogy, think about List<Object> in Java that can do exactly the same (or, in fact, any List prior to introduction of generics).

On the other hand, this is a tuple:

a_tuple =(1,2,3)

Besides brackets giving way to parentheses, there is no difference in literal notation. Similarly, you can iterate over the tuple just as well as you can over the list:

for x in a_tuple:

print x

in addition to accessing elements by index. What you cannot do, though, is modifying a tuple once it is created: be it by trying to add more elements, or changing existing ones:

>>> a_tuple[0]=42

Traceback (most recent call last):

File "<stdin>", line 1,in<module>

TypeError: 'tuple'object does not support item assignment

But just like with lists, there is no limitation for the types of elements you put therein:

diverse_tuple =(42,"foo",False)

All in all, a tuple in Python behaves just like an immutable (unchangeable) list and can be treated as such for all intents and purposes.

Tuplication

Now, let’s look at tuples in some other programming languages. Those that support them one way or another include C++, D, Haskell, Rust, Scala, and possibly few more exotic ones. There is also a nascent support for tuple-like constructs in Go, but it’s limited to returning multiple results from functions.

In any of these, tuple means several objects grouped together for a particular purpose. Unlike a list, it’s not a collection. That’s because:

objects retain their individual identity

their number is not arbitrary; in fact, tuples of different count have separate names: pairs, triples, etc.

objects have positions assigned: when handling a tuple, it is expected that certain thing X will be at position N

Statically typed languages expand especially on the last point, making it possible to specify what type we expect to see at every position. Together, they constitute the tuple type:

In other words, you don’t operate on “tuples” in general. You use instances of specific tuple types – like a pair of int and string – to bind several distinct values together.
Then, as your code evolves, you may need to have more of them, or to manipulate them in more sophisticated ways. To accommodate, you may either expand the tuple type, or increase readability at the (slight) cost of verbosity and transform it into a structure:

struct numbered_line {

string filename;

int number;

string text;

};

Once here, you can stay with functions operating on those structures, or refactor those into methods, or use some combination of those two approaches.

Missing link

But in Python, that link between tuples and structures is almost completely lost. There is one built-in tuple type, which is simply tuple, but even that isn’t really paid attention to. What happens instead is that tuples are lumped together with lists and other collections into the umbrella term of “iterable”, i.e. something which can iterated over (typically via for loop).

As a result, a tuple can be converted to list and list can be converted to tuple. Both operations make no sense whatsoever. But they are possible, and that leads to tuples and lists being used interchangeably, making it harder to identify crucial pieces of data that drive your logic. And so they will grow uncontrollably, rather than having been harnessed at appropriate time into an object, or a namedtuple at the very least.

Bad programmers worry about the code. Good programmers worry about data structures and their relationships.