Tuesday, June 30, 2009

A Ban on Imports

This is not, of course, an essay on restricting free trade. Rather, this post is about the evils of the import clause, which occurs in one form or another across a wide array of programming languages, from Ada, Oberon and the various Modulas, through Java and C#, and on to F# and Haskell.

The import clause should be banned because it undermines modularity in a deep and insidious way. This is a point I’ve attempted to convey time and time again, with only limited success. I will now try to illustrate the problem via a hardware-inspired example.

Consider the not-so-humble MP3 player. An MP3 player is a hardware module. The market is full of them, as well as other hardware modules they can plug in to. For example, sound systems where one can dock an MP3 player and have it play on stereo speakers.

Let’s try and describe the analog of such a sound system using programming language modularity constructs that rely on imports:

module SoundSystem

import MP3Player;
... wonderful functionality elided ...
end

I want to describe how my sound system works, separately from the description of how an MP3 player works. I would like to later plug in a particular MP3 player, say a Zune(tm) or an iPod(tm).

tm: Zune and iPod are trademarks of Microsoft and Apple respectively, two companies with armies of lawyers who might harass me if I do not state the obvious.

Now the first problem is that neither Zune nor iPod is named MP3Player. If I want to connect my sound system to a Zune, I will have to edit the definition of SoundSystem to name the specific module I want to import.

If you’re very petty, you might say that Zune and iPod do not share a common interface and cannot be docked into the same sound system. Imagine that we wish to use our sound system with an iPhone (tm) and an iPod Touch (tm) of some compatible generation.

tm: iPhone and iPod Touch are trademarks of Apple.

Say I decide to go with a Zune.

module SoundSystem

import Zune;
... wonderful functionality elided ...
end

Later I change my mind for some reason, and want to hook up my system to an iPod. It’s easy: I just edit the definition of my system again, to import iPod:

module SoundSystem

import iPod;
... wonderful functionality elided ...
end

The question you should be asking is: Why should I edit the definition of my system each time I change the configuration? In reality, it is unlikely that I am actually the designer of SoundSystem. I probably don’t even have access to its definition. I just want to configure it to work with my MP3 player.

The problem is that import confounds module definition and module configuration. Module definition describes the design of a module; module configuration describes how one hooks up different modules. The former has to do with module internals; the latter should be done externally to the modules involved, to allow them to be used in any context where they could function.

We clearly want our sound system to abstract over the specific player being plugged in to it. Any player with a compatible interface will do. A well known mechanism for abstracting things is parameterization. We might be happier if we defined our sound system parametrically with respect to the MP3 player.

With a parametric definition, we can configure the system for a particular player without having to modify (or even have access to) the source code for the definition of SoundSystem. Hurray!
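A minimal Python sketch of a parametric definition and its external configuration might look as follows. Python stands in here for the post's module notation; the `next_track` method, the strings it returns, and the `play` method are illustrative assumptions, not any real device's interface.

```python
class Zune:
    """Hypothetical player module; next_track() is an assumed docking interface."""

    def next_track(self):
        return "track from Zune"


class IPod:
    """A second hypothetical player exposing the same assumed interface."""

    def next_track(self):
        return "track from iPod"


class SoundSystem:
    """Module definition: abstracts over whichever player is plugged in."""

    def __init__(self, mp3_player):
        self.mp3_player = mp3_player

    def play(self):
        return "speakers: " + self.mp3_player.next_track()


# Module configuration lives outside the definitions above.
zune_system = SoundSystem(Zune())
ipod_system = SoundSystem(IPod())
print(zune_system.play())
print(ipod_system.play())
```

Neither configuration line required touching the body of `SoundSystem`; only the argument changed.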

The module definition looks a lot like a function, and the configuration code looks like a function application. This is very suggestive. Indeed, ML introduced a module system based on function-like things called functors a quarter century ago. But there’s a bit more to this.

These hardware pieces tend to plug in to each other. For example, the definition of IPod is parametric too:

IPod(dockingStation){... even greater wonders ...}

Our sound system does its thing by behaving like a docking station. It and the MP3 player are mutually recursive modules. Configuration therefore requires support for mutual recursion (which is not allowed in Standard ML):

If this notation is unfamiliar, please brush up on your functional programming skills before you become unemployable. Basically, ignore the first and last line, and treat the two lines involving = as equations.

So the module definitions are a lot like functions that yield modules. You could also think of module definitions as classes yielding instances. The instances are like physical hardware modules.

Now we can add another sound system and use our old Zune:

letrec {
dock2 = SoundSystem(oldMP3);
oldMP3 = Zune(dock2);
} in dock2;

This is what is often called side by side deployment - multiple instances of the same design, configured differently.
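The letrec-style configuration can be approximated in Python, which has no letrec, by handing each instance a thunk that looks up its partner on demand. This is only a sketch under stated assumptions: the helper named `letrec`, the `Player` base class, and the `power`, `charge`, `play` and `next_track` methods are all hypothetical.

```python
class SoundSystem:
    """Acts as a docking station for whatever player it is wired to."""

    def __init__(self, get_player):
        self._get_player = get_player  # thunk, resolved only when used

    @property
    def player(self):
        return self._get_player()

    def power(self):
        return "5V"

    def play(self):
        return "speakers: " + self.player.next_track()


class Player:
    """Shared shape for the hypothetical players below."""

    name = "player"

    def __init__(self, get_dock):
        self._get_dock = get_dock  # thunk, resolved only when used

    @property
    def dock(self):
        return self._get_dock()

    def next_track(self):
        return "track from " + self.name

    def charge(self):
        return self.name + " charging at " + self.dock.power()


class IPod(Player):
    name = "iPod"


class Zune(Player):
    name = "Zune"


def letrec(sound_system_def, player_def):
    """Tie the knot: each instance receives a thunk that finds the other."""
    env = {}
    env["dock"] = sound_system_def(lambda: env["player"])
    env["player"] = player_def(lambda: env["dock"])
    return env["dock"]


dock = letrec(SoundSystem, IPod)
dock2 = letrec(SoundSystem, Zune)  # side by side: same design, different wiring
print(dock.play())
print(dock2.player.charge())
```

The thunks defer each lookup until both instances exist, which is what lets the two "equations" refer to each other.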

Tangent: Yes, Virginia, you can achieve that sort of thing in Java despite imports, using class loaders. Imports hardwire names into your code, and class loaders can counteract that by letting you define multiple namespaces. These can have multiple copies of your code, potentially hardwired to different things (even though they all have the same name). If you think class loaders offer a simple, clean way of doing things that is easy to learn, use, understand and debug, this post is not for you. Nor will any amount of OSGi magic on top fundamentally change things.

We might also choose to define things differently:

module SoundSystem(MP3Player) { player = MP3Player(self); ... }

Here, we are passing module definitions as parameters. We are also referring to SoundSystem’s current instance from within itself - a lot like classes, no? We might configure things thusly:

SoundSystem(iPod);
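In Python, where classes are first class values, this variant might be sketched as below. The class bodies, `next_track`, and `play` are assumptions made purely for illustration.

```python
class IPod:
    """Hypothetical player definition, parametric in its docking station."""

    def __init__(self, docking_station):
        self.docking_station = docking_station

    def next_track(self):
        return "track from iPod"


class SoundSystem:
    """Takes a player *definition* and instantiates it against itself."""

    def __init__(self, mp3_player_def):
        # Mirrors `player = MP3Player(self)`: this instance is the dock.
        self.player = mp3_player_def(self)

    def play(self):
        return "speakers: " + self.player.next_track()


# Configuration passes the definition (the class), not an instance.
system = SoundSystem(IPod)
print(system.play())
```

Because the sound system creates the player itself, the mutual reference is established without any letrec-like helper.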

So it looks like mutual recursion and first class module definitions are very natural things to have. And yet traditional languages do not support this - even though many languages have constructs like classes and functions that are first class values and can be defined in a mutually recursive fashion.

One problem with using these constructs to define modules is that they are usually able to access anything in the global namespace. This makes it very hard to avoid implicit dependencies.

Interestingly, the global namespace is exactly what import requires. Since we don’t need or want import, let’s do away with it and the global namespace. We clearly will get a much more modular system without it; but wait - there seems to be one place where we really want the global namespace. That is when we write our configuration code, the code that wires our modules together.

That’s fine - there are a number of solutions for that. It isn’t always clear that our configuration language is the same language as the programming language(s) that define our modules, for example. If you write a makefile, the global namespace is defined by your file system and accessible within the makefile. Not that I really want to recommend make and its ilk.

I think we do want to code our configuration in a nice general purpose high level programming language. One solution is to have our IDE provide us with an object representing the known global namespace, and write our configuration code with respect to that namespace object. This is essentially what we do in Newspeak.
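The idea of writing configuration code against a namespace object can be loosely sketched in Python. This is not Newspeak's mechanism or syntax; `SimpleNamespace` merely stands in for the object a tool would supply, and the `configure` function and class bodies are hypothetical. Python's module-level names remain globally reachable, so the sketch shows only the wiring discipline, not the isolation.

```python
from types import SimpleNamespace


class IPod:
    """Hypothetical player definition, parametric in its docking station."""

    def __init__(self, docking_station):
        self.docking_station = docking_station

    def next_track(self):
        return "track from iPod"


class SoundSystem:
    """Hypothetical definition that names no concrete player."""

    def __init__(self, mp3_player_def):
        self.player = mp3_player_def(self)

    def play(self):
        return "speakers: " + self.player.next_track()


# Stand-in for the namespace object that a tool such as an IDE would provide.
namespace = SimpleNamespace(SoundSystem=SoundSystem, IPod=IPod)


def configure(ns):
    """Configuration code: the one place that consults the namespace."""
    return ns.SoundSystem(ns.IPod)


system = configure(namespace)
print(system.play())
```

Swapping in a different player means handing `configure` a namespace that binds a different definition; the module definitions stay untouched.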

In the next post, I’ll discuss more of the advantages of this approach, contrast how Newspeak handles things with other languages with powerful module systems, like Scheme (which for the past decade or so has had a system called Units that is quite close to what I’ve discussed so far) and ML, and show once more how one actually does configuration in Newspeak.

28 comments:

As a blinkered Java programmer, I expect your SoundSystem module to depend on an abstract MP3Player interface (and a symptom of my confusion is that I'm using module and interface interchangeably). Then I expect that concrete implementations of MP3Player will exist, and configuration will happen outside the SoundSystem and MP3Player modules:

1) What if your module provides classes that need to be instantiated (as is usually the case)? You cannot instantiate interfaces, precisely because they do not determine an implementation. Hence the use of dependency injection frameworks in the Javasphere. With real modularity, they aren't needed.

2) Mutual recursion is still awkward.

In other words, your approach breaks down once you get past the top level.

Maybe I'll try another tack. Suppose your modules are Java packages that never, ever import classes (or statics etc.) - only interfaces. To configure an application you define a main package and import the various pieces you need, and tie them together. This is what you are suggesting - clearly not normal practice. Nor is it feasible with existing libraries.

Suppose further that your language is dynamically typed. No need to import interfaces. So the only use of import is in your "configuration package". This is the one place where you need a "global" namespace. As the following post will explain, we leave this configuration to tools. Hence no import.

This word import. I do not think it means what you think it means... at least in Java. It is nothing more than an abbreviation and has nothing to do with dependencies.

In Java there are several mechanisms to solve your real problem which is that SoundSystem depends on concrete implementations, none of them have anything to do with the 'import' keyword though. Probably the most Java native solution is using generic interfaces and classes. The others are tacked on solutions.

The meaning of import in Java is given by the Java Language Specification, of which I am an author. So I probably do know what it means.

Your point, presumably, is that in Java one can always access things via fully qualified names. Ergo, there is a global namespace. My point, in turn, is that import injects a global namespace into modules - even if it is defined "properly", as in Modula dialects.

The only real use for a global namespace is when configuring modules, and this should be done externally, and via tools.

Hence no import. And, yes, of course, no global namespace and no fully qualified names.

To a degree, the idea you're proposing here is like "dependency injection". Ihab Awad from Google Caja and I have designed a system that permits module systems to be instantiated with particular modules using particular names in the module name space or in a "system" free variable name space available in Server Side JavaScript modules, particularly with Tom Robinson's Narwhal. I invite you to drop in for a program some time.

"Suppose your modules are Java packages that never, ever import classes (or statics etc.) - only interfaces. To configure an application you define a main package and import the various pieces you need, and tie them together. This is what you are suggesting - clearly not normal practice. Nor is it feasible with existing libraries."

I disagree, this is very normal practice. It is exactly the approach taken when one is using a dependency injection framework, or OSGi services.

You said that DI isn't needed when using real modularity. Aren't you just embedding the concepts of DI into your language and VM?

Clean modularity constructs that subsume DI are neither trivial (as you say "just") nor are they derived from DI.

DI is a very convoluted way of working around basic flaws in the programming language.

It may be your only option using a mainstream language. The point is one can do better.

As for whether this is normal practice:

A. I wasn't referring to DI. DI requires a lot of extra machinery that wasn't implied in the discussion in question.

B. DI isn't universal practice for good reason. There are vast amounts of code out there that do not use these mechanisms - partly because they are so complex. Can you imagine teaching DI or OSGi in a first-year programming class?

The old post about DI, and several others on this blog, attempt to show better ways of dealing with the issue of modularity. I keep at it precisely because it seems so hard for people to grasp.

It's clear that the real problem you are discussing here isn't imports but global resources. Removing the import keyword from Haskell and Java would just make the programs more verbose not more modular.

Perhaps you're arguing that import is an "enabler" - it makes name spacing practical, and name spacing gives a language designer and a language user an illusion of modularity rather than the real deal.

It is true that a global namespace for values (be they classes, functions, modules or, even worse, stateful objects) is a problem. Imports are a problem because they inherently tie modules to such a global namespace.

In the Modula family, an import is necessary, not just a typedef/alias sort of thing as in Java. It still brings all the problems I mention, because it hardwires dependencies into modules. So import is more than an enabler - it is a problem in and of itself.

In Java, there is a global namespace freely accessible via fully qualified names, and this makes matters worse. But even if FQNs were eliminated, import would still be a problem.

If this notation is unfamiliar, please brush up on your functional programming skills before you become unemployable.

ROTFL!

But Gilad, isn't this letrec non-trivial to implement if actual parameters are used as classes that are subclassed within a module? Are you proposing open-season on mutually-recursive parameterisation or are you proposing to restrict it in some way?

... This word import. I do not think it means what you think it means...

I'm in agreement here. To me, as a Java programmer, import does 2 things.

1. allows a dependent piece of code to be loaded and available at run time.

this is crucial to file modularity

2. allows for easy use of name spacing.

this is good in sooo many ways.

now, even if you removed #2 and made all files in the default namespace there is STILL an implied:

import {currentpackage};
import java.lang.*;

which is needed to have classes in separate files. to avoid that you could completely ban imports if you wrote everything in one file, but then you can't use ANY of the java.** classes... including Object, so you can't even create a class.

what you are describing seems to have much more to do with Dependency Injection and Interfaces than imports.

What Gilad describes is a better solution to the interfaces/dependency injection framework solution which Java programmers use to work around the problem that imports are bad.

'import' doesn't 'allow easy use of namespacing' -- it allows you to do *one* thing -- refer to a point in a global namespace.

Of course you need a way to refer to classes you depend on, but wouldn't it be nice, for instance, if you could (say) take an existing library and make it use your implementation of String (with performance characteristics which better suit your use case, perhaps) without modifying and recompiling the library from source?

"The only real use for a global namespace is when configuring modules, and this should be done externally, and via tools."

Why is this true?

I do agree that it makes sense to isolate configuration as much as possible (and probably make it something done at runtime). What I don't see is why it has to be either external to the implementation language or confined to a tool.

thanks for your comment, it started to make a lot more sense. I think that the confusion really came from the fact that I see imports as a linking and namespacing issue.

But as described you are right. I work in C# a lot and it kills me that you have to explicitly declare something virtual.

allow me to explain

Java

public void someMethod()

=

C#

public virtual void someMethod()

and

C#

public void someMethod()

=

Java

public final void someMethod()

so, of course almost everyone always does the easier version:

public void someMethod()

and in C# that means that someone who 'wasn't thinking about you in the first place' needs to have 'thought about you' for you to change things later.

if they did, then you can override the method and add your hack. I believe what you are .... redescribing ... is that it would be nice to be able to override the construction of the object as well....

which I quite agree on.

of course, this all means there shouldn't be a ban on import, there should be a ban on 'new'. after all you still want String to implement everything you think it should, and you have to describe that somewhere.

How to typecheck this scheme is an open question. Nominal type systems have a global flavor to them that undermines modularity. Perhaps structural typing is part of the answer - but I know from experience how problematic that can be as well.

Maybe I misunderstand, but... Are you saying that you don't want a global namespace for types? How about primitive types or VERY common types? Would they be treated in a different way? Is it a way to enhance the importance of the context around the code? In some way it seems to me that improving modularity has this same effect: every line depends strictly on the context instead of using global (absolute) information.

However, with optional/pluggable types, a typechecker is just one more tool, and not part of the language. In that scenario, the typechecker would have a namespace including all types it knows about, and it would interpret type annotations (which are just metadata in this case) wrt that namespace.

Hence, the language has no global namespace, but tools may do so in appropriate contexts.

As for very common names - well, we do assume that a few things are inherited from Object, such as the literal classes like String, Boolean etc., and Object itself. But you can override these. All of which sort of fits with your comment on context.

one more thing; the previous comment about global namespaces for types being necessary mostly pertains to purely nominal type systems.

It may be possible to have a structural system instead - though making it usable is a challenge, I think. Most likely we end up with a mix. Now all real nominal type systems will have a bit of structural typing in them, but here we may need more.

I like your post, though it does not say something completely new. As you suggested by yourself, parametrized modules may be implemented in a programming language that is able to couple code with mutable data (OOP's object) or immutable data (closure). Even in C it is implemented, but a programmer must pass code and data as distinct arguments.

The reason everyone uses non-parametrized modules is that they feel no need for them 99% of the time; moreover, parametrization means additional work. If a programmer is obliged to give library B as the argument to library A everywhere, instead of linking A and B by default, this is a waste of time. The 1% who need to parametrize can patch library A, without creating inconvenience for the masses.

Marketing was never my strong suit, so sure, there are doubtless other ways to make this point. I read your post, and it is certainly an example of the sort of issue that comes up. But for me, it's too wrapped up in specifics. That does make it easier to communicate to people familiar with those specifics - but as the thread that follows shows, it also gets people to focus on any number of other things (implicits, dynamic scoping etc.).