This PEP proposes to add a new set of import hooks that offer better customization of the Python import mechanism. Contrary to the current import hook, a new-style hook can be injected into the existing scheme, allowing for a finer grained control of how modules are found and how they are loaded.

We are considering introducing a feature similar to PEP 302 into Ruby 2.0 (CRuby 2.0), and I want to make a proposal that can persuade Matz. Currently, the only standard way for CRuby to load scripts is from the file system.

If you have any experience with or thoughts about PEP 302, please share them.

Example:

It's a great spec. No need to change it.

It is almost good, but it has this problem...

If I could go back to 2003, then I would change the spec to...

I'm sorry if this question is not suitable here. I posted here because I'm not sure I can ask it on python-dev (of course, that list is not for CRuby development).

Wow, the YARV guy himself, hello and welcome to Programmers! ;) On Stack Exchange we really don't like open ended discussion, we love solving specific problems instead (give a quick read to our FAQ) - which I'm guessing is why your question was closed on Stack Overflow and it already has a close vote here. You should give it a go at making this a bit more specific - do you have a specific concern about PEP 302 that motivated this question?
– Yannis Rizos♦ Jun 26 '12 at 6:13


Thank you for your comment, Yannis. I think what I want to discuss is "software architecture". PEP 302 seems to be a powerful and general framework for plugging custom loaders into the Python interpreter. However, a powerful feature carries risks, such as overuse (which produces "magical" code) and getting in the way of interpreter optimization. So I want to know whether this extension framework has worked out well for Python users and interpreter developers. I believe studying the history will help me write a good spec for Ruby 2.0.
– Koichi Sasada Jun 26 '12 at 7:28

Thank you for tidying up my question. And I'm sorry if this question is not a good fit here.
– Koichi Sasada Jun 26 '12 at 7:31

4 Answers

I'm the maintainer of Python's runpy module, and one of the maintainers of the current import system. While our import system is impressively flexible, I'd advise against adopting it wholesale without making a few tweaks - due to backwards compatibility concerns, there are a bunch of things that are more awkward than they would otherwise need to be.

One thing that hurt with PEP 302 in Python is how long it took us to convert the core import system over to using it. For the better part of a decade, anyone doing anything complex with import hooks has been stuck implementing two pieces: one handling PEP 302 compliant loaders (such as zip imports), and a second handling the standard filesystem based import mechanism. It's only in the forthcoming 3.3 that handling PEP 302 loaders will also take care of handling modules imported through the standard filesystem import mechanism. Try not to repeat that mistake if you can possibly avoid it.

PEP 420 (implemented for Python 3.3) makes some additions to the protocol to allow importers to contribute portions to namespace packages. It also fixes a naming problem in the Finder API definition (effectively replacing the misnamed "find_module" with the more accurate "find_loader"). This should hopefully all be documented more clearly in the language spec by the time 3.3rc1 rolls around in a couple of weeks' time.

Another notable problem is that the approach documented specifically in PEP 302 has way too much process global state. Don't follow us down that path - try to encapsulate the state in a more coherent object model so it's slightly easier to selectively import other modules (C extension modules are the bane of making any such encapsulation completely effective, but even some level of encapsulation can be helpful).

PEP 406 (http://www.python.org/dev/peps/pep-0406/) discusses a possible backwards compatible evolution of Python's approach with improved state encapsulation. If you have an encapsulated state model from the beginning though, then you can define your APIs accordingly and avoid having importers and loaders access global state at all (instead being passed a reference to the active engine).
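To make the encapsulation idea concrete, here is a minimal sketch. The `ImportEngine` name is borrowed from PEP 406, but the class, the `DictFinder` helper and the `find_source` method below are hypothetical illustrations, not that PEP's API. Each engine owns its own module cache and finder list, and finders are handed the engine instead of reaching for process-global state.

```python
import types


class DictFinder:
    """Hypothetical finder: serves module source from an in-memory dict."""

    def __init__(self, sources):
        self.sources = sources  # maps module name -> source text

    def find_source(self, engine, name):
        # The engine is passed in explicitly; no global state is consulted.
        return self.sources.get(name)


class ImportEngine:
    """Hypothetical engine holding the state that PEP 302 keeps in sys."""

    def __init__(self, finders):
        self.modules = {}             # per-engine analogue of sys.modules
        self.finders = list(finders)  # per-engine analogue of sys.meta_path

    def import_module(self, name):
        if name in self.modules:
            return self.modules[name]
        for finder in self.finders:
            source = finder.find_source(self, name)
            if source is not None:
                module = types.ModuleType(name)
                # Cache before executing so that circular imports inside
                # this engine would see the partially built module.
                self.modules[name] = module
                try:
                    exec(source, module.__dict__)
                except BaseException:
                    del self.modules[name]
                    raise
                return module
        raise ImportError(f"no module named {name!r} in this engine")


# Two engines can hold different modules under the same name.
engine_a = ImportEngine([DictFinder({"config": "VALUE = 1"})])
engine_b = ImportEngine([DictFinder({"config": "VALUE = 2"})])
mod_a = engine_a.import_module("config")
mod_b = engine_b.import_module("config")
```

Because nothing here touches `sys.modules` or `sys.meta_path`, the two engines cannot interfere with each other, which is the property the global-state design makes hard to get.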

Another missing piece in PEP 302 is the ability to ask an importer for an iterator over the modules provided by that importer (this is necessary for things like freeze utilities and automatic documentation utilities that extract docstrings). Since it's incredibly useful, you'd probably be better off standardising it from the get go: http://docs.python.org/dev/library/pkgutil#pkgutil.iter_modules (we'll probably finally elevate this to a formally specified API in Python 3.4)
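For reference, a short example of what that looks like from the caller's side, using the standard library's `json` package purely as a convenient target:

```python
import json
import pkgutil

# iter_modules() asks the importer responsible for each path entry
# which modules it can provide, without importing any of them.
names = sorted(info.name for info in pkgutil.iter_modules(json.__path__))
print(names)
```

Each yielded item also carries an `ispkg` flag, so a freeze or documentation tool can decide whether to recurse into subpackages.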

And my last comment is that you should take a close look at the division of responsibility between the import system and the loader objects. In particular, consider splitting the "load_module" API into separate "init_module" and "exec_module" steps. That should allow you to minimise the degree to which loaders need to interact directly with the import state.
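Later Python releases did adopt a split along these lines: PEP 451 (Python 3.4) gives loaders separate `create_module` and `exec_module` methods. The sketch below uses that later API rather than anything available in 3.3; the names `InMemoryFinder`, `SOURCES` and `hello_mem` are made up for the example.

```python
import importlib
import importlib.abc
import importlib.util
import sys

# Hypothetical in-memory store: module name -> source text.
SOURCES = {"hello_mem": "def greet():\n    return 'hello from memory'\n"}


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader in one object, for brevity."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname in SOURCES:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # let the other finders try

    def create_module(self, spec):
        # Step 1: returning None asks the import system to create a
        # default module object and set its import-related attributes.
        return None

    def exec_module(self, module):
        # Step 2: populate the module. The loader never touches
        # sys.modules; the import system takes care of that.
        exec(SOURCES[module.__name__], module.__dict__)


sys.meta_path.insert(0, InMemoryFinder())
hello_mem = importlib.import_module("hello_mem")
message = hello_mem.greet()
```

Note that the loader only ever sees the module object it is given, which is the reduced coupling to import state that the split is meant to achieve.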

PEP 302 and importlib are a great starting point for a more flexible import system, but there are definitely mistakes we made that are worth avoiding.

Next to ncoghlan I'm the other maintainer of Python's import system and the author of its current implementation, importlib (http://docs.python.org/dev/py3k/library/importlib.html). Everything Nick said I agree with, so I just want to add some extra info.

First, don't rely too heavily on PEP 302 directly but instead look at what importlib provides in terms of abstract base classes, etc. For backwards-compatibility things had to be compatible with PEP 302, but I had to add some of my own APIs in order to finish fleshing out the support for true flexibility.

Another important point is that you are giving developers two pieces of flexibility. One is the ability to store code in a way other than just directly on the file system as individual files (I call this the storage back-end for imports), e.g. this is allowing code to live in a zip file, SQLite database, etc. The other is the ability to pre- or post-process code in some way, e.g. Quixote (https://www.mems-exchange.org/software/quixote/) and its alternative use of string literals not assigned to a variable would be much easier to support.
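As a small illustration of the storage back-end side, the standard `zipimport` hook lets a zip archive stand in for a directory on `sys.path`. The archive and module names below (`bundle.zip`, `zipped_mod`) are made up for the example.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a throwaway zip archive containing a single module.
tmp_dir = tempfile.mkdtemp()
archive = os.path.join(tmp_dir, "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_mod.py", "ANSWER = 42\n")

# Putting the archive on sys.path is enough: the zipimport path hook
# recognises it and supplies a loader for the modules inside it.
sys.path.insert(0, archive)
importlib.invalidate_caches()
zipped_mod = importlib.import_module("zipped_mod")
loader_name = type(zipped_mod.__spec__.loader).__name__
```

The importing code is unchanged from the file system case; only the loader behind the module differs.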

While the latter is rarely needed, the former is where you have to worry about support. And this is where you end up practically redefining file system interaction APIs. Since some people need assets stored as files with their code, you need to provide a good way to read files, discover files, etc. We still need to implement the part of the API for discovering what data files are available, listing them, etc.

But then you also have the need for APIs which are code-specific. As Nick mentioned, you end up needing APIs on discovering what modules a package contains, etc. which are not file-specific. There is this odd duality of having APIs for dealing with modules where you have abstracted away the concept of files, but then you end up needing to provide APIs to access file-like asset data. And as soon as you try to implement one in terms of the other to avoid duplication the waters get really murky (i.e. people end up relying on expected file path structuring, etc. without paying attention to the fact the path may not be a true path because it is for a zipfile containing code and not just a file). IOW you will end up having to implement two similar APIs, but you will be better off for it in the long run.
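A sketch of how that duality shows up in practice, assuming a hypothetical package `assets_pkg` stored in a zip archive: the data file can be read through the loader (here via `pkgutil.get_data`, which delegates to the PEP 302 `get_data` method), while code that builds a path from `__file__` and goes to the file system finds nothing.

```python
import importlib
import os
import pkgutil
import sys
import tempfile
import zipfile

# Hypothetical package with one data file, stored inside a zip archive.
tmp_dir = tempfile.mkdtemp()
archive = os.path.join(tmp_dir, "assets.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("assets_pkg/__init__.py", "")
    zf.writestr("assets_pkg/greeting.txt", "hello")

sys.path.insert(0, archive)
importlib.invalidate_caches()

# Asset-oriented API: the bytes come from the package's loader.
data = pkgutil.get_data("assets_pkg", "greeting.txt")

# The package's __file__ looks like a path, but it points inside the
# archive, so treating it as a real file system path does not work.
assets_pkg = importlib.import_module("assets_pkg")
fake_path = os.path.join(os.path.dirname(assets_pkg.__file__), "greeting.txt")
exists_on_disk = os.path.exists(fake_path)
```

Code that assumes `open(fake_path)` will work is exactly the kind of reliance on file path structure that breaks once a non-file-system back-end is involved.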

As Nick said, our solution is a good starting point, but it isn't how I would do it today if I was designing the API from scratch.