Namespace packages are a mechanism for splitting a single Python package
across multiple directories on disk. In current Python versions, an algorithm
to compute the packages __path__ must be formulated. With the enhancement
proposed here, the import machinery itself will construct the list of
directories that make up the package. This PEP builds upon previous work,
documented in PEP 382 and PEP 402. Those PEPs have since been rejected in
favor of this one. An implementation of this PEP is at [1].

in the package's __init__.py. Every distribution needs to provide
the same contents in its __init__.py, so that extend_path is
invoked independent of which portion of the package gets imported
first. As a consequence, the package's __init__.py cannot
practically define any names as it depends on the order of the package
fragments on sys.path to determine which portion is imported
first. As a special feature, extend_path reads files named
<packagename>.pkg which allows declaration of additional portions.

setuptools provides a similar function named
pkg_resources.declare_namespace that is used in the form:

import pkg_resources
pkg_resources.declare_namespace(__name__)

In the portion's __init__.py, no assignment to __path__ is
necessary, as declare_namespace modifies the package __path__
through sys.modules. As a special feature, declare_namespace
also supports zip files, and registers the package name internally so
that future additions to sys.path by setuptools can properly add
additional portions to each package.

Namespace packages are designed to support being split across multiple
directories (and hence found via multiple sys.path entries). In
this configuration, it doesn't matter if multiple portions all provide
an __init__.py file, so long as each portion correctly initializes
the namespace package. However, Linux distribution vendors (amongst
others) prefer to combine the separate portions and install them all
into the same file system directory. This creates a potential for
conflict, as the portions are now attempting to provide the same
file on the target system - something that is not allowed by many
package managers. Allowing implicit namespace packages means that the
requirement to provide an __init__.py file can be dropped
completely, and affected portions can be installed into a common
directory or split across multiple directories as distributions see
fit.

A namespace package will not be constrained by a fixed __path__,
computed from the parent path at namespace package creation time.
Consider the standard library encodings package:

Suppose that encodings becomes a namespace package.

It sometimes gets imported during interpreter startup to
initialize the standard io streams.

An application modifies sys.path after startup and wants to
contribute additional encodings from new path entries.

An attempt is made to import an encoding from an encodings
portion that is found on a path entry added in step 3.

If the import system was restricted to only finding portions along the
value of sys.path that existed at the time the encodings
namespace package was created, the additional paths added in step 3
would never be searched for the additional portions imported in step
4. In addition, if step 2 were sometimes skipped (due to some runtime
flag or other condition), then the path items added in step 3 would
indeed be used the first time a portion was imported. Thus this PEP
requires that the list of path entries be dynamically computed when
each portion is loaded. It is expected that the import machinery will
do this efficiently by caching __path__ values and only refreshing
them when it detects that the parent path has changed. In the case of
a top-level package like encodings, this parent path would be
sys.path.

Regular packages will continue to have an __init__.py and will
reside in a single directory.

Namespace packages cannot contain an __init__.py. As a
consequence, pkgutil.extend_path and
pkg_resources.declare_namespace become obsolete for purposes of
namespace package creation. There will be no marker file or directory
for specifying a namespace package.

During import processing, the import machinery will continue to
iterate over each directory in the parent path as it does in Python
3.2. While looking for a module or package named "foo", for each
directory in the parent path:

If <directory>/foo/__init__.py is found, a regular package is
imported and returned.

If not, but <directory>/foo.{py,pyc,so,pyd} is found, a module
is imported and returned. The exact list of extension varies by
platform and whether the -O flag is specified. The list here is
representative.

If not, but <directory>/foo is found and is a directory, it is
recorded and the scan continues with the next directory in the
parent path.

Otherwise the scan continues with the next directory in the parent
path.

If the scan completes without returning a module or package, and at
least one directory was recorded, then a namespace package is created.
The new namespace package:

Has a __path__ attribute set to an iterable of the path strings
that were found and recorded during the scan.

Does not have a __file__ attribute.

Note that if "import foo" is executed and "foo" is found as a
namespace package (using the above rules), then "foo" is immediately
created as a package. The creation of the namespace package is not
deferred until a sub-level import occurs.

A namespace package is not fundamentally different from a regular
package. It is just a different way of creating packages. Once a
namespace package is created, there is no functional difference
between it and a regular package.

The import machinery will behave as if a namespace package's
__path__ is recomputed before each portion is loaded.

For performance reasons, it is expected that this will be achieved by
detecting that the parent path has changed. If no change has taken
place, then no __path__ recomputation is required. The
implementation must ensure that changes to the contents of the parent
path are detected, as well as detecting the replacement of the parent
path with a new path entry list object.

PEP 302 defines "finders" that are called to search path elements.
These finders' find_module methods return either a "loader" object
or None.

For a finder to contribute to namespace packages, it must implement a
new find_loader(fullname) method. fullname has the same
meaning as for find_module. find_loader always returns a
2-tuple of (loader, <iterable-of-path-entries>). loader may
be None, in which case <iterable-of-path-entries> (which may
be empty) is added to the list of recorded path entries and path
searching continues. If loader is not None, it is immediately
used to load a module or regular package.

Even if loader is returned and is not None,
<iterable-of-path-entries> must still contain the path entries for
the package. This allows code such as pkgutil.extend_path() to
compute path entries for packages that it does not load.

Note that multiple path entries per finder are allowed. This is to
support the case where a finder discovers multiple namespace portions
for a given fullname. Many finders will support only a single
namespace package portion per find_loader call, in which case this
iterable will contain only a single string.

The import machinery will call find_loader if it exists, else fall
back to find_module. Legacy finders which implement
find_module but not find_loader will be unable to contribute
portions to a namespace package.

The specification expands PEP 302 loaders to include an optional method called
module_repr() which if present, is used to generate module object reprs.
See the section below for further details.

It is possible, and this PEP explicitly allows, that parts of the
standard library be implemented as namespace packages. When and if
any standard library packages become namespace packages is outside the
scope of this PEP.

As described above, prior to this PEP pkgutil.extend_path() was
used by legacy portions to create namespace packages. Because it is
likely not practical for all existing portions of a namespace package
to be migrated to this PEP at once, extend_path() will be modified
to also recognize PEP 420 namespace packages. This will allow some
portions of a namespace to be legacy portions while others are
migrated to PEP 420. These hybrid namespace packages will not have
the dynamic path computation that normal namespace packages have,
since extend_path() never provided this functionality in the past.

Multiple portions of a namespace package can be installed into the
same directory, or into separate directories. For this section,
suppose there are two portions which define "foo.bar" and "foo.baz".
"foo" itself is a namespace package.

If these are installed in the same location, a single directory "foo"
would be in a directory that is on sys.path. Inside "foo" would
be two directories, "bar" and "baz". If "foo.bar" is removed (perhaps
by an OS package manager), care must be taken not to remove the
"foo/baz" or "foo" directories. Note that in this case "foo" will be
a namespace package (because it lacks an __init__.py), even though
all of its portions are in the same directory.

Note that "foo.bar" and "foo.baz" can be installed into the same "foo"
directory because they will not have any files in common.

If the portions are installed in different locations, two different
"foo" directories would be in directories that are on sys.path.
"foo/bar" would be in one of these sys.path entries, and "foo/baz"
would be in the other. Upon removal of "foo.bar", the "foo/bar" and
corresponding "foo" directories can be completely removed. But
"foo/baz" and its corresponding "foo" directory cannot be removed.

It is also possible to have the "foo.bar" portion installed in a
directory on sys.path, and have the "foo.baz" portion provided in
a zip file, also on sys.path.

We add project1 and project2 to sys.path, then import
parent.child.one and parent.child.two. Then we add the
project3 to sys.path and when parent.child.three is
imported, project3/parent is automatically added to
parent.__path__:

At PyCon 2012, we had a discussion about namespace packages at which
PEP 382 and PEP 402 were rejected, to be replaced by this PEP [3].

There is no intention to remove support of regular packages. If a
developer knows that her package will never be a portion of a
namespace package, then there is a performance advantage to it being a
regular package (with an __init__.py). Creation and loading of a
regular package can take place immediately when it is located along
the path. With namespace packages, all entries in the path must be
scanned before the package is created.

Note that an ImportWarning will no longer be raised for a directory
lacking an __init__.py file. Such a directory will now be
imported as a namespace package, whereas in prior Python versions an
ImportWarning would be raised.

Nick Coghlan presented a list of his objections to this proposal [4].
They are:

The inclusion of namespace packages in the standard library was
motivated by Martin v. Löwis, who wanted the encodings package to
become a namespace package [6]. While this PEP allows for standard
library packages to become namespaces, it defers a decision on
encodings.

An early draft of this PEP specified a change to the find_module
method in order to support namespace packages. It would be modified
to return a string in the case where a namespace package portion was
discovered.

However, this caused a problem with existing code outside of the
standard library which calls find_module. Because this code would
not be upgraded in concert with changes required by this PEP, it would
fail when it would receive unexpected return values from
find_module. Because of this incompatibility, this PEP now
specifies that finders that want to provide namespace portions must
implement the find_loader method, described above.

The use case for supporting multiple portions per find_loader call
is given in [7].

Guido raised a concern that automatic dynamic path computation was an
unnecessary feature [8]. Later in that thread, PJ Eby and Nick
Coghlan presented arguments as to why dynamic computation would
minimize surprise to Python users. The conclusion of that discussion
has been included in this PEP's Rationale section.

An earlier version of this PEP required that dynamic path computation
could only take affect if the parent path object were modified
in-place. That is, this would work:

sys.path.append('new-dir')

But this would not:

sys.path = sys.path + ['new-dir']

In the same thread [8], it was pointed out that this restriction is
not required. If the parent path is looked up by name instead of by
holding a reference to it, then there is no restriction on how the
parent path is modified or replaced. For a top-level namespace
package, the lookup would be the module named "sys" then its
attribute "path". For a namespace package nested inside a package
foo, the lookup would be for the module named "foo" then its
attribute "__path__".

Previously, module reprs were hard coded based on assumptions about a module's
__file__ attribute. If this attribute existed and was a string, it was
assumed to be a file system path, and the module object's repr would include
this in its value. The only exception was that PEP 302 reserved missing
__file__ attributes to built-in modules, and in CPython, this assumption
was baked into the module object's implementation. Because of this
restriction, some modules contained contrived __file__ values that did not
reflect file system paths, and which could cause unexpected problems later
(e.g. os.path.join() on a non-path __file__ would return gibberish).

This PEP relaxes this constraint, and leaves the setting of __file__ to
the purview of the loader producing the module. Loaders may opt to leave
__file__ unset if no file system path is appropriate. Loaders may also
set additional reserved attributes on the module if useful. This means that
the definitive way to determine the origin of a module is to check its
__loader__ attribute.

For example, namespace packages as described in this PEP will have no
__file__ attribute because no corresponding file exists. In order to
provide flexibility and descriptiveness in the reprs of such modules, a new
optional protocol is added to PEP 302 loaders. Loaders can implement a
module_repr() method which takes a single argument, the module object.
This method should return the string to be used verbatim as the repr of the
module. The rules for producing a module repr are now standardized as:

If the module has an __loader__ and that loader has a module_repr()
method, call it with a single argument, which is the module object. The
value returned is used as the module's repr.

If an exception occurs in module_repr(), the exception is
caught and discarded, and the calculation of the module's repr
continues as if module_repr() did not exist.

If the module has an __file__ attribute, this is used as part of the
module's repr.

If the module has no __file__ but does have an __loader__, then the
loader's repr is used as part of the module's repr.

Otherwise, just use the module's __name__ in the repr.

Here is a snippet showing how namespace module reprs are calculated
from its loader: