The last day or two I've been learning how to prepare a module for CPAN distribution. There doesn't seem to be a good tutorial anywhere explaining how this all works. The two tutorials I found on Perl Monks are quite old. One was written in 2002 and the other in 2005. Neither discusses newer tools like Module::Build. Nor do they explain how CPAN works, let alone discuss how this process can be safely and reliably customized.

To answer that question, one needs to know (a) what happens only on the developer's machine, (b) what happens on the target machine and (c) how CPAN itself uses the data in a distribution package.

There is no shortage of information, of course, but it all seems to be scattered here and there. I thought I'd write up my learnings and practical observations in case they would help others. Once one becomes familiar with something it is all too easy to forget what was hard.

I'd very much appreciate feedback from experienced CPAN developers. Are there practical tips I've missed? Did I form a misimpression or jump to a false conclusion about something because of my brief experience?

Many thanks in advance for feedback. -- beth

1. CPAN Distribution - Technical Overview

1.1 Contents of a CPAN package

A CPAN package is a tarball that is expected to have the following
contents.

META.json or META.yml

A JSON or YAML file describing the package. For the specification of that file, see CPAN::Meta::Spec. The JSON file is named META.json. The YAML file is named META.yml. CPAN uses the information in this file to index the package and decide how and where to display it in CPAN.

MANIFEST

This file contains a list of all of the files in the distribution tarball.

SIGNATURE

An optional signature file calculated using the list of files in the distribution's manifest file.

Build.PL and/or Makefile.PL

These are both Perl scripts that customize build instructions to work on the target machine. Ideally both should be present.

the actual files listed in MANIFEST

The organization of the source files depends on Build.PL/Makefile.PL. Both these scripts generate files based on some rather rigid expectations about how files are organized. For example, if Build.PL is in directory "foo", it expects all Perl source files to be in "foo/lib" and all tests to be in "foo/t".

1.2 Downloading and installing using a CPAN client

When a distribution file is downloaded from CPAN, the installation process includes seven steps:

1. Unpack the tarball into a directory.

2. Generate the Build script and/or makefile by running Build.PL or Makefile.PL. The choice depends on the version of the CPAN client installed on the target machine. Older versions only know how to work with Makefile.PL. If the client knows how to use Build.PL it will use that. Otherwise it will use Makefile.PL.

3. Generate source code files using scripts marked for that purpose.

4. Generate documentation files. Either man pages, html, or both will be generated depending on the target system's Config.pm file. The Config.pm file is part of a Perl installation's configuration. On Debian it is found in /usr/lib/perl/5.N/Config.pm.

5. Place all source code files, raw and generated, into a staging area (by convention, called "./blib/").

6. Run tests.

7. Copy files to their final locations and perform custom installation actions.

Steps 1&2 are handled by a CPAN client. Steps 3-6 are handled by either ./Build test or make test. The final seventh step is handled by either ./Build install or make install.

To get a sense of how to install a CPAN tarball without benefit of a CPAN client, see perlmodinstall.
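As a rough illustration of the choice made in step 2, here is a minimal sketch of how a client might pick its command sequence for an unpacked distribution. The function name install_commands and its "client knows Build.PL" flag are made up for this example; real CPAN clients are considerably more involved.

```perl
use strict;
use warnings;

# Hypothetical helper: given the files found in an unpacked distribution
# and whether the client understands Build.PL, return the commands to run.
sub install_commands {
    my ($files, $client_knows_build_pl) = @_;
    my %have = map { $_ => 1 } @$files;

    # Prefer Build.PL when it is present and the client can use it.
    if ($have{'Build.PL'} && $client_knows_build_pl) {
        return ('perl Build.PL', './Build', './Build test', './Build install');
    }
    # Otherwise fall back to the makefile route.
    if ($have{'Makefile.PL'}) {
        return ('perl Makefile.PL', 'make', 'make test', 'make install');
    }
    die "No Build.PL or Makefile.PL usable by this client\n";
}

print "$_\n" for install_commands(['Build.PL', 'Makefile.PL', 'MANIFEST'], 1);
```

Passing 0 as the second argument shows the fallback an older client would take.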

1.3 Build.PL vs. Makefile.PL

Build.PL generates a Perl script named "Build" but only works with newer versions of the CPAN client. Makefile.PL generates a make file that can be used even with older versions of the client. However, it is less portable because it assumes that make (or some related tool like nmake/dmake) is installed.

To complete the installation using the makefile generated by Makefile.PL, CPAN runs the commands make test and make install. This means, of course, that the installation process will fail if the new machine doesn't have make installed. This is one of the reasons why newer versions of CPAN use Build.PL if available. Since it is a Perl script it can run on any machine where Perl is installed. No third party software is needed. Some systems, like Microsoft Windows, do not have make installed as a matter of course.

Even on systems that do have make, make's use of the command shell can cause problems. Each operating system has a different preferred implementation of the command shell: C, Korn, Bourne, Bash, Ash, to name a few. There are subtle syntax differences between these shells and it is quite possible that a make file that works well on one flavor of Linux/Unix will fail on another because it relies on a different flavor of Linux/Unix shell.

1.4 Choosing packaging tools

These files are not magic. Both the Perl Build script and the make file can contain any instructions imaginable as long as they know how to understand the commands 'test' and 'install'. Thus the Perl script generated by Build.PL must be able to be called like this: ./Build test and ./Build install. The generated make file must support make test and make install.

However, handcrafting the meta files (META.json, META.yml) and writing a build script/make file generator requires a great deal of domain knowledge. Most developers therefore rely on one of four main tool kits to package up their modules:

2.1 Arranging your files

Module::Build expects that you will be developing your
code in a project directory that looks like this:

Build.PL

instructions for generating the Build command. This is a
file you write. See below for details.

script/

stores your Perl scripts, i.e. your .pl files

lib/

stores your Perl modules, i.e. your .pm files

t/

stores your test scripts, i.e. your .t files

test.pl

A script responsible for running all tests. If missing,
the tests in t/ will be
run via TAP::Harness or Test::Harness depending
on how you configure the Build.PL and Makefile.PL
files. If present, the test.pl determines how to run
the tests and in what order. Build test will run
test.pl instead of trying to run the tests in
t/ on its own.

inc/

supplemental files used by your packaging and installation
process. They will be included in the tarball, but the meta
data file will be set up so that they will be ignored by
CPAN's indexing mechanism. For a practical use of this
directory, see Module::Build::Cookbook's discussion of
how to bundle Module::Build with your package.

MANIFEST.SKIP

a set of regular expressions matching files and
directories to ignore in /lib, /script,
/t and /inc. These files will be excluded
from the tarball even though they are within the project
directory.

The directories listed above should contain only the files that
belong to your project. Module::Build doesn't have a good
way of extracting files from a single common source tree shared by
multiple projects. It assumes that all files in the lib directory
belong in your project unless you specifically exclude them via a
regular expression in the MANIFEST.SKIP file.
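To make the exclusion rule concrete, here is a small sketch of how a list of MANIFEST.SKIP style regular expressions filters a file list. The patterns and file names are invented for illustration, and the function files_to_package is not part of Module::Build; it only mimics the idea of "keep a file unless some pattern matches its path".

```perl
use strict;
use warnings;

# Example patterns of the kind one might put in MANIFEST.SKIP,
# one regular expression per line in the real file.
my @skip = (qr{^_build/}, qr{^blib/}, qr{~$}, qr{\.bak$}, qr{^Build$});

# Return the files that are not matched by any skip pattern.
sub files_to_package {
    my ($patterns, @files) = @_;
    return grep {
        my $file = $_;
        !grep { $file =~ $_ } @$patterns;
    } @files;
}

my @kept = files_to_package(
    \@skip,
    qw(Build Build.PL lib/Exception/Lite.pm lib/Exception/Lite.pm~
       blib/lib/Exception/Lite.pm t/basic.t),
);
print "$_\n" for @kept;
```

With these patterns the editor backup file, the generated Build script and everything under blib/ are dropped, while Build.PL, the module and the test survive.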

It is also essential that .pl files be placed in
script/ and not lib/. When Module::Build sees
.PL (or .pl in a case insensitive system) in the
lib/ directory, it assumes that the file is meant to generate
a module rather than be used as a script. It will run the script
and put the output of the script into a file that has the same
name as the script file, less the .PL suffix. Thus
lib/foobar.pm.PL would be expected to generate
lib/foobar.pm.

Module names

For portability reasons, each module name component should be
11 or fewer characters. The first 8 of these must be different from
any other module on CPAN. This ensures that the module will behave
well on operating systems that allow only very short file names.

The PAUSE documents recommend informative names over "cool" or
poetic names. For more information, see the following links:

If you use an alternate organization for your projects

If you have an alternate arrangement of files, for example,
storing all source code in a common tree rather than in per-project
directories, you will have to move the files into place before
beginning the build process. There are ways to automate this process,
but they require subclassing Module::Build and adding an
extra action, called 'makeproject' or 'import'.

2.2 Writing Build.PL

Build.PL is a file you write. At a minimum it contains
three basic instructions: (a) loading Module::Build or
a subclass (b) initializing a new builder object with project
specific property values and (c) generating a Perl script named
"Build".
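A minimal sketch of such a file follows. It uses the Exception::Lite module and the 0.099_001 version number mentioned elsewhere in this write-up; the license and requires values are assumptions chosen only to make the example complete.

```perl
use 5.008008;
use strict;
use warnings;

# (a) Load Module::Build (or a subclass of it).
use Module::Build;

# (b) Create a builder object with project-specific property values.
my $builder = Module::Build->new(
    module_name  => 'Exception::Lite',     # main module: lib/Exception/Lite.pm
    dist_version => '0.099_001',           # version of the distribution
    license      => 'perl',                # assumed for this example
    requires     => { perl => '5.8.8' },   # assumed minimum Perl version
);

# (c) Write the Perl script named "Build" into the current directory.
$builder->create_build_script;
```

Running perl Build.PL in the project directory should then leave a Build script and a _build directory next to it.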

Put use VERSION at the top of your Build.PL file.
Although the constructor for Module::Build allows one
to specify a required version of Perl, older versions of the
CPAN client don't know how to read this and may try to test
packages not designed for them.

When you specify the version number for use VERSION,
use the old style version format M.mmmppp where mmm is a
3 digit 0 padded placeholder for the minor version and ppp is
a three digit 0 padded placeholder for the patch/development
release number. Thus 5.008008 rather than
5.8.8.

If your system only supports a specific set of operating
systems, the Build.PL script should begin with code
that dies with one of the following messages
"No support for OS" or "OS unsupported". The CPAN testing
tools know to look for this message and will consider the
platform not applicable for any distribution that generates
this message.

If you need threads your tests should be configured so that
those tests are skipped if threads are not installed.
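The first two points can be sketched as the opening lines of a Build.PL file. The list of unsupported platforms and the helper os_complaint are hypothetical; only the use VERSION line and the "OS unsupported" message come from the advice above.

```perl
use 5.008008;    # old-style M.mmmppp form rather than 5.8.8
use strict;
use warnings;

# Hypothetical list of platforms this distribution cannot support.
my %unsupported = map { $_ => 1 } qw(MSWin32 VMS);

# Return the message to die with, or undef if the platform is fine.
sub os_complaint {
    my ($osname) = @_;
    return "OS unsupported\n" if $unsupported{$osname};
    return;
}

# $^O holds the name of the operating system Perl was built for.
if (my $message = os_complaint($^O)) {
    die $message;
}

# ... the rest of Build.PL (Module::Build->new and so on) goes here.
```

Keeping the check in a small function makes it easy to try the logic against platform names other than the one you are developing on.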

Distribution version numbers

The dist_version property identifies the version number
for your distribution package. All distributions MUST have
a version number.

If you omit the dist_version property, Module::Build will
try to guess the version number by looking for
a variable named $VERSION in the 'module_name' module. For
the example above, had 'dist_version' been omitted,
Module::Build would have looked for $VERSION in
'lib/Exception/Lite.pm'

The version number is an especially important parameter because
CPAN uses it to track distribution files. It consists of three
components: a major number indicating a collection of binary
compatible releases; a 3 digit minor version number indicating feature
enhancements within that binary compatible group, and a 3 digit patch
or development release number.

If the third component is preceded by '_', CPAN counts the upload as a development release. The intended features for the minor version may be only partially implemented. Thus '0.099_001' would be the first development release for feature set '0.099'. It is meant to be available for testing but not as a published download.

This intention is enforced softly. The CPAN distribution page marks it with a label in big red letters saying "DEVELOPMENT RELEASE". CPAN clients are encouraged not to install it as the default version even if its version number is higher than any others. It should be downloaded only if the user requests that specific version, presumably for testing purposes.

If the patch number is preceded by a '.' then it will be published
and available for downloading via CPAN. For more information, see
Perl::Version.

No two uploads may have exactly the same version number. If you
mess up and need to reupload a distribution file, you must change
the patch or development release number.
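The two conventions described in this section, the '_' marker and the zero padded three digit components, can be illustrated with a couple of small helpers. Both function names are made up for this sketch and neither is part of Module::Build or PAUSE.

```perl
use strict;
use warnings;

# True if the final component is introduced by '_' rather than '.',
# which is how a development release is marked.
sub is_dev_release {
    my ($version) = @_;
    return $version =~ /_\d+\z/ ? 1 : 0;
}

# Convert a dotted version such as 5.8.8 to the old M.mmmppp form,
# padding the minor and patch components to three digits each.
sub to_old_style {
    my ($dotted) = @_;
    my ($major, $minor, $patch) = split /\./, $dotted;
    return sprintf '%d.%03d%03d', $major, $minor || 0, $patch || 0;
}

for my $version (qw(0.099_001 0.099.001 1.002)) {
    printf "%-10s development release? %s\n",
        $version, is_dev_release($version) ? 'yes' : 'no';
}
print to_old_style('5.8.8'), "\n";    # prints 5.008008
```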

Configuring documentation generation

Unfortunately, there don't seem to be many options to control this
process. For HTML generation there is only one user definable option:
html_css:
my $oBuilder = Module::Build->new (....);
$oBuilder->html_css('MyLayout.css');

Another related issue concerns the content of pod files. The
syntax and handling of the L<link_descriptor> has changed
over time. Two changes in particular may cause problems:

Links without text fields: Some older generators assumed that any non url style link without explicitly specified text was a man page. Instead of rendering the link text literally, they would substitute L<foo> with "the foo man page" or "the foo documentation". As tedious as it may be, if your distribution is meant to work on older Perl installations, you may prefer to explicitly provide text for each link. In other words, your pod should use L<Module::Foo|Module::Foo>
rather than just plain L<Module::Foo>.

URL style links cannot have link text prior to Perl 5.12.0. You cannot do L<foo|http://example.com/foo.html> but rather must do
L<http://example.com/foo.html> without the link text.

Output of Build.PL

The script generated by this simple file contains a number of
default commands. In addition to the test and install
commands, there are several that are generally used only by developers
preparing their code for packaging.

Advanced Build.PL files

You can also write much more elaborate Build.PL
scripts. This one subclasses Module::Build on the
fly and adds a routine that imports project files from a single
codebase source tree. The routine is very simple and would benefit
from many improvements (portable path name construction, checking
for deleted files, validating the copy). It is meant only for
illustration purposes:
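A sketch of what such a file might look like is given below. The class name My::Builder, the action name import and the source path under /home/me/src are all assumptions made for the example; only Module::Build->subclass with its class and code parameters is taken from Module::Build itself.

```perl
use 5.008008;
use strict;
use warnings;
use Module::Build;

# Build a subclass on the fly. Everything the new action needs lives
# inside the here document, because only that text ends up in the
# generated subclass file.
my $class = Module::Build->subclass(
    class => 'My::Builder',
    code  => <<'END_CODE',
use strict;
use warnings;
use File::Basename ();
use File::Copy ();
use File::Path ();

# Hypothetical mapping: file in the shared tree => place in the project.
my %IMPORT = (
    '/home/me/src/lib/Exception/Lite.pm' => 'lib/Exception/Lite.pm',
);

# Adds an "import" action, run as: ./Build import
sub ACTION_import {
    my $self = shift;
    for my $from (sort keys %IMPORT) {
        my $to = $IMPORT{$from};
        File::Path::mkpath(File::Basename::dirname($to));
        File::Copy::copy($from, $to)
            or die "Cannot copy $from to $to: $!\n";
        print "Imported $to\n";
    }
}
END_CODE
);

my $builder = $class->new(
    module_name  => 'Exception::Lite',
    dist_version => '0.099_001',
    license      => 'perl',
);
$builder->create_build_script;
```

After perl Build.PL, the new action would be invoked as ./Build import before running the usual packaging actions.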

Building a subclass with Module::Build->subclass(code=>...)
is only practical for very short snippets of code. Code defined via
the code property is compiled without benefit of strict
or warnings so it is especially easy for variable name
misspellings to slip through. Also syntax highlighting doesn't
necessarily work in here documents (on Xemacs it all gets colored as
a string) so the probability of mistakes is increased even further.

If you do choose to use Module::Build->subclass(code=>...),
everything you plan to use must be placed within the here document
assigned to the code property. The Build.PL file and
code that is part of it is never used after Build.PL runs.
In fact the code snippet that you define in the here document is
simply used to generate a subclass definition file that is placed in
the _build directory. Anything outside of that snippet will
never make it into the generated subclass file. That is why you
cannot do something like this in your Build.PL file:
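The following sketch shows the kind of arrangement that fails. The helper project_files and the class name My::Builder are invented for the example. The helper is defined in Build.PL itself, outside the here document, so it is gone by the time the Build script runs the action.

```perl
use strict;
use warnings;
use Module::Build;

# Defined in Build.PL, so it exists only while Build.PL is running.
sub project_files { return ('lib/Exception/Lite.pm') }

my $class = Module::Build->subclass(
    class => 'My::Builder',
    code  => <<'END_CODE',
sub ACTION_import {
    my $self = shift;
    # This call fails when "./Build import" is run: project_files()
    # was never written into the generated subclass file.
    print "$_\n" for main::project_files();
}
END_CODE
);

$class->new(
    module_name  => 'Exception::Lite',
    dist_version => '0.099_001',
)->create_build_script;
```

Moving the definition of project_files inside the here document is what makes it available to the action.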

If you need to define extensive amounts of code you are better off
defining your specialist code in a dedicated subclass file and
placing that file in the inc directory of your project
directory. See Module::Build::Authoring for more
information.

2.3 Running the Build.PL Command

As a developer there are two reasons you will want to run the
Build.PL command. First, the generated Build file
defines many commands that are useful to developers. Second, you
will want to test your installation process and generating
Build from Build.PL is part of that installation
process.

To generate Build you simply type perl Build.PL in
the top level of the project directory.

The Build.PL command must be run from the top level of the
project directory. The script generation routines in
Module::Build simply assume that "lib/", "inc/", etc are
in the current directory where the script was launched. It will
complain about not being able to find modules if run from any other
directory.

Generating both a build script and makefile

If you want to generate both the build script and the makefile
your Build.PL file can set the create_makefile_pl
property in the parameter list to Module::Build->new(...).

Setting this parameter is the easiest way to generate a makefile
and it will work for most simple installations. However, if your
installation process is complex, you may need to take more control
over this process. For details, see Module::Build::Compat and
Module::Build::API's documentation on the create_makefile_pl parameter.
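As a minimal sketch, the property can be added to the constructor call like this. The value 'traditional' is one of the styles described in the Module::Build::Compat documentation; the other property values are the same assumptions used for the Exception::Lite examples in this write-up.

```perl
use strict;
use warnings;
use Module::Build;

my $builder = Module::Build->new(
    module_name        => 'Exception::Lite',
    dist_version       => '0.099_001',
    license            => 'perl',
    # Ask Module::Build to write a Makefile.PL when the
    # distribution is packaged.
    create_makefile_pl => 'traditional',
);
$builder->create_build_script;
```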

Deleting the generated script and starting over

Running Build.PL adds two items to the top level of the
project directory:

Build

A script defining commands for use by developers and CPAN's
automated installation process. This file will be regenerated
each time Build.PL is run.

_build

Data files used by the Build script.

You can completely remove the Build script and the
_build directory, by running the command
./Build realclean. The name of this action is a bit of a
misnomer. It always removes the build script and the _build/
directory. It sometimes removes the blib/ directory, the
distribution staging area, and temporary files produced during the html
generation process. What determines when things are removed and when they are not is not at all clear.

It appears to never remove the following files:

META.yml

MANIFEST

MANIFEST.SKIP

Makefile.PL

tarballs generated by the dist action

If you want to regenerate these from scratch, you must manually
remove them.

2.4 Packaging up your module for distribution

To package your module you must run the following commands in
sequence:

The build script generated by Build.PL does not accept more
than one action at a time so you can't combine the commands into
one single action, such as "./Build manifest disttest dist". Only the
first command will be run.
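Since each action needs its own invocation, the sequence is easy to wrap in a short driver script. This is only a sketch: the helper packaging_commands is invented for the example, and it runs the Build script through the current perl ($^X) so that it does not depend on the script being executable.

```perl
use strict;
use warnings;

# The three packaging actions, in the order they must run.
my @actions = qw(manifest disttest dist);

# One command per action, because Build runs a single action per call.
sub packaging_commands {
    return map { [ $^X, 'Build', $_ ] } @actions;
}

# Only attempt the run when a generated Build script is present.
if (-e 'Build') {
    for my $command (packaging_commands()) {
        print "Running: @$command\n";
        system(@$command) == 0
            or die "'@$command' failed, stopping\n";
    }
}
```

Stopping at the first failure matters here, because dist should not be run if disttest did not pass.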

manifest

generates the MANIFEST file and creates a MANIFEST.SKIP file if
that is missing. If the MANIFEST file exists already, it will
update it.

Please note, if you decide that certain files are
no longer needed by your project and you remove them from the
project directory, the manifest action will not
remove them from the manifest file. It will merely warn you
about the missing files. You must delete them from the manifest
file manually. Alternatively, you can manually delete the file
and regenerate MANIFEST from scratch. Also note,
the realclean action does not remove the MANIFEST
or MANIFEST.SKIP files. If you want to regenerate them
from scratch you must remove them manually.

disttest

collects all the files that will be placed in the tarball into
a staging directory. If there is no META.yml file, it
will generate it and copy it to the staging area. Then it
verifies that Build.PL can be run, followed by
Build test. It does not install anything.

This method will complain if it can't find
a MANIFEST file so you must run the "manifest" action
before running this action. It will not run it automatically
for you.

The staging directory name is just the module name with
each :: replaced by '-' and '-version' tacked onto the end.
Thus Exception::Lite gets a distribution directory named
"Exception-Lite-0.099_001".

dist

converts the staging area directory into a tarball. If the
sign property is set when calling Module::Build->new and
your system has Module::Signature installed, the
tarball will also be signed and the results stored in a file
called SIGNATURE.

During the creation process, the directory will be removed
and in its place you will see a tarball. Thus the directory
Exception-Lite-0.099_001 is replaced by the tarball
Exception-Lite-0.099_001.tar.gz.
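The naming rule for the staging directory and tarball described above is simple enough to sketch in a few lines. The function dist_dir_name is made up for the illustration and is not how Module::Build exposes the name.

```perl
use strict;
use warnings;

# Replace each '::' in the module name with '-' and append '-version'.
sub dist_dir_name {
    my ($module, $version) = @_;
    (my $name = $module) =~ s/::/-/g;
    return "$name-$version";
}

my $dir = dist_dir_name('Exception::Lite', '0.099_001');
print "$dir\n";           # prints Exception-Lite-0.099_001
print "$dir.tar.gz\n";    # prints Exception-Lite-0.099_001.tar.gz
```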

2.5 Additional testing options

The disttest routine only verifies that the module has the
files needed to upload the module to CPAN, download it and run its
tests. To make sure your module installs properly you will need to
run additional tests. Additional testing may also be required
to make sure that the released code fits your quality control
standards.

Emulating what happens after the tarball is unpacked

To emulate what happens after the tarball is unpacked, you can
run the following sets of commands:

The first set of commands builds blib/ as normal, tests
the files and generates documentation as normal. However, instead
of copying the files to their final destination it merely reports
on what files it would have copied and to which locations.

The second set of commands does an actual fake installation to a
directory other than the normal site directory. In this case the
files are installed to /tmp/foo. You can verify this by running
./Build fakeinstall. Instead of the normal site locations, the
copy destinations will all be in /tmp/foo/.

Please note that the second method requires rebuilding the
Build script. The destination directory is hard coded into the
script and there is no option for changing the destination
directory on the build script itself.

To clean out generated files and start all over you can use. In
theory this should clean out the blib/ directory generated
by the 'test' action. It is best to double check that the file was
in fact removed. For some reason, from time to time, the "blib/"
directory won't go away even when this command is run.

Testing installation on systems other than your own.

There is very limited support for this. If you want to test
the generation of documentation that would not normally be generated
on your system you can use the following two commands:

html

Generates the html documentation from pod files printing out
error messages about unresolvable links and other difficulties.
This is guaranteed to generate html even if the current system
does not normally request it.

Note: the 'html' action complains about being unable to
resolve links to documentation pages and modules that only have
a top level name (example: links to the documentation pages for
UNIVERSAL and Exception generate complaints even
though these can be found on CPAN and have man pages visible
via the perldoc command).

manpages

Generates the man page documentation from pod files even if
the current system is not configured to request it. This
action is available in version 0.28 and up only - earlier versions
relied on the fact that nearly all systems were configured to
request man pages.

You can control the locations where files will be installed by
using the --install_path and --installdirs options.
See Module::Build for details.

However, this only begins to touch on the portability issues that
can affect a module. By far and away the best option is to get your
module working well on your own system and then upload it to CPAN
where users of other systems can download and test it. See CPAN Author Notes for more information.

Quality control testing

Module::Build's generated Build script also contains
several tools for checking the quality of code, tests, and
documentation. Among them:

skipcheck

prints out the files that were omitted based on rules in
MANIFEST.SKIP. You can use this to eyeball the list
of excluded files and verify that nothing was unintentionally
excluded from your distribution by a malformed regex.

testpod

finds all of the pod files and makes sure that they are well
formed.

testcover

runs the test action using Devel::Cover and generates a
code coverage report. To use this, Devel::Cover must be
downloaded from CPAN. It isn't part of the Perl core.

Module::Build was designed for subclassing and fortunately
many developers have taken advantage of that and shared their work.

A number of extensions to Module::Build have been created
to handle special application types: applications with embedded C/C++, applications with databases, applications with a web front
end and so on. For a list of available modules, search CPAN.

3. Uploading your package to CPAN

To upload a module to CPAN, you need an account on
PAUSE. For more information, see
About Pause

4. Alternative distribution channels

The Build script generated by Module::Build also
supports packaging for software distribution channels other than
CPAN:

PPM is the package management system for ActiveState Perl.
For tools to create packages distributed via PPM, see the
'ppmdist' and 'ppd' actions.

Support for distributing modules as .deb packages for Debian Linux is available through Module::Build::Debian.

Updates:

2010-12-29, 7:11am IST: moved section on extensions to Module::Build into the section on building modules with Module::Build - I plan to add a top level section on Dist::Zilla recommended by several below so this doesn't make sense as a top level section.

2010-12-29, 12:30pm IST: added subsection numbers to section 1; replaced "the CPAN client" with "a CPAN client" (there is more than one); removed the word "inherently" from the phrase "inherently less portable" in section 1.3 (Build.PL vs. Makefile.PL); fixed wording in section on packaging tools (1.4) and added mention of Module::Install and Dist::Zilla.

2010-12-29, 3:00pm IST: updating discussion of development releases to include moritz's comments on development release below.

2010-12-29, 5:15pm IST: added links to perlmodinstall in 1.3 (what a CPAN client does), as an example of installing modules without the benefit of a client and another link to a document that explains the Qwalitee metrics mentioned in the section on additional testing.