1. Introduction

We've already introduced some good software engineering practices in the previous lectures, but this lecture is going to contain a concentrated discussion of them. We will explain the motivation for their use and show how to implement them, while giving some “hands-on” demonstrations.

The mission of the Perl for Perl Newbies talks has been continued in our
work on The Perl Beginners' site, which
aims to be the premier web site for finding resources to learn about Perl.

2. Automated Testing

Automated testing is a software engineering method in which one writes pieces of code that, in turn, help ascertain that the production code itself is functioning correctly. This section provides an introduction to automated software testing in Perl.

2.1. Motivation for Testing

So why do we want to perform automated software testing? The
first reason is to prevent bugs. By writing tests before we write the
production code itself (so-called Test-First Development) we ascertain
that the production code behaves according to the specification given in the
tests. That way, we prevent bugs that could otherwise occur if the code were deployed right away or tested only manually.

Another reason is to make sure that bugs and regressions are not
reintroduced in the code-base. Say we have a bug, and we write a meaningful
test that fails when the bug is still in the code, and only then fix the bug.
In that case, we can re-use the test in the future to make sure the bug is not
present in the current version of the code. If the bug re-surfaces in a certain
variation, then it will likely be caught by the test.

Finally, by writing tests we provide specifications for the code and even some form of API documentation, as well as examples of what we want the code to achieve. This involves less duplication than writing separate specification documents and examples, and, furthermore, the tests are verified to actually work.

2.2.1. Test::More

Perl ships with a module called Test::More (which is part of the Test-Simple CPAN distribution, where a more up-to-date version may be available), which allows one to write and run tests using convenient functions. Here's an example of a test script:
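A minimal sketch of such a test script, assuming a hypothetical add() function that is defined inline so the script is self-contained (in a real project it would normally live in a module):

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Test::More tests => 3;

# A hypothetical function under test, defined inline to keep the
# sketch self-contained.
sub add
{
    my ($x, $y) = @_;
    return $x + $y;
}

# TEST
is (add(2, 3), 5, "2 + 3 == 5");

# TEST
ok (add(-1, 1) == 0, "-1 + 1 is zero");

# TEST
is_deeply ([map { add($_, 10) } (1 .. 3)], [11, 12, 13],
    "add() works inside a map");
```

Running it with perl should print a 1..3 plan line followed by three lines starting with ok, each carrying its test name.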

is() is a Test::More function that compares a received result ("have") to an expected result ("want") using string equality (eq). There are also ok(), which just tests for truth, is_deeply(), which performs a deep comparison of nested data structures, and others.

You may also notice the # TEST comments. These are Test::Count annotations that allow us to keep track of the number of test assertions that we have declared and to update the test plan accordingly.

The output of the test script is in a format called TAP, the Test Anything Protocol. There are several TAP parsers, which analyse the output and present a human-friendly summary. For example, the prove command-line utility that ships with perl 5 can run the test script above and summarise its results.
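To illustrate what a TAP parser does, here is a sketch using TAP::Parser, which is part of the Test-Harness distribution that prove is built on and which ships with recent versions of perl. The TAP stream is hard-coded (with a deliberately failing test) rather than produced by running a script:

```perl
#!/usr/bin/perl

use strict;
use warnings;

use TAP::Parser;

# A hard-coded TAP stream. Test number 2 deliberately fails.
my $tap = <<'EOF';
1..3
ok 1 - 2 + 3 == 5
not ok 2 - -1 + 1 is zero
ok 3 - add() works inside a map
EOF

my $parser = TAP::Parser->new({ tap => $tap });

# Walk over the parsed results, reporting only the test lines
# (and not the plan line).
while (my $result = $parser->next)
{
    if ($result->is_test)
    {
        printf("Test %d: %s\n",
            $result->number, ($result->is_ok ? "passed" : "FAILED"));
    }
}

my @failed = $parser->failed;
printf("Ran %d tests; failed: %s\n",
    $parser->tests_run, (@failed ? join(", ", @failed) : "none"));
```

The summary line should report that test 2 failed. prove and TAP::Harness build their human-friendly reports on top of the same parser.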

2.2.2. ./Build test

Standard CPAN and CPAN-like Perl packages keep their tests as a group of *.t files under the t/ sub-directory, and allow running them by invoking the make test or ./Build test commands.

Using the CPAN package Module-Starter, one can generate a skeleton for one’s own CPAN-like package, which can later also contain tests. Keeping your code organised in such packages allows one to make use of a convenient build system such as Module-Build. It also allows one to package it as operating-system-wide packages, which can be removed easily using the system's package manager. Finally, these packages can later be uploaded to CPAN for sharing with other users and developers.

Here’s an example of testing a CPAN distribution from CPAN using
./Build test:

2.3. Types of Tests: Unit Tests, Integration Tests, System Tests

Software design methodologists distinguish between several types of
automated tests. First of all, unit tests (also see
the Wikipedia article)
test only a single "unit" of the code (say a module or a class), to
see if it behaves as expected. They generally make sure that the behaviour
of the module is sane and desirable, while not trying to see if it works
as part of the larger scheme.

On the other hand, system tests test the entire system. For example,
if we're writing code to generate a web-site, we could test that the various
pages of the resultant site contain some of the qualities that we expect.
System tests exercise the system as a whole, to see if there's a bug somewhere.

Between unit tests and system tests there could be several intermediate layers
of tests, normally called integration tests.

You can write all these tests using TAP, Test::More and other testing
modules on the CPAN, but it's important to be aware of the distinction.

Smoke Tests

“Smoke tests” is a term referring to a subset of the tests used to see if the software application performs its very basic operations well enough to justify further testing. It is akin to plugging in an electronic device and making sure it doesn't start smoking due to a malfunction. Accordingly, even if the entire test suite is time-consuming, running the smoke tests should take only a short time.

Using Perl for Testing Code in Other Programming Languages

You can use Perl to test software written in many other programming languages:

If you want to perform system tests of foreign applications, you can look at the various ways for Perl to invoke other command-line programs, and at its sockets and networking capabilities.
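As a minimal sketch of such a system test, the script below runs an external program and checks its output and exit status with Test::More. Since no particular foreign application is given here, it uses the current perl interpreter ($^X) running a one-liner as a stand-in, and run_and_capture() is a hypothetical helper name. It assumes a platform that supports the list form of a piped open (such as Unix-like systems):

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Test::More tests => 2;

# Run an external command and return its standard output and exit code.
sub run_and_capture
{
    my @cmd = @_;

    # The list form of a piped open bypasses the shell, so there are no
    # quoting issues with the arguments.
    open my $pipe, '-|', @cmd
        or die "Cannot run '@cmd': $!";
    my $output = join('', <$pipe>);
    # Closing a pipe waits for the child and sets $? to its status.
    close($pipe);

    return ($output, $? >> 8);
}

# Stand-in for a "foreign" program: replace with your own command line.
my ($out, $exit_code) =
    run_and_capture($^X, '-e', 'print "Hello, World!\n"; exit 3;');

# TEST
is ($out, "Hello, World!\n", "The program printed the expected greeting");

# TEST
is ($exit_code, 3, "The program exited with the expected status");
```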

2.4. Mocking

When testing certain parts of the application, it is sometimes desirable to mimic the functionality of other parts, so that the testing will be isolated. For example, if we're testing a server-side script (such as a CGI script), we may wish to provide a server-emulating object that's completely under our control and that feeds the script our own parameters. This is called mocking (see the Wikipedia article about Mock objects), and there are several mechanisms for doing so in Perl.
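One simple mechanism, shown here as a sketch, is to temporarily replace a subroutine through its typeglob using local, so that the original is restored automatically afterwards. The My::Greeter package and its functions are hypothetical; dedicated mocking modules on CPAN offer more complete facilities:

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Test::More tests => 2;

# A hypothetical module whose result depends on the current time, which
# makes it awkward to test directly.
package My::Greeter;

sub current_hour
{
    return (localtime())[2];
}

sub greeting
{
    my $hour = current_hour();
    return ($hour < 12) ? "Good morning" : "Good afternoon";
}

package main;

{
    # Temporarily replace My::Greeter::current_hour() with a mock that
    # returns a fixed hour. "local" restores the original subroutine
    # when this block exits.
    local *My::Greeter::current_hour = sub { return 9; };

    # TEST
    is (My::Greeter::greeting(), "Good morning", "Greeting at 09:00");
}

{
    local *My::Greeter::current_hour = sub { return 15; };

    # TEST
    is (My::Greeter::greeting(), "Good afternoon", "Greeting at 15:00");
}
```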

3. Version Control

Version control systems are also known as “revision control systems” and “source control systems”. Version control is considered part of “software configuration management” (SCM), and there are also some more comprehensive SCM systems.
Version control programs allow one to maintain various historical versions of
one's data, retrieve earlier versions, and do other operations like
branching or tagging.

This section will give the motivation for why you should start using
version control for your software development, and will give a short
demonstration using the Mercurial version control system. Feel free to skip
this section if you're already drinking the version control kool-aid.

3.1. Motivation for Version Control

Using version control gives several important advantages over the alternative
of not using any version control system at all:

You won't lose your code by accident. Having a version control
system, preferably with a remote service, will mean you're going to have another
place where your code is stored. If several developers are working on the
code simultaneously, then each one of them will have a copy of the entire
code (or, in some cases, even the entire history).

It allows you to keep historical versions of the code, for easy
reverting, comparison and investigation.

Let's say you introduced a bug. With a version control system you can easily
revert to a previous version of the code where the bug was not present
to verify that it did not exist there. Then you can diff the results, or
even bisect the history to find the exact check-in that introduced this bug.

It allows one to maintain several simultaneous lines of code (normally
called "branches") and to easily compare between them and merge them.

Finally, you'll find using a modern and high-quality version control system a more convenient and more robust solution than using archives (such as .zip files) and patches. There are plenty of open-source and gratis version control systems, some of which are highly mature and esteemed, so you shouldn't have a problem finding something that suits you.

3.2. Demo of Mercurial

Please note: by choosing Mercurial I do not mean to imply that it is the best VCS out there or that you should necessarily use it. Indeed, it is likely that there are other VCSes which are better in many respects. However, I'm familiar with Mercurial, and I think it is suitable for the demonstration here.

If you're interested in choosing a version control system, you can refer to
these resources:

4. Class Accessors

Object accessors are a way to abstract access to an object's member variables (also known as “properties”, “attributes”, “fields”, “slots”, etc.) behind method calls. For example, we can use $person->age() to get the age of $person and $person->age(21) or $person->set_age(21) to set their age to 21.
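A minimal sketch of hand-written accessors for a hypothetical Person class, supporting both the $person->age(21) and the $person->set_age(21) styles mentioned above:

```perl
#!/usr/bin/perl

use strict;
use warnings;

package Person;

sub new
{
    my ($class, %args) = @_;
    my $self = { age => $args{age} };
    return bless $self, $class;
}

# A combined getter/setter: $person->age() reads the value and
# $person->age(21) sets it.
sub age
{
    my $self = shift;
    if (@_)
    {
        $self->{age} = shift;
    }
    return $self->{age};
}

# A separate setter, for the set_age() style.
sub set_age
{
    my ($self, $new_age) = @_;
    $self->{age} = $new_age;
    return;
}

package main;

my $person = Person->new(age => 20);
$person->age(21);
print "Age: ", $person->age(), "\n";
```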

Accessors provide several important advantages over accessing the
properties of objects directly and this section will serve as an introduction
to them.

4.2. Motivation

So why should we use accessors instead of doing a direct
$person->{'age'} access to the object's property? There
are several reasons for that:

Writing the property names directly each time is prone to misspellings and errors, because they are plain strings: a misspelled key silently refers to a different hash element. With method calls, on the other hand, the method's existence is checked at run-time, and perl will throw an exception if a method name was misspelled into one that does not exist.

If a property needs to be converted from a first-order property to a calculated value, then one can still use the existing method-based interface to access it, just by changing the implementation of the methods. On the other hand, this is much more difficult to change with direct field access.

The external interface provided by methods is cleaner and easier to maintain compatibility with than direct access to the object's fields.

There may be other reasons, like better concurrency, persistence, etc.

4.3. Accessor modules on the CPAN

As you may have noticed from our example, writing accessors by hand involves a lot of duplicate code, and can get tedious. One way to overcome it is by using namespace games (e.g. no strict 'refs'; *{"Person::${field}"} = sub { .... }), but there are many modules on CPAN that do it all for you. Here's an overview of some of the most prominent ones:

Class-Accessor was one of the earliest accessor-generating modules and is still pretty popular. It is pure Perl, has no dependencies, and works pretty well. There are many extensions of it on CPAN that may work better for you.

Class-XSAccessor is an accessor generator partially written in C using XS, Perl's external subroutine mechanism. As such, it provides unparalleled speed among accessor generators, and is even faster than writing your own accessor methods by hand, like we did in the example.

While Moose provides accessors, they are only the tip of the iceberg. Moose is in fact a “post-modern” object system for Perl 5 that
provides a type system, delegators, meta-classes, wrapping routines, and many
other advanced features. As I once
said:

If you're looking to take your object oriented programming in Perl 5
to new levels - look no further than that. One should be warned that as of
this writing (August, 2009), Moose may have a relatively long startup time,
although the situation has been improved and is expected to improve further.
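For comparison with these modules, the “namespace games” approach mentioned earlier in this section can be sketched as follows: a loop assigns a closure to a symbol-table entry for each field of a hypothetical Person class. It needs no strict 'refs' because the glob name is built from a string:

```perl
#!/usr/bin/perl

use strict;
use warnings;

package Person;

sub new
{
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

# Generate a combined getter/setter for each field by assigning a
# closure to the corresponding symbol-table entry.
foreach my $field (qw(name age))
{
    no strict 'refs';
    *{"Person::${field}"} = sub {
        my $self = shift;
        if (@_)
        {
            $self->{$field} = shift;
        }
        return $self->{$field};
    };
}

package main;

my $person = Person->new(name => "Alice", age => 30);
$person->age(31);
print $person->name(), " is ", $person->age(), "\n";
```

Each closure captures its own copy of $field, because foreach creates a fresh lexical on every iteration. The CPAN modules above do essentially this, plus validation, inheritance support and (in the XS case) speed.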

5.2. Lexical Filehandles

Traditionally, Perl filehandles were "typeglobs" (global names, normally starting with an uppercase letter) that were not scope-safe. While they could be localised using local, this was still a far cry from true lexical scoping. perl-5.6.x, however, introduced lexical filehandles for both file handles and directory handles.
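A short sketch of lexical file and directory handles. It writes to and reads from a temporary file created with File::Temp (used only to keep the example self-contained) and lists the current directory:

```perl
#!/usr/bin/perl

use strict;
use warnings;

use File::Temp qw(tempfile);

# A temporary file to work with, deleted when the program exits.
my (undef, $filename) = tempfile(UNLINK => 1);

# Each filehandle lives in a lexical (my) variable, so it is scoped to
# its enclosing block and closed automatically if it goes out of scope.
{
    open my $out_fh, '>', $filename
        or die "Cannot open '$filename' for writing: $!";
    print {$out_fh} "First line\n", "Second line\n";
    close($out_fh);
}

my @lines;
{
    open my $in_fh, '<', $filename
        or die "Cannot open '$filename' for reading: $!";
    @lines = <$in_fh>;
    close($in_fh);
}

# Directory handles can be lexical as well.
opendir my $dir_handle, '.'
    or die "Cannot open the current directory: $!";
my @entries = grep { !/\A\.\.?\z/ } readdir($dir_handle);
closedir($dir_handle);

print "Read ", scalar(@lines), " lines; found ", scalar(@entries),
    " directory entries.\n";
```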

IO::Handle and Friends

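Perl provides a set of lexical and object-oriented abstractions for file handles called IO::Handle. In recent versions of Perl, they also work with handles opened by the built-in perlfunc functions: since perl 5.14, methods such as $fh->autoflush(1) can be called on a filehandle opened with the built-in open without explicitly loading IO::Handle first. You can find more information about them here: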

6. The local keyword

Before Perl 5 came out and Perl got lexical scoping and the my
keyword, an older local keyword was made available for programmers to
temporarily "localise" the values of variables (or parts thereof) in Perl.

As opposed to my, which is lexically scoped, local is dynamically scoped. What happens when one writes local $myvar = NEW_VALUE_EXPR(); (which will work only for package variables) is that perl will store the previous value of the variable somewhere safe, allow the programmer to tamper with it as they please, and restore it to its previous, saved value when the enclosing block exits. Unlike with my, the new localised value will also be seen by any functions called from within that block, even if they are defined elsewhere.
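The dynamic scoping described above can be seen in the following sketch, where show() reads a package variable that one of its callers has localised. The function names are hypothetical:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# A package (global) variable; local only works on those.
our $indent = 0;

sub show
{
    my ($text) = @_;
    # show() sees whatever value $indent currently has, including a
    # value localised by one of its callers.
    return (" " x $indent) . $text;
}

sub show_nested
{
    # The new value is visible inside show(), because local is
    # dynamically scoped. The old value is restored on return.
    local $indent = $indent + 4;
    return show("nested");
}

my @output = (show("top"), show_nested(), show("top again"));
print "$_\n" for @output;
```

This prints "top", then "nested" indented by four spaces, then "top again" without indentation. Had $indent been declared with my inside show_nested(), show() would not have seen the new value.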

6.1. Use and Abuse

The rule of thumb is that for general scoping, local should not be used instead of my, which is safer and better. You may still encounter some code using local in the wild, and if you need to maintain it, this code should be revamped to use my instead.

7. Using POD for Documentation

POD is short for
"Plain Old Documentation", and is a lightweight markup language, which is
the de-facto standard for writing documentation for Perl programs, Perl
modules and Perl itself.

In the context of Perl modules, POD is primarily used to give API (Application Programming Interface) documentation. In the context of Perl programs, POD is primarily used to document the usage of the program and the command-line flags it accepts. POD is also used to document the perl core (the so-called perldocs).
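As a sketch of POD inside a Perl program, the hypothetical script below documents itself. perl skips everything from a POD directive at the start of a line up to the next =cut, so the documentation and the code can live in the same file:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# The POD block below is ignored by perl when running the program, but
# can be read with tools such as perldoc.

=head1 NAME

greet - print a friendly greeting

=head1 SYNOPSIS

    greet [NAME]

=cut

sub greeting
{
    my ($name) = @_;
    return "Hello, " . (defined($name) ? $name : "World") . "!";
}

print greeting($ARGV[0]), "\n";
```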

7.1. POD Demonstration

How to write POD

POD sections start with a single POD directive on a new line and continue
up to the next =cut directive also on a line of its own. Here are
some POD directives:

Headers

=head1, =head2, =head3, etc. are headers. The lower the header number, the more significant the header is and the bigger the font used for it. Each header directive is followed by the text of the header. For example:

=head1 All you wanted to know about animals.

=head2 Introduction

This document aims to explain about animals.

=head2 Mammals.

=head3 Cats

Cats are awesome. They are useful for keeping the rats' population at
bay.

=head3 Dogs

Dogs have been called Man's best friend.

Regular Text

As you can see, regular text forms ordinary paragraphs. Paragraphs are separated by blank lines, and newlines within a paragraph are ignored (treated as spaces).

Code Blocks

A code block (or verbatim paragraph) can be added by creating a portion
of the text that's indented by using whitespace. In code blocks, newlines are
not ignored. For example:

=head1 All you wanted to know about animals.

=head2 Introduction

This document aims to explain about animals.

=head2 Mammals.

=head3 Cats

Cats are awesome. They are useful for keeping the rats' population at
bay.

=head3 Dogs

Dogs have been called Man's best friend.

Here is an example program to name your dog:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my @dog_names = (qw(Rex George Beethoven Max Rocky Lucky Cody));
    print "Name your dog " . $dog_names[rand(@dog_names)] . "!\n";

Put it in a file and run it.

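One should note that one can combine several styles at once by nesting them, as in B<I< ... >>. Furthermore, one can enclose text containing special characters (such as < and >) using several opening <<< and the same number of trailing >>> characters, separated from the text by whitespace, as in C<< $x <=> $y >>.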

Lists

One can use lists in POD by writing =over 4 (or some other indent level instead of "4"), then several =item directives, and finally =back. An item can be =item * for a bullet, =item 1. to produce numbered lists, or =item title to produce a definition list.

For example:

=head1 All you wanted to know about animals.

=head2 Introduction

This document aims to explain about animals.

=head2 Mammals.

=head3 Cats

Cats are awesome. They are useful for keeping the rats' population at
bay.

=head3 Dogs

Dogs have been called Man's best friend.

Here is an example program to name your dog:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my @dog_names = (qw(Rex George Beethoven Max Rocky Lucky Cody));
    print "Name your dog " . $dog_names[rand(@dog_names)] . "!\n";

Put it in a file and run it. This program will generate one of the following
names:

=over 4

=item * Rex

Rex like the dinosaur.

=item * George

Like George Washington.

=item * Beethoven

Last name of the famous composer.

=item * Max

Short for Maximilian.

=item * Rocky

Like the film.

=item * Lucky

A lucky dog.

=item * Cody

For good coding.

=back
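To check how a POD document renders, one can feed it to a formatter. The sketch below uses Pod::Text, which ships with perl and is based on Pod::Simple, to render part of the example above into a string (the exact layout of the rendered text may vary between versions):

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Pod::Text;

my $pod = <<'EOF';
=head1 Dogs

Dogs have been called Man's best friend.

=over 4

=item * Rex

Rex like the dinosaur.

=back

=cut
EOF

my $rendered;
my $parser = Pod::Text->new();
# Collect the output in a string instead of printing it to STDOUT.
$parser->output_string(\$rendered);
$parser->parse_string_document($pod);

print $rendered;
```

The podchecker utility that comes with perl can similarly be used to check a POD file for syntax errors.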

7.3. Literate Programming

Literate Programming is a method of writing code that allows one to intermingle code with documentation, re-order the sections of the code according to their intent, and create an entire typeset document explaining the code, with full cross-references and interlinks. As Mark Jason Dominus explains, POD is not Literate Programming.

Traditionally, Literate Programming systems have generated
TeX/LaTeX output,
but more recently there have been ones that could output
DocBook/XML.

I personally do not write my code in a Literate Programming style, because I feel that:

It will require much more effort to create code that will only be marginally
easier to understand.

The documentation will need to be maintained along with the code and may become out-of-date. Even inline comments suffer from this problem, and external documentation much more so.

The code should be structured to be as self-documenting as possible.
For example, instead of documenting what a block of code is doing, one
should extract a subroutine with a name that conveys the intention.
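As an illustration of that last point, here is a sketch of the same computation before and after extracting subroutines. The passing threshold of 60 and the names passing_scores() and average() are hypothetical:

```perl
#!/usr/bin/perl

use strict;
use warnings;

use List::Util qw(sum);

my @scores = (70, 85, 92, 40);

# Before: a comment explains what the inline code does.
#
#     # Calculate the average of the passing scores.
#     my @passing = grep { $_ >= 60 } @scores;
#     my $avg = sum(@passing) / @passing;
#
# After: the logic is extracted into subroutines whose names convey
# the intention, so the comment becomes unnecessary.

sub passing_scores
{
    my ($scores) = @_;
    return grep { $_ >= 60 } @$scores;
}

sub average
{
    my @numbers = @_;
    return @numbers ? sum(@numbers) / @numbers : 0;
}

my $average_of_passing = average(passing_scores(\@scores));
print "Average of passing scores: $average_of_passing\n";
```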

However, I'm mentioning Literate Programming here for completeness' sake, should you choose to follow this route.

7.4. POD Extensions

PseudoPod is an extended set of Pod tags used for book manuscripts. Standard
Pod doesn't have all the markup options you need to mark up files for
publishing production. PseudoPod adds a few extra tags for footnotes, tables,
sidebars, etc.

8. Module-Build and Module-Starter

Now let's tie everything together. When you download a Perl package from CPAN, there's a standard way to build and install it: perl Makefile.PL, make, make test and make install (or, alternatively, a similar process with perl Build.PL, ./Build, ./Build test and ./Build install).

When creating packages of Perl code, it is preferable to
make them capable of being built this way, even if they are intended for
internal use. That is because packaging them this way gives you many
advantages, among them the ability to specify CPAN (and in-house)
dependencies, integrity tests, configurability in building and installation,
and simplification of the preparation of system packages (such as
.rpms or .debs).

In this section we'll learn how to prepare your own CPAN-like package of
Perl 5 code using
module-starter and
Module-Build.
There are some variations on this theme, but it should get you started.

The perl Build.PL command generates the Build script in the current directory, which can be used to perform such operations as building, testing, packaging, and installing the distribution. We need to re-run perl Build.PL whenever we modify the configuration.

After we ran ./Build, we ran ./Build test to run the automated tests that Module-Starter generated for us. As you can see, the output says that all tests were successful. If they were not, we would need to fix either the code or the tests, depending on what is wrong.

8.3. Adding meaningful code

If we look at the code of the lib/…*.pm file, we'll see that there's
practically nothing there. So now it's time that we add some meaningful
code to the modules. But first we need to add some tests. Let's add this
test script under t/add.t

Since all tests are successful, we can commit the changes to the repository.

Moving on

Now we can continue to add more tests, and then fix the code so that the failing ones pass. If the code becomes too convoluted due to modifications, we can refactor it and improve its modularity. Running the existing automated tests after such a change will help make sure that we didn't break anything.

Writing more tests, getting them to pass, and refactoring form the cycle of development and maintenance, and Perl tools such as Module-Build facilitate it.

8.4. Getting rid of the boilerplate

The skeleton of the distribution generated by Module-Starter contains some boilerplate: pre-included text and code used as placeholders, which should be replaced with meaningful content by the programmer who is writing the distribution.

Luckily, it also generates a test script, t/boilerplate.t, that checks for that boilerplate and reports it. However, the tests there are marked as TODO tests, whose failures are ignored by default. To turn off their TODO status, open t/boilerplate.t in your text editor and remove or comment out the following line:

local $TODO = "Need to replace the boilerplate text";

After we do that, we get some test failures when running ./Build test.
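For reference, the line quoted above relies on Test::More's TODO mechanism, which can be sketched in isolation as follows. While $TODO is set, failing tests in the block are reported with a # TODO directive and are not counted as failures; removing the local $TODO line turns them into ordinary failures:

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Test::More tests => 2;

# TEST
ok (1, "A regular passing test");

TODO:
{
    # Test::More exports $TODO, so this works under "use strict".
    # local restores the previous (unset) value when the block exits.
    local $TODO = "Need to replace the boilerplate text";

    # TEST
    ok (0, "README contains no boilerplate");
}
```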

8.5. Additional Resources

Here are some additional resources regarding managing a CPAN-like distribution.

ExtUtils-MakeMaker is Perl's older and now largely unloved distribution manager, which relies on generating makefiles. It was described by chromatic as “a jumble of Perl which writes cross platform shell scripts to install Perl code, and you customize that by writing a superclass from which platform-specific modules inherit pseudo-methods which use regular expressions to search and replace cross-platform cross-shell code, with all of the cross-platform and cross-shell quoting issues that entails”.

Module-Install is a more modern and succinct wrapper around ExtUtils-MakeMaker that has gained some popularity. It ships its code (and the code of its extensions) under an ./inc directory in the distribution, which has been known to cause some bootstrapping issues for co-developers who would like to collaborate on the code from its version control repository. Nevertheless, it may be worth taking a look.

Writing Perl Modules for CPAN is a book by Sam Tregar, which has a free PDF download. It is somewhat out-of-date (only covering ExtUtils-MakeMaker), but may still be enlightening.

Dist::Zilla is a high-level distribution
generator, with many available plugins, that abstracts away a lot of the
duplication within a module and across modules. It generates fully-functional
distributions that can be shipped to CPAN and used normally. As with
Module-Install, it may pose a problem to your contributors, especially if they
have out-of-date versions of its CPAN modules installed, but it is a useful
tool.

9. Conclusion

The aim of this presentation was to make your Perl code (and that of other programming languages) less error-prone, easier to understand, and easier to modify. I did not provide complete coverage of external code quality (which is what the user feels or notices) or internal code quality (which also affects the developers maintaining the code). For a more thorough coverage of those, you are referred to: