It's difficult for most of us to evaluate the multitude of web frameworks and
microframeworks available in Python and to choose the ideal one for our
project. Naturally, proponents of each microframework will pitch in with
comments about how much simpler it is than the alternatives, how much less
boilerplate there is, or how much faster it runs at scale. Likewise, seasoned
users of each microframework will have their favoured approaches for caching,
storage, deployment, and the myriad other components needed for web
development. They appreciate having the flexibility to choose and integrate
those components to their own liking.

The numerous arguments in favour of one framework or another muddy the waters
somewhat; in most cases, starting your project with Django will be a safe
decision that you will probably not regret.

In bullet points:

- Newbies should choose Django. It will keep you secure and it will teach
  you a lot.

- If you aren't completely sure in what direction your project will develop,
  use Django. You'll soon hit use cases where you'll want some of the extras
  that are built into Django.

- If you know what you're doing with Python web libraries, and you have a
  fairly comprehensive vision of what your app will do - maybe you know you
  want high performance in particular areas, or more simplicity for a subset
  of use cases - then choose your framework on the basis of that knowledge.

- Do experiment with and learn the other frameworks.

Django ships with good, solid "batteries" included - libraries that fulfil all
sorts of requirements that come up time and again in web programming.
Moreover, the Django community has produced thousands of packages (5413 at the
time of writing) that fill other gaps. That's many thousands more packages than
for any other Python framework. In my projects I invariably find myself
wanting some of these components at some point down the line, when
requirements arise that were never foreseen (for example, i18n requirements
sometimes come up later in a project's life, when you want to roll out to new
regions). Certainly, neither Django's included batteries nor the community
packages will suit every use case, but they get you up and running quickly.

One argument made in favour of microframeworks is that they offer the
flexibility to choose alternative components. I don't think it's particularly
difficult to swap out components in Django for alternatives more suited to a
specific need - I've done so many times. Django is explicitly coupled, unlike,
say, Ruby on Rails, so you can simply call alternative functions to use
different storages or template engines etc.

Note, however, that Django's integrated-by-default components will also be
secure-by-default; any home-rolled integrations may not be. For example, Django
applications using Django>=1.2 are protected from CSRF attacks by default.
Any microframework that doesn't pre-integrate form generation and request
dispatcher components won't be able to say that. This is true whether you're
integrating things with a microframework or using non-standard components in
Django.

There are a couple of other arguments that I've heard:

"Django is slower than x" - maybe, but don't optimise prematurely.

"Django doesn't scale as well as x" - scale is a much more complicated
problem than "use this tool and you'll be alright". Approaches to scaling
Django will be comparable to approaches to scaling any other framework.

"Django isn't well-suited to client-side HTML5 apps" - this is true,
but it isn't particularly bad at them either. Also, don't underestimate the
number of additional pages and components needed to productise your core app,
even if it's a rich HTML5 app made of static assets and XHR.

I hope this unmuddies the waters a little, especially for beginners. Of course,
I'm not advocating anything other than "use the right tool for the job", but
until you're sure exactly what the job entails, it doesn't hurt to have a
comprehensive toolbox at your disposal.

Conway's Law, as Melvin Conway originally stated it:

"Organizations which design systems are constrained to produce designs
which are copies of the communication structures of these organizations."

In other words, the structure of a program or system is likely to
evolve to mirror the management structure of the organisation. Even with a
couple of teams working on a small project you may end up with various layers
of shims and wrappers to make code written by team A interface with team B's
preferred way of doing things.

The schism between Dev and Ops teams that is regularly cited in the DevOps
movement is another example of Conway's Law in action. The message there is
simple: get developers and operations to collaborate on common business goals
(eg. frequent, reliable deployments) or else their competing priorities, poorly
communicated, will cause friction that risks the business' ability to deliver
on any goals. The excellent The Phoenix Project describes several potential
communication gaps other than between Dev and Ops, such as between compliance
and developers, and information security and operations, and tells a parable
about how close cross-team collaboration avoids a series of potential
disasters.

There are various solutions to the problem. In the original magazine article
in which Melvin Conway introduced the idea he went on to propose a solution:

"This criterion creates problems because the need to communicate at any
time depends on the system concept in effect at that time. Because the
design which occurs first is almost never the best possible, the prevailing
system concept may need to change. Therefore, flexibility of organization
is important to effective design."

One radical solution is Valve's: its Handbook for New Employees describes a
flat structure in which employees move freely between self-organising
projects:

"Why does your desk have wheels? Think of those wheels as a symbolic
reminder that you should always be considering where you could move
yourself to be more valuable. But also think of those wheels as literal
wheels, because that's what they are, and you'll be able to actually move
your desk with them."

A slightly less radical approach is to attempt to create strong but fixed
communication pathways between teams. Spotify, for example, has described
having chapters and guilds that encourage collaboration across team
boundaries on specific issues, skills or disciplines.

You can apparently also beat Conway's Law not by improving cross-team
communication but by ensuring your teams are set up to match the architecture
of the technology products you want to produce. A leaked memo from a former
Amazon employee that contrasts Amazon's structure with Google's
mentions that Jeff Bezos mandated that:

All teams will henceforth expose their data and functionality through
service interfaces. [...] Teams must communicate with each other through
these interfaces.

Bezos is relying on Conway's Law to ensure the technology is structured well
rather than neglecting Conway's Law and letting it create an unexpected
architecture. This solution doesn't attempt to address Melvin Conway's
observation that "the design which occurs first is almost never the best
possible", but if you have an established or proven architecture, perhaps
something that offers maintainability or security benefits, you may be able to
ensure it is more closely followed by removing the flexibility to interact
across the architecture boundaries you want to draw.

This past weekend I've been writing my first programs in Rust, which seems to
be in vogue at the moment. Mentions of Rust keep coming up on
/r/programming and it's obvious why: Rust has a very exciting community
right now, with very active development and new libraries landing all the time.

The syntax itself is reminiscent of Ruby (but with braces). As a Python
programmer, I've never found Ruby that interesting a prospect. I've learned the
Ruby language enough to write Puppet providers, but Ruby as a language occupies
very much the same space as Python and I've never seen the need to take it
further.

Rust offers some of the same syntactic ideas as Ruby, but what's on offer is
a highly performant, natively-compiled language with static but inferred
types. Rust's
pointer ownership model allows the compiler to automatically allocate and free
objects in most cases without needing reference counting or garbage collection
(though both of these are available too). You could perhaps describe it as a
low-level language with high-level syntax. So this is a very different
proposition to Python and Ruby, and rather different to C and C++ too.

My first Rust program was an implementation of a simple, insecure
monoalphabetic substitution cipher (inspired, perhaps, by the genetic
algorithm I'd already written in Python to crack them). I'm pleased that the
code ends up clean and easy to read. For example, a snippet that encodes a
single character with ROT13 might be

    // A function that takes a char c, and returns its rot13 substitution
    fn rot13_char(c: char) -> char {
        let base = match c {
            'a'..'z' => 'a' as u8,  // If c is in [a-z], set base to 97
            'A'..'Z' => 'A' as u8,  // If c is in [A-Z], set base to 65
            _ => return c           // For all other characters, return c
        };
        let ord = c as u8 - base;   // ord is in the range 0-25
        let rot = (ord + 13) % 26;  // rot13
        (rot + base) as char        // convert back to an ASCII character. Note no
                                    // semicolon - this is an implicit return
    }

I also spent some time working out how to call Rust code from Python, which
would allow me to use both languages to their strengths in the same project. It
turns out it isn't hard to do this, by compiling a .so in Rust with the
#[no_mangle] annotation on the exported methods, and some simple ctypes
magic on the Python side. One downside is that so far I've only worked out how
to pass strings as c_char_p which is not optimal either for Rust or Python.
Sample code is on Bitbucket.
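The Python side of that FFI can be sketched with just the standard library.
As an illustration (this is not the Bitbucket sample itself), here I load
symbols from the process, which on Unix includes libc, and call strlen as a
stand-in for a Rust-built .so; with a real Rust library you'd pass its path
to CDLL, but the declaration mechanics are identical:

```python
import ctypes

# Load symbols from the current process, which on Unix includes libc.
# This stands in for a Rust-compiled .so: for a real library you would
# write ctypes.CDLL("./libmylib.so") instead.
libc = ctypes.CDLL(None)

# Declare the signature so ctypes converts arguments correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"rot13"))  # -> 5
```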

I could see myself using Rust in some projects in the future, though I'm not
likely to stop using Python for the vast majority of applications. The Rust
language itself is changing week to week and is probably unsuitable for any
production development at this time, but for early adopters it's well worth a
look.

Delivering high quality code stands on two pillars: the developer's wisdom to
write code well, and tools to inform and guide the developer towards better
practice. Developers are clever, and will make poor tools work, but the
benefits of great tools go beyond making the developers' lives easier, and
actively promote higher quality code.

Here are my picks for sharp tools that improve not just developer
productivity but code quality.

Version Control Hosting

Going beyond just the benefits of keeping code in version control, tools like
Rhodecode or Gitorious (both self-hostable) or Github or Bitbucket (SaaS)
allow developers to create new repositories so that unwieldy projects can be
split, or new tools and supporting apps can be kept disentangled from the
existing code.

You really don't want developers to be bound by the architectural decisions
made long ago and codified in pre-created repositories that are hard to get
changed.

Code Review

The best code review tools let you show uncommitted changes to other
developers, provide high-quality diffs that make it easy to read and understand
the impact of a change, and let the other developers give detailed feedback on
multiple sections of code. With this feedback developers can rapidly turn
patches around and resubmit until they are perfect. Pre-commit review means that
the committed history doesn't contain unnecessary clutter; each commit will do
exactly one thing in as good code as can be achieved.

Code review can catch issues such as potential error cases, security
weaknesses, duplicated code or missing tests or documentation. However the
benefits of code review go far beyond the direct ability to catch problem code.
Once working with code review, developers learn that to get their code
through review they must adapt their coding style to be clearer and more
legible, and pre-empt the criticisms that will be levelled by the reviewers.
Code review
also facilitates mutual learning: more people pay more attention to the new
features that go into the codebase, and so understand the codebase better; also
inexperienced developers get guidance from the more experienced developers
about how their code could be improved.

Some hosted version control systems (eg. Github) have code review built in, or
there are self-hosted tools such as ReviewBoard or SaaS tools like Atlassian
Crucible.

Linters/Code Style checkers

The earliest time you can get feedback about code quality to developers is
when the code is being edited. (If you're not a Pythonista, you'll have to
translate this to your own language of choice.)

Linters like Pyflakes can be run in the editor to highlight potential
problems, while style checkers like pep8.py highlight coding style
violations. Many IDEs will ship with something like this, but if yours doesn't
then plugins are usually available.

Pyflakes is good at spotting undeclared and unused variables, and produces
relatively few false positives; on the occasions I've tried PyLint I found it
pedantic and plain wrong whenever anything vaguely magical happens. You can
tailor it back with some configuration but in my opinion it's not worth it.

pep8.py is valuable and worth adopting, even if your own coding style is
different (though don't change if your project already has a consistent style).
The style promoted by pep8 is pleasantly spaced and comfortable to read, and
offers a common standard across the Python community. I've found even the
controversial 80-column line length limit useful - long lines are less
readable, both when coding and when viewing side-by-side diffs in code review
or three-way diff tools.

You might also consider docstring coverage checkers (though I've not seen one
integrated with an editor yet). I find docstrings invaluable for commenting the
intention that the developer had when they wrote the code, so that if you're
debugging some strange issue later you can identify bits of code that don't do
what the developer thought they did.

With Python's ast module it isn't hard to write a checker for the kind of
bad practice that comes up in your own project.
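For instance, a docstring-coverage check of the kind mentioned above takes
only a few lines. A minimal sketch (the helper name is mine, not from any
particular tool):

```python
import ast

def missing_docstrings(source):
    """Return names of functions defined without a docstring."""
    return [
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and ast.get_docstring(node) is None
    ]

code = """
def documented():
    '''Has a docstring.'''

def undocumented():
    pass
"""
print(missing_docstrings(code))  # -> ['undocumented']
```

The same pattern - walk the tree, match node types, apply a predicate -
covers most project-specific checks.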

Test Fixture Injection

Test code has a tendency to sprawl, with some developers happy to
copy-and-paste code into dozens of test cases, suites and modules. Big test
code becomes slow, unloved and hard to maintain. Of course, you can criticise
these practices in code review, but it's an uphill challenge unless you can
provide really good alternatives.

The kind of test fixtures your application will need will of course depend on
your problem domain, but regardless of your requirements it's worth considering
how developers can create the data their tests will depend on easily and
concisely - without code duplication.

Where off-the-shelf fixture tools are not appropriate, it can be valuable to
write the tools you need to make succinct, easily maintained tests. In our
project we populate our object database using test objects loaded from YAML.
You could also do this with in-memory objects if the code required to create
them is more complicated or slower than just describing the state they will
have when created.

Another approach also in use in our project is to create a DSL that allows
custom objects to be created succinctly. A core type in our project is an
in-memory tabular structure. Creating and populating these requires a few lines
of code, but for tests where tables are created statically rather than
procedurally we construct them by parsing a triple-quoted string of the form:
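The exact DSL is specific to our project, but a minimal sketch of the idea -
turning a triple-quoted string into a header and rows - might look like this
(parse_table and the column layout here are hypothetical):

```python
def parse_table(text):
    """Parse a whitespace-separated table string into a header list
    and a list of row lists (a simplified, hypothetical sketch)."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    return lines[0], lines[1:]

header, rows = parse_table("""
    name     qty  price
    apples   5    0.30
    oranges  3    0.40
""")
print(header)  # -> ['name', 'qty', 'price']
print(rows)    # -> [['apples', '5', '0.30'], ['oranges', '3', '0.40']]
```

The payoff is that a test's data reads like a table, not like a page of
constructor calls.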

Pyweek 18 was announced last week, to run from the 11th May to 18th May
2014, midnight to midnight (UTC).

Pyweek is a bi-annual games programming contest in which teams or individuals
compete to develop a game, in Python, from scratch, in exactly one week, on a
theme that is selected by vote and announced at the moment the contest
starts.

The contest offers the opportunity to program alongside other Python
programmers on a level playing field, with teams diarising their progress via
the pyweek.org site, as well as chatting on IRC (#pyweek on Freenode).

Games are scored by other entrants, on criteria of fun, production and
innovation, and it's a hectic juggling act to achieve all three in the limited
time available.

It's open to all, and very beginner friendly. You don't need a team, you don't
need finely honed artistic ability, and you don't need to set aside the whole
week - winning games have been created in less than a day. I'd encourage
you to take part: it's a great opportunity to explore your creative potential
and learn something new.

I can access lemon, but I didn't explicitly import it. Of course, this
happens because the import lemon.sherbet line ultimately puts the
lemon module into my current namespace.

I can also access lemon.curd without explicitly importing it. This is
simply because the module structure is stateful. Something else assigned
the lemon.curd module to the name curd in the lemon module. I've
imported lemon, so I can access lemon.curd.
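Both quirks can be reproduced without a real package by registering
hypothetical lemon modules in sys.modules by hand (a sketch; real packages
on disk behave the same way):

```python
import sys
import types

# Fabricate a package "lemon" with submodules "sherbet" and "curd",
# exactly as the import machinery would register them.
lemon = types.ModuleType("lemon")
lemon.sherbet = types.ModuleType("lemon.sherbet")
lemon.curd = types.ModuleType("lemon.curd")
sys.modules["lemon"] = lemon
sys.modules["lemon.sherbet"] = lemon.sherbet
sys.modules["lemon.curd"] = lemon.curd

import lemon.sherbet   # binds the name "lemon" in this namespace...
print(lemon.curd)      # ...and lemon.curd is reachable too, never imported
```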

I'm inclined to the view that relying on either of these quirks would be
relatively bad practice, resulting in more fragile code, so it's useful to be
aware of them.

The former of these quirks also affects Pyflakes. Pyflakes highlights in my
IDE variables that I haven't declared. But it fails to spot obvious mistakes
like this:

    import lemon.sherbet
    print(lemon.soda)

which when run will produce an error:

AttributeError: 'module' object has no attribute 'soda'

There's still nothing mysterious about this; Pyflakes only sees that lemon
is defined, and has no idea whether lemon.soda is a thing.

I think the reason this trips me up is a leaky abstraction in my working
mental models. I tend to think of the source tree as a static tree of
declarative code, parts of which I can map into the current namespace to use.
It isn't that, though; it is an in-memory structure being built lazily. And
it isn't mapped into a namespace: the namespace just gets the top-level
names, and my code traverses through the structure.

Maybe I formed my mental models long ago when I used to program more Java,
where the import statement does work rather more like I've described. I wonder
if people with no experience of Java are less inclined to think of it like I
do?

A lot of the software I've written has never been through any formal design
process. Especially with Python, because of the power of the language to let me
quickly adapt and evolve a program, I have often simply jumped in to writing
code without thinking holistically about the architecture of what I'm writing.
My expectation is that a good architecture will emerge, at least for the
parts where it matters.

This approach may work well if you are programming alone, but it is hampered
if you are practicing (unit) test-driven development, or are working in a
team. Unit tests disincentivise you from refactoring components, or at least
slow the process down. I would point out that if unit tests are resolutely
hard to write then your code may be badly architected.

Working as a team reduces your ability to have perfect knowledge of all
components of the system, which would be required to spot useful refactorings.

In practice I've found that if we don't do any up-front design, we won't ever
end up writing great software: some bits will be good, other bits will be
driven by expedience and stink, and won't get refactored, and will be a blight
on the project for longer than anyone expected.

The technique - CRC (Class-Responsibility-Collaboration) cards - is simple:
get the team in a room, write down suggested classes in a system on index
cards on a table, then iterate and adapt the cards until the design looks
"good". Each card is titled with the name of the class, a list
of the responsibilities of the class, and a list of the other classes with
which the class will collaborate. The cards can be laid out so as to convey
structure, and perhaps differently coloured cards might have different
semantics.

CRC cards are founded on object-oriented principles, and I don't want our code
to be unnecessarily objecty, so I'm quick to point out that not every card will
correspond to a Python class. A card may also correspond to a function, a
module, or an implied schema for some Python datastructure (eg. a contract on
what keys will be present in a dict). I think of them as
Component-responsibility-collaboration cards. The rules are deliberately
loose. For example, there's no definition of what is "good" or how to run the
session.

Running a CRC design session is perhaps the key art, and one that I can't claim
to have mastered. Alistair Cockburn suggests considering specific scenarios to
evaluate a design. In CRC sessions I've done I've tried to get the existing
domain knowledge written down at the start of the session. If there's an
existing architecture, write that down first. That's an excellent way to start,
because then you just need to refactor and extend it. You could also write down
fixed points that you can't or don't want to change right now, perhaps on
differently coloured cards.

It does seem to be difficult to get everyone involved in working on the cards.
Your first CRC session might see people struggling to understand the "rules",
much less contribute. Though it harks back to the kind of textbook OO design
problems that you encounter in early university courses, even experienced
developers may be rusty at formal software design. However, once you get people
contributing, CRC allows the more experienced software engineers to mentor the
less experienced team members by sharing the kind of rationale they are
implicitly using when they write software a certain way.

I think you probably need to be methodical about working through the design,
and open about your gut reactions to certain bits of design. Software
architecture involves a lot of mental pattern matching as you compare the
design on the table to things that have worked well (or not) in the past, so it
can be difficult to justify why you think a particular design smells. So speak
your mind and suggest alternatives that somehow seem cleaner.

The outcome of a CRC design session is a stack of index cards that represent
things to build. With the design fixed, the building of these components seems
to be easier. Everyone involved in the session is clear on what the design is,
and a summary of the spec is on the card so less refactoring is needed.

I've also found the components are easier to test, because
indirection/abstraction gets added in the CRC session that you might not add
if you were directly programming your way towards a solution. For example,
during design someone might say "We could make this feature a new class, and
allow for alternative implementations". These suggestions are made for the
elegance and extensibility of the design, but they naturally offer easier
mock dependency injection (superior to mock.patch() calls any day).

CRC cards seem to be a cheap way to improve the quality of our software.
Several weeks' work might be covered in an hour's session. We've not used CRC as often
as we could have, but where we have I'm pleased with the results: our software
is cleaner, and working on cleaner software makes me a happier programmer.

I'd like to close 2013 with a retrospective of the year and some thoughts on
what I'd like to achieve in 2014.

Vertu

In March 2013 I decided to leave my contract at luxury phone manufacturer
Vertu and take up a contract at Bank of America Merrill Lynch. The two years
I spent at Vertu spanned the period where they separated from Nokia and were
sold. As part of this separation I was involved in putting in place
contemporary devops practices, datacentres, development tools and CI, and
leading a team to build exclusive web apps and web services. We got to play
with cool new technologies and turn them to our advantage, to deliver, fast.

For example, I spent January and February developing a new version of Vertu's
lifestyle magazine Vertu Life using Django. Using ElasticSearch instead of
Django's ORM was a great choice: I was not only able to build strong search
features but get more value out of the content by adding "More like this"
suggestions in many pages. Though Vertu Life is just a magazine, the site
allows some personalisation. All such writes went to Redis, so the site was
blazingly fast.

Bank of America Merrill Lynch

Joining Bank of America meant moving from Reading to London, and I handed over
duties as the convenor of the Reading Python Dojo to Mark East (who has
since also joined Bank of America, coincidentally).

Bank of America's big Python project Quartz is a Platform-as-a-Service for
writing desktop banking apps and server-side batch jobs, and I joined a team
maintaining some of the Quartz reconciliation technology components. Quartz is
a complex platform with a lot of proprietary components, and it all seems very
alien to software developers until you start to understand the philosophy
behind it better.

This was an interesting project to join because it was a somewhat established
application with reams of what everyone likes to call "legacy code". Coming
into this, I had to learn a lot about how the code works and how Quartz works
before being able to spot ways to improve this.

Banking is a very technical industry, which presents challenges around
communication between bankers and software engineers like me. Agile adoption
is in its infancy at Bank of America, but has buy-in at the senior management
level, which is exciting and challenging.

Quartz is not only a project; it's a large internal community (2000+
developers), so the challenges we face are not just technical but social and
political. I've learned that collaboration in a project the size of Quartz
requires putting more effort into communication than smaller projects do. The
natural tendency is towards siloisation and fragmentation. We have got
better about doing things in a way that they can be more easily re-used, then
talking and blogging about them.

Devopsdays

There were Devopsdays conferences in London in March and November, and I look
forward to more in 2014. As well as talks covering technical approaches to
improving software development and operations, and talks on how to improve
cross-business collaboration, Devopsdays offers plenty of opportunities to
network, to discuss problems you are tackling and share experiences about
approaches that have worked and have not.

Europython

Though I'm excited about going to Berlin in 2014, I'm very sorry that
Europython 2013 was the last in Florence. Florence is full of beautiful art
and architecture but is also a place to relax in the sunshine with great food
and great company, and talk about interesting things (not least, Python, of
course).

After two years of lurking at Europython, this year I was organised enough to
offer a talk on Programming physics games with Python and OpenGL. People have
told me this was well received, though I think I could do with practice at
giving talks :)

After Europython, I took a week driving around Tuscany with my girlfriend.
Tuscany is beautiful, both the Sienese hill towns and the Mediterranean beach
resorts, and the food and wine is excellent. I recommend it. Though perhaps I
wouldn't drive my own car down from London again. Italy is a long way to drive.

Pycon UK

At Pycon UK I gave a talk on "Cooking up high quality software", in full
chef's whites and in my best dodgy French accent. Hopefully my audience found
this humorous and perhaps a little bit insightful. I was talking exclusively
in metaphors - well, puns - but I hope some people took away some messages.

I think if I had to sum up those messages I was encouraging developers to think
beyond just the skills involved in cooking a dish, but the broader picture of
how the kitchen is organised and indeed, everything else that goes on in the
restaurant.

Several of the questions were about my assertion that the "perfect dish"
requires choosing exactly the right ingredients - and may involve leaving some
ingredients out. I was asked if I mean that we should really leave features
out. Certainly I do; I think the key to scalable software development is in
mitigating complexity and that requires a whole slew of techniques, including
leaving features out.

Pycon UK was also notable for the strong education track, which we at Bank of
America sponsored, and which invited children and teachers to come in and work
alongside developers for mutual education.

PyWeek

PyWeek is a week-long Python games programming contest that I have been
entering regularly for the last few years.

This year I entered both the April and the September PyWeek with Arnav Khare,
who was a colleague at Vertu.

Our entry in PyWeek 16 (in April) was Warlocks, a simple 2D game with a
home-rolled 3D engine and lighting effects. I was pleased with achieving a
fully 3D game with contemporary shaders in the week, but we spent too much time
on graphical effects and the actual game was very shallow indeed, a simple
button-mashing affair where two wizards face each other before hurling a small
list of particle-based spells at each other.

I was much happier with our PyWeek 17 entry, Moonbase Apollo, which was a
deliberately less ambitious idea. We wanted to add a campaign element to a game
that was a cross between Asteroids and Gravitar. A simple space game is easy to
write and doesn't require very much artwork. This strategy allowed us
to have the bulk of the game mechanics written on day 1, so we had the rest of
the week to improve production values and add missions.

We were relatively happy with the scores we got for these but neither was a
podium finish :-(

2014

So what will I get up to in 2014?

I'm keen to do more Python 3. Alex Gaynor has blogged about lack of Python
3 adoption and I regret that I haven't done much to move towards using Python 3
in my day-to-day coding this year. Bank of America is stuck on Python 2.6. I
still feel that Python 3 is the way forward, perhaps now more than ever, now
that Django runs under Python 3, but I tend to pick Python 2 by default. I did
consider opting for Python 3 as our core technology when the decision arose at
Vertu, but at that time some of the libraries we really needed were not
available on Python 3. So I made the safe choice. I think today, I might
choose differently.

This is a write up of a talk I originally gave at DevopsDays London in March
2013. I had a lot of positive comments about it, and people have asked me
repeatedly to write it up.

Background

At a previous contract, my client had over the course of a few years
outsourced quite a handful of services under many different domains. Our task
was to move the previously outsourced services into our own datacentre, both
as a cost-saving exercise and to recover flexibility that had been lost.

In moving all these services around, there evolved a load balancer
configuration that consisted of

- Some hardware load balancers managed by the datacentre provider that mapped
  ports and also unwrapped SSL for a number of the domains. These were
  inflexible and couldn't cope with the number of domains and certificates we
  needed to manage.

- Puppet-managed software load balancers running:

  - Stunnel to unwrap SSL

  - HAProxy as a primary load balancer

  - nginx as a temporary measure for service migration, for example, dark
    launch
As you can imagine there were a lot of moving parts in this system, and
something inevitably broke.

In our case, an innocuous-looking change passed through code review that broke
transmission of accurate X-Forwarded-For headers. The access control to
some of our services was relaxed for certain IP ranges as transmitted with
X-Forwarded-For headers. Only a couple of days after the change went in we
found the Googlebot had spidered some of our internal wiki pages! Not good! The
lesson is obvious and important: you must write tests of your
infrastructure.
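For illustration (this is not our production configuration), the HAProxy
directive involved is option forwardedfor, which appends the client's IP in
an X-Forwarded-For header; a minimal frontend/backend pair might read:

```
frontend www
    bind *:80
    option forwardedfor   # add X-Forwarded-For with the client IP
    default_backend app

backend app
    server app1 10.0.0.10:8080 check
```

A change that drops or reorders that directive is exactly the kind of
innocuous-looking edit that needs a test behind it.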

Unit testing a load balancer

A load balancer is a network service that forwards incoming requests to one of
a number of backend services:

A pattern for unit testing is to substitute mock implementations for all
components of a system except the unit under test. We can then verify the
outputs for a range of given inputs.

To be able to unit test the Puppet recipes for the load balancers, we need to
be able to create "mock" network services on arbitrary IPs and ports that the
load balancer will communicate with, and which can respond with enough
information for the test to check that the load balancer has forwarded each
incoming request to the right host with the right headers included.
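A mock service of this kind can be a small inetd-style script: read one HTTP request on stdin, write a response on stdout that echoes back what the load balancer sent. A minimal sketch (the BACKEND_NAME environment variable is an invented convention for identifying which mock backend answered):

```python
# Inetd-style mock HTTP backend: reads a request from stdin and echoes
# back, as the response body, the details a load balancer test wants to
# verify (which backend was hit, the request line, X-Forwarded-For).
import os
import sys

def handle_request(stdin=sys.stdin, stdout=sys.stdout):
    request_line = stdin.readline().strip()
    headers = {}
    for line in stdin:
        line = line.strip()
        if not line:            # blank line ends the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    body = "backend: %s\nrequest: %s\nx-forwarded-for: %s\n" % (
        os.environ.get("BACKEND_NAME", "unknown"),
        request_line,
        headers.get("x-forwarded-for", "<missing>"),
    )
    stdout.write("HTTP/1.0 200 OK\r\n"
                 "Content-Type: text/plain\r\n"
                 "Content-Length: %d\r\n\r\n%s" % (len(body), body))
    stdout.flush()
    return body

if __name__ == "__main__":
    handle_request()
```

The test then makes a request through the load balancer and asserts on the echoed body rather than on anything the real services would return.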

The first incarnation of the tests was clumsy. It would spin up dozens of network
interface aliases with various IPs, put a webservice behind those, then run the
tests against the mock webservice. The most serious problem with this approach
was that it required slight load balancer configuration changes so that the new
services could come up cleanly. It also required tests to run as root to create
the interface aliases and bind the low port numbers required. It was also slow.
It only mocked the happy path, so tests could hit real services if there were
problems with the load balancer configuration.

I spent some time researching whether it would be possible to run these mock
network services without significantly altering the network stack of the
machine under test. Was there any tooling around using promiscuous mode
interfaces, perhaps? I soon discovered libdnet and from there Honeyd, and
realised this would do exactly what I needed.

Mock services with honeyd

Honeyd is a service intended to create virtual servers on a network, which can
respond to TCP connections and so on, for network intrusion detection. It does
all this using promiscuous mode networking and raw sockets, so it requires no
changes at all to the host's real application-level network stack. The honeyd
literature also pointed me in the direction of combining honeyd with farpd so
that the mock servers can respond to ARP requests.

More complicated was the need to create scripts implementing the mock TCP
services themselves. I needed my mock services to send back HTTP headers, IPs,
ports and SSL details so that the tests could verify these were as expected. To
create a service, Honeyd requires you to write a program that communicates on
stdin and stdout as if these were the network socket (this is similar to
inetd). While this is easy to write for HTTP and for a generic TCP socket, it's
harder for HTTPS, as the SSL libraries will only wrap a single bi-directional
file descriptor, and I couldn't find a way of treating stdin and stdout as a
single file descriptor. I eventually solved this by wrapping one end of a pipe
with SSL and proxying the other end of the pipe to stdin and stdout. If anyone
knows of a better solution, please let me know.
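The pipe-wrapping trick might look something like this sketch (written in modern Python for clarity; the certificate paths and handler are placeholders):

```python
# Bridge honeyd's stdin/stdout to one end of a socketpair, then wrap
# the other end as the SSL server side, so the handler sees a normal
# SSL socket even though honeyd only gave us two pipes.
import os
import socket
import ssl
import threading

def pump(read, write):
    """Copy bytes from a reader callable to a writer callable until EOF."""
    while True:
        data = read(4096)
        if not data:
            break
        write(data)

def serve_ssl_on_stdio(handler, certfile="mock.crt", keyfile="mock.key"):
    left, right = socket.socketpair()
    stdin = os.fdopen(0, "rb", 0)   # honeyd's "socket", read side
    stdout = os.fdopen(1, "wb", 0)  # honeyd's "socket", write side

    # Proxy stdin -> left and left -> stdout in background threads,
    # so that 'right' behaves like a single bi-directional socket.
    threading.Thread(target=pump, args=(stdin.read, left.sendall),
                     daemon=True).start()
    threading.Thread(target=pump, args=(left.recv, stdout.write),
                     daemon=True).start()

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile, keyfile)
    with ctx.wrap_socket(right, server_side=True) as conn:
        handler(conn)   # handler reads/writes a decrypted SSL socket
```

The handler can then reuse the same plain-HTTP echo logic as the non-SSL mocks, plus whatever certificate details the tests want to assert on.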

With these in place, I was able to create a honeyd configuration that reflected
our real network:
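The real configuration isn't reproduced here, but a honeyd configuration of this shape (template name, IPs and script paths all invented for illustration) looks like:

```
create loadbalanced
set loadbalanced personality "Linux 2.4.20"
add loadbalanced tcp port 80 "python /opt/mock/http-echo.py"
add loadbalanced tcp port 443 "python /opt/mock/https-echo.py"
bind 10.0.1.10 loadbalanced
bind 10.0.1.11 loadbalanced
```

One template describes the mock services, and a bind line per backend IP makes the whole farm appear on the virtual network.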

This was all coupled with an interface created with the Linux veth network
driver (after trying a few other mock networking devices that didn't work).
With Debian's ifup hooks, I was able to arrange things so that bringing up this
network interface would start honeyd and farpd and configure routes so that the
honeynet would be seen in preference to the real network. There is a little
subtlety in this, because we needed the real DNS servers to remain visible, as
the load balancer requires DNS to work. Running ifdown would restore everything
to normal.
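I won't reproduce the real hooks, but the arrangement was along these lines in /etc/network/interfaces (interface name, addresses and paths all invented):

```
iface honeynet0 inet static
    address 192.168.99.1
    netmask 255.255.255.0
    # start the mock network when the interface comes up
    up honeyd -i honeynet0 -f /etc/honeyd/loadbalancer.conf 192.168.99.0/24
    up farpd -i honeynet0 192.168.99.0/24
    # route the mocked backend range via the honeynet, not the real network
    up ip route add 10.0.1.0/24 dev honeynet0
    # and tear everything down again on ifdown
    down ip route del 10.0.1.0/24 dev honeynet0
    down pkill farpd
    down pkill honeyd
```

The route for the mocked backend range is the key line: it is what makes the honeynet win over the real network, while DNS traffic continues to go out the normal way.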

Writing the tests

The tests were then fairly simple BDD tests against the mocked load balancer,
for example:
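The real scenarios aren't reproduced here; an invented one of the same shape (hostnames and addresses are illustrative only) might read:

```
Scenario: Client address is forwarded to the backend
  Given a backend for www.example.com at 10.0.1.10
  When I make a request to www.example.com from 192.0.2.7
  Then the request should arrive at 10.0.1.10
  And the X-Forwarded-For header should contain 192.0.2.7
```

Each step is bound to code that drives the mock network and inspects what the honeyd-backed mock services echoed back.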

The honeyd backend is flexible and fast. Of course it was all Puppetised as a
single Puppet module that added all the test support; the load balancer recipe
was applied unmodified. While I set it up as a virtual network device for use
on development load balancer VMs, you could also deploy it on a real network,
for example for continuous integration tests or for testing hardware network
devices.

As I mentioned in my previous post, having written BDD tests like this it's
easier to reason about the system, so the tests don't just catch errors
(protecting against losing our vital X-Forwarded-For headers) but give an
overview of the load balancer's functions that makes it easier to understand
and adapt in a test-first way as services migrate. We were able to
make changes faster and more confidently and ultimately complete the migration
project swiftly and successfully.

What is BDD?

Behaviour Driven Development (BDD) is a practice where developers
collaborate with business stakeholders to develop executable specifications for
pieces of development that they are about to start. Test Driven Development
(TDD) says that tests should be written before development, but it doesn't say
how tests should be written.

BDD builds on TDD by proposing that the first tests should be
functional/acceptance tests written in business-oriented language. Using a
business-oriented language rather than code allows stakeholders to be involved
in verifying that a feature satisfies the business' requirements before work on
that feature even commences. You might then do TDD at the unit test level
around individual components as you develop.

Gherkin

The tools for BDD have generally come to revolve around Gherkin, a simple
structure for natural language specifications.

My favourite description of Gherkin-based tools is given by the Ruby
Cucumber website:

Describe behaviour in plain text

Write a step definition

Run Cucumber and watch it fail

Write code to make the step pass

Run Cucumber again and see the step pass

To summarise, a feature might be described in syntax like:

Feature: Fight or flight
  In order to increase the ninja survival rate,
  As a ninja commander
  I want my ninjas to decide whether to take on an opponent based on their skill levels

  Scenario: Weaker opponent
    Given the ninja has a third level black-belt
    When attacked by a samurai
    Then the ninja should engage the opponent

You then write code that binds this specification language to a test
implementation. Thus the natural language becomes a functional test.
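To show the shape of those bindings without depending on a particular tool, here is a minimal lettuce-style step registry: a @step decorator registers a regex against a function, and a dispatcher matches each Gherkin line to its binding, which is essentially how lettuce and Cucumber work. The Ninja class is invented for the example.

```python
# A toy Gherkin step registry: regexes map natural-language lines to
# Python functions, mirroring the lettuce/Cucumber binding mechanism.
import re

STEPS = []

def step(pattern):
    """Register a function as the binding for lines matching `pattern`."""
    def register(fn):
        STEPS.append((re.compile(pattern), fn))
        return fn
    return register

def run_step(line):
    """Dispatch one line of Gherkin to its matching step function."""
    for pattern, fn in STEPS:
        match = pattern.search(line)
        if match:
            return fn(*match.groups())
    raise AssertionError("undefined step: %r" % line)

# Bindings for the ninja feature above (Ninja is a stand-in for the
# real application code under test).
class Ninja:
    def __init__(self, level):
        self.level = level
    def decide(self, opponent):
        return "engage" if self.level >= 3 else "run away from"

context = {}

@step(r'the ninja has a (\w+) level black-belt')
def given_belt(level):
    context["ninja"] = Ninja({"first": 1, "second": 2, "third": 3}[level])

@step(r'attacked by a (\w+)')
def when_attacked(opponent):
    context["action"] = context["ninja"].decide(opponent)

@step(r'the ninja should (\w+) the opponent')
def then_action(action):
    assert context["action"] == action
```

Running the three lines of the scenario through run_step exercises the application end to end; the Then step is the assertion.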

Python Tools

Of the Python Gherkin tools, I've only had experience with lettuce (we hacked
up an internal fork of lettuce with HTML and modified xUnit test output), but
outwardly they are similar.

Experiences of implementing BDD

A complaint I've heard a couple of times about BDD as a methodology is that it
remains difficult to get the business stakeholders to collaborate in writing or
reviewing BDD tests. This was my experience too, though there is a slightly
weaker proposition of Specification by Example where the stakeholders are
asked just to provide example cases for the developers to turn into tests. This
doesn't imply the same bi-directionality and collaboration as BDD.

If you don't get collaboration with your stakeholders there are still benefits
to be had from BDD techniques if you put yourself in the shoes of the
stakeholder and develop the BDD tests you would want to see. It gives you the
ability later to step back and see the software at a higher level than as a
collection of tested components. You may find this level is easier to reason
at, especially for new starters and new team members.

Another complaint is that it seems like it's more work, with the two-step
process - first write natural language, then work out how to implement those
tests - but in fact, I found it makes it much easier to write tests in the
first place. Where in TDD you have to start by thinking what the API looks
like, in BDD you start with a simple definition of what you want to see
happening. You soon build up a language that completely covers your application
domain and the programming work required in creating new tests continues to
drop.

Another positive observation is that the three tiers mean your tests are
protected from inadvertent change as the project develops. While your code
might change, and the corresponding specification language code bindings might
change, well-written Gherkin features will not need to change. Without using
BDD I have encountered situations where functionality was broken because the
tests that would have caught it were changed or removed at the same time as the
implementation. BDD protects against that.

The natural language syntax helps to ensure that tests are written at a
functional level. Writing tests in natural language makes it much more obvious
when you're getting too deep into implementation detail, as you start to
require weirdly specific language that the business users would not
understand.

Pitfalls

There are a couple of pitfalls I encountered. One is simply that business
stakeholders won't be good at writing tests, so the challenge of collaborating
to develop the BDD tests is hard to solve. Just writing something in natural
language isn't enough; you need to write tests that take advantage of existing
code bindings and that describe eminently testable scenarios.

Another pitfall is that you need to ensure the lines of natural language
really are implemented in the code binding by a piece of code that does what it
says. Occasionally I saw code that tested not the desired function but some
proxy: assuming that if x, then y, let's test x, because it's easier to test.
You really need to test y, or the BDD tests will erroneously pass when that
assumption breaks.