
After many years of separation I was recently reunited with the venerable old FTP protocol. The years haven’t been kind to it.

Happy New Year!

Right, that’s the jollity out of the way.

I recently had cause to have some dealings with the File Transfer Protocol, which is something I honestly never thought I’d be using again. During the process I was reminded what an antiquated protocol it is. It’s old, it has a lot of wrinkles and, frankly, it’s really starting to smell.

But perhaps I’m getting ahead of myself, what is FTP anyway?

It’s been around for decades, but in these days of cloud storage and everything
being web-based, it’s not something that people typically come across very
often. This article is really an effort to convince you that this is something
for which we should all be thankful.

Wikipedia says of FTP:

The File Transfer Protocol (FTP) is the standard network protocol used for the transfer of computer files between a client and server on a computer network.

Twenty years ago, perhaps this was even true, but this Wikipedia page was last edited a month ago! Surely the world has moved on these days? At least I certainly hope so, and this post is my attempt to explain some of the reasons why.

A Little Piece of History

“What’s so bad about FTP?” I hear you cry! Oh wait, my mistake, it wasn’t you but the painfully transparent literary device sitting two rows back. Well, regardless, I’m glad you asked. First, however, a tiny bit of history and a brief explanation of how FTP works.

FTP is a protocol with a long history, and even predates TCP/IP. It first put in an appearance in 1971 in RFC 114, a standard so old it wasn’t even put into machine-readable form until 2001. At this point it was built on the Network Control Program (NCP), which was a unidirectional precursor of TCP/IP. The simplex nature of NCP may well explain some of FTP’s quirks, but I’m getting ahead of myself.

In 1980 the first version of what we’d recognise as FTP today was defined in RFC 765. In this version the client opens a TCP connection (thereafter known as the command connection) to port 21 on a server. It then sends requests to transfer files across this connection, but the file data itself is transferred across a separate TCP connection, the data connection. This is the main aspect of FTP which doesn’t play well with modern network topologies as we’ll find out later.

Given that TCP connections are full-duplex, why didn’t they take the opportunity to remove the need for a second connection when they moved off NCP? Well, the clues are in RFC 327, from a time when people were still happy to burn RFC numbers for the minutes of random meetings. I won’t rehash it here, but suffice to say it was a different time and the designers of the protocol had very different considerations.

Whatever the reasons, once the command connection is open and a transfer is requested, the server connects back to the client machine on a TCP port specified by the PORT command. This is known as active mode. Once this connection is established, the sending end can throw data down this connection.

Even back in 1980 they anticipated that this strategy may not always be ideal, however, so they also added a PASV command to use passive mode instead. In this mode, the server passively listens on a port and sends its IP address and port to the client. The client then makes a second outgoing connection to this point and thus the data connection is formed. This works a lot better than active mode when you’re behind a firewall, or especially a NAT gateway. As NAT gateways became more popular, as the IPv4 address space became increasingly crowded, this form of FTP transfer became more or less entirely dominant.
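To make this concrete, the PASV reply packs the server’s address and port into six decimal byte values, with the port split into a high and a low octet. A minimal sketch of parsing one (the reply text here is made up, though the format is the real one from the RFC):

```python
import re

def parse_pasv(reply):
    """Parse a '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' reply.

    Returns the (ip, port) the client should connect to for the data
    connection; the port is sent as two bytes, high octet then low.
    """
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if not match:
        raise ValueError("not a valid PASV reply: %r" % reply)
    parts = [int(n) for n in match.groups()]
    ip = ".".join(str(n) for n in parts[:4])
    port = parts[4] * 256 + parts[5]
    return ip, port

print(parse_pasv("227 Entering Passive Mode (192,168,1,10,48,57)"))
# prints: ('192.168.1.10', 12345)
```

Note the private-range address in the example: that is exactly the sort of useless value a NATed server ends up sending, as discussed below.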

There were a few later revisions of the RFC to tighten up some of the definitions and provide more clarity. There was a final change that is relevant to this article, however, which was made in 1998 when adding IPv6 support to the protocol, as part of RFC 2428. One change this made was to add the EPSV command to enter extended passive mode. The intended use of this was to work around the fact that the original protocol was tied to using 4-byte addresses, and they couldn’t change this without breaking existing clients. The change is simple: the EPSV response omits the IP address that the server sends for PASV, and the client instead reuses the same address it used to create the command connection1.
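For comparison, the EPSV reply carries only the port number, delimited by | characters, so no address format is baked into the protocol at all. A small sketch of parsing it (again, the reply text is illustrative):

```python
import re

def parse_epsv(reply):
    """Parse a '229 Entering Extended Passive Mode (|||port|)' reply.

    EPSV sends only the port; the client reuses the address it already
    used for the command connection, so the reply works identically for
    IPv4 and IPv6.
    """
    match = re.search(r"\(\|\|\|(\d+)\|\)", reply)
    if not match:
        raise ValueError("not a valid EPSV reply: %r" % reply)
    return int(match.group(1))

print(parse_epsv("229 Entering Extended Passive Mode (|||12345|)"))
# prints: 12345
```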

Not only is extended passive mode great for IPv6, it also works in the increasingly common case where the server is behind a NAT gateway. This causes problems with standard passive mode because the FTP server doesn’t necessarily know its own external IP address, and hence typically sends a response to the client asking it to connect to an address in a private range which, unsurprisingly, doesn’t work2.

It’s important to note that EPSV mode isn’t the only solution to the NATed server problem—some FTP servers allow the external address they send to be configured instead of the server simply using the local address. There are still some pitfalls to this approach, which I’ll mention later.

Simple Enough?

Given all that, what, then, are the problems with FTP?

Well, some of them we’ve covered already, in that it’s quite awkward to run FTP through any kind of firewall or NAT gateway. Active mode requires the client to be able to accept incoming connections to an arbitrary port, which is typically painful as most gateways are built on the assumption of outwards connections only and require fiddly configuration to support inbound.

Passive mode makes life easier for the client, but for security-conscious administrators it can be frustrating to have to enable a large range of ports on which to allow outbound connections. It’s also more painful for the server due to the dynamic ports involved, as we’ve already touched on. The server can’t use only a single port for its data connections since that would only allow it to support a single client concurrently. This is because the port number is the only thing linking the command and data connections—if two clients opened data connections at the same time, the server would have no other way to tell them apart.

Extended passive mode makes life easier all round, as long as you can live with opening the largish range of ports required. But even given all this there’s still one major issue which I haven’t yet mentioned, which crops up with the way that modern networks tend to be architected.

FTP = Forget Talking to Pools

Anyone who’s familiar with architecting resilient systems will know that servers are often organised into clusters. This makes it simple to tolerate failures of a particular system, and is also the only practical way to handle more load than a single server can tolerate.

When you have a cluster of servers, it’s important to find a way to direct incoming connections to the right machine in the cluster. One way to do this is with a hardware load balancer, but a simpler approach is simply to use DNS. In this approach you have a domain name which resolves to multiple IP addresses, sorted into a random order each time, and each address represents one member of the pool. As clients connect they’ll typically use the first address and hence incoming connections will tend to be balanced across available servers.

This works really well for protocols like HTTP which are stateless because every time the client connects back in it doesn’t matter which of the servers it gets connected to, any of them are equally capable of handling any request. If a server gets overloaded or gets taken down for maintenance, the DNS record is updated and no new connections go to it. Simple.

This approach works fine for making the FTP command connection. However, when it comes to something that requires a data connection (e.g. transferring a file), things are not necessarily so rosy. In some cases it might work fine, but it’s a lot more dependent on the network topology involved.

Let’s illustrate a potential problem with an example. Let’s say there’s a public FTP site that’s served with a cluster of three machines, and those have public IP addresses 100.1.1.1, 100.2.2.2 and 100.3.3.3. These are hidden behind the hostname ftp.example.com which will resolve to all three addresses. This can either be in the form of returning multiple A records in one response, or returning different addresses each time; large public sites such as Facebook use both of these techniques.

When the FTP client initiates a connection to ftp.example.com it first performs a DNS lookup—let’s say that it gets address 100.1.1.1. It then connects to 100.1.1.1:21 to form the command connection. Let’s say the FTP client and server are both well behaved and then negotiate the recommended EPSV mode, and the server returns port 12345 for the client to connect on.

At this point the client must make a new connection to the specified port. Since it needs to reuse the original address it connected to, let’s say that it repeats the DNS lookup and this time gets IP address 100.2.2.2 and so makes its outgoing data connection to that address. However, since that’s a physically separate server it won’t be listening on port 12345 and the data connection will fail.
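This failure mode can be sketched with a toy model, where each “server” knows only about the data ports it allocated itself (the addresses and ports are the illustrative ones from the example above):

```python
from itertools import cycle

# Toy model of the ftp.example.com cluster: each member allocates
# its own ephemeral data ports, known only to itself.
data_ports = {"100.1.1.1": {12345}, "100.2.2.2": set(), "100.3.3.3": set()}

# DNS round-robin: each lookup returns the next address in the pool.
lookups = cycle(["100.1.1.1", "100.2.2.2", "100.3.3.3"])

def resolve(hostname):
    return next(lookups)

# Command connection: the first lookup lands on 100.1.1.1, which
# replies to EPSV with data port 12345.
command_addr = resolve("ftp.example.com")

# Data connection: a naive client (or an interposed proxy) repeats
# the lookup and gets the next member of the pool...
data_addr = resolve("ftp.example.com")

# ...which never opened port 12345, so the connection is refused.
print(command_addr, "->", data_addr,
      "data connection ok:", 12345 in data_ports[data_addr])
# prints: 100.1.1.1 -> 100.2.2.2 data connection ok: False
```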

OK, so you can argue that’s a broken FTP client—instead of repeating the DNS lookup it could just reconnect to the same address it got last time. However, in the case where you’re connecting through a proxy then this is much less clear cut—the proxy server is going to have no way to know that the two connections that the FTP client is making through it should go to the same IP address, and so it’s more than likely to repeat the DNS resolution and risk resolving to a different IP address as a result. This is particularly likely for sites using DNS for load-balancing since they’re very likely to have set a very short TTL to prevent DNS caches from spoiling the effect.

We could use regular passive mode to work around the inconsistent DNS problem,
because the FTP server returns its IP address explicitly. However, this could
still cause an issue with the proxy if it’s whitelisting outgoing
connections—we would likely have just included the domain name in the
whitelist, so the IP address would be blocked. Leaving that issue aside, there’s
still another potential pitfall if the FTP server has had the public IP address to return configured by an administrator. If that administrator has configured this via a domain name, the FTP server itself could attempt to resolve the name and get the wrong IP address, and so actually instruct the client to connect back incorrectly. Each server could be configured with its external IP address directly, but this is going to make centralised configuration management quite painful.

Insecurity

As well as all the potential connectivity issues, FTP also suffers from a pretty poor security model. This is fairly well known and there’s even an RFC discussing many of the issues.

One of the most fundamental weaknesses is that it involves sending the username and password in plaintext across the channel. One easy way to solve this is to tunnel the FTP connection over something more secure, such as an SSL connection. This setup, usually known as FTPS, works fairly well, but still suffers from the same issues around the separate data and command connections. Another alternative is to tunnel FTP connections over SSH.
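As an illustration, Python’s standard ftplib module exposes explicit FTP-over-TLS through its FTP_TLS class. A minimal sketch (the host and credentials are placeholders, and nothing here avoids the data-connection caveats discussed above):

```python
from ftplib import FTP_TLS

def list_directory(host, user, password):
    """List a remote directory over explicit FTPS (AUTH TLS).

    FTP_TLS upgrades the command connection to TLS before login, so
    the username and password are no longer sent in plaintext;
    prot_p() asks for the data connections to be encrypted as well.
    """
    ftps = FTP_TLS(host)
    ftps.login(user, password)   # credentials travel over TLS
    ftps.prot_p()                # protect the data connections too
    listing = ftps.nlst()
    ftps.quit()
    return listing
```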

None of these options should be confused with SFTP which, despite the similarity in name, is a completely different protocol developed by the IETF3. It’s also different from SCP, just for extra confusion4. This protocol assumes only a previously authenticated and secure channel, so is applicable over SSH but more generally anywhere where a secure connection has been created.

Overall, then, I strongly recommend sticking to SFTP wherever you can, as the world of FTP is, as we’ve seen, by and large a world of pain if you care about more or less any aspect of security at all, or indeed ability to work in any but the most trivial network architectures.

In conclusion, then, I think that far from FTP being the “standard” network protocol used for the transfer of computer files, we should instead be hammering the last few nails in its coffin and putting it out of our misery.

I guess by 1998 they’d given up on those crazy ideas from the 80’s of transferring between remote systems without taking a local copy—you know, the thing that absolutely nobody ever used ever. I wonder why they dropped it? ↩

Even with extended passive mode NAT can still cause problems, as you also need to redirect the full port range that you plan to use for data connections to the right server. It solves part of the problem, however. ↩

Interestingly there doesn’t appear to be any kind of RFC for SFTP, but instead just a draft. I find this rather odd considering how widely used it is! ↩

Just for extra bonus confusion there’s a really old protocol called the Simple File Transfer Protocol defined in RFC 913 which could also reasonably be called “SFTP”. But it never really caught on so probably this isn’t likely to cause confusion unless some pedantic sod reminds everyone about it in a blog post or similar. ↩

I write most of my blog articles and make other changes to my site whilst on my daily commute. The limitations of poor network reception and different hardware have forced me to come up with a streamlined process, which I thought I’d share in case it’s helpful to anyone else.

I like writing. Since software is what I know, I tend to write about that. QED.

Like many people, however, my time is somewhat pressured these days — between
a wife and energetic four-year-old daughter at home and my responsibilities at
work, there isn’t a great deal of time left for me to pursue my own interests.
When your time is squeezed the moments that remain become a precious commodity
that must be protected and maximised.

Most of my free time these days is spent on the train between Cambridge and
London. While it doesn’t quite make it into my all time top ten favourite
places to be, it’s not actually too bad — I almost invariably get a seat,
usually with a table, and there’s patchy mobile reception along the route.
Plenty of opportunities for productivity, therefore, if you’re prepared to
take them.

Since time is precious, the last thing I want to do when maintaining my blog,
therefore, is spend ages churning out tedious boiler-plate HTML, or waiting for
an SSH connection to catch up with the last twenty keypresses as I hit a
reception blackspot. Fortunately it’s quite possible to set things up to
avoid these issues and this post is a rather rambling discussion of things
I’ve set up to mitigate them.

Authoring

The first time-saving tool I use is Pelican. This is a static site
generator which processes Markdown source files and generates HTML from
them according to a series of Jinja templates.

When first resurrecting my blog from
a cringeworthy earlier effort1 the first thing
I had to decide was whether to use some existing blogging platform
(Wordpress, Tumblr, Medium, etc.) either
self-hosted or otherwise. The alternative I’d always chosen previously was to
roll my own web app — the last one being in Python using
CherryPy — but I quickly ruled out that option. If the point was
to save time, writing my own CMS from scratch probably wasn’t quite the optimal
way to go about it.

Also, the thought of chucking large amounts of text into
some clunky old relational database always fills me with a mild sense of
revulsion. It’s one of those solutions that only exists because if all
you’ve got is a hosted MySQL instance, everything looks like a BLOB.

In the end I also rejected the hosted solutions. I’m sure they work very well,
with all sorts of mobile apps and all such mod cons, but part of the point of
all this for me has always been the opportunity to keep my web design
skills, meagre as they might be, in some sort of barely functional state.
I’m also enough of a control freak to want to keep ownership of my content
and make my own arrangements for backing it up and such — who knows when
these providers will disappear into the aether.

What I was really tempted to do for awhile was build something that was like
a wiki engine but which rendered with appropriate styling like a standard
website — it was the allure of using some lightweight markup that really
appealed to me. At that point I discovered Pelican and suddenly I realised
with this simple tool I could throw all my Markdown sources into a Git
repository and then throw it through Pelican2 to generate the site.
Perhaps I’m crazy but it felt like a tool for storing versioned text files
might be a far more appropriate tool than a relational database for, you
know, storing versioned text files. Just like a wiki, but without the
online editing3.

All there was to do then was build my own Pelican template,
set up nginx to serve the whole lot and I was good to go. Simple enough.
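For what it’s worth, serving Pelican’s output needs nothing more than a static server block; something along these lines (the domain and paths are illustrative, not my actual configuration):

```nginx
server {
    listen 80;
    server_name blog.example.com;

    # Pelican writes the generated site here
    root /var/www/blog/output;
    index index.html;

    location / {
        try_files $uri $uri/ =404;
    }
}
```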

Updating the site

Except, of course, that getting the site generated was only half the battle. I
could SSH into my little VPS, write some article in Markdown using Vim and
then run Pelican to generate it. That’s great when I’m sitting at home on
a nice, fast wifi connection — but when I’m sitting at home I’m generally
either spending time with my family or wading through that massive list of
things that are way lower on the fun scale than blogging, but significantly
higher on the “your house will be an uninhabitable pit of utter filth
and despair” scale.4

When I’m sitting on a train where the mobile reception varies between
non-existent and approximately equivalent to a damp piece of string,
however, remote editing is a recipe for extreme frustration and a string
of incoherently muttered expletives every few minutes. Since I don’t like
to be a source of annoyance to other passengers, it was my civic duty to
do better.

Fortunately this was quite easy to arrange. Since I was already using a Git
repository to store my blog, I could just set up a cron job which updated the
repo, checked for any new commits and invoked Pelican to update the site.
This is quite a simple script to write and the cron job
to invoke it is also quite simple:
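A representative crontab entry (the path here is illustrative rather than my actual one):

```cron
# Every five minutes, pull the repo and regenerate if it changed
*/5 * * * * /home/blog/bin/check-updates.py
```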

If you look at check-updates.py you’ll find it just
uses git log -1 --pretty=oneline to grab the ID of the current commit and
compares it to the last time it ran — if there’s any difference, it triggers
a run of Pelican. It has a few other complicating details like allowing
generation in a staging area and doing atomic updates of the destination
directory using a symlink to avoid a brief outage during the update, but
essentially it’s doing a very simple job.
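Leaving out the staging area and symlink handling, the core of the idea can be sketched like this (the paths, and the assumption that a plain `pelican content` invocation is the right build command, are mine):

```python
import subprocess

def commit_id(oneline):
    """Extract the commit ID from a 'git log --pretty=oneline' line."""
    return oneline.split(None, 1)[0]

def current_commit(repo):
    """Return the ID of the commit at the tip of the checked-out branch."""
    out = subprocess.check_output(
        ["git", "log", "-1", "--pretty=oneline"], cwd=repo, text=True)
    return commit_id(out)

def update_site(repo, state_file):
    """Pull the repo and rerun Pelican only if the tip commit changed."""
    subprocess.check_call(["git", "pull", "--quiet"], cwd=repo)
    commit = current_commit(repo)
    try:
        with open(state_file) as f:
            last = f.read().strip()
    except FileNotFoundError:
        last = None
    if commit != last:
        subprocess.check_call(["pelican", "content"], cwd=repo)
        with open(state_file, "w") as f:
            f.write(commit)
```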

This was now great — I could clone my blog’s repo on to my laptop, perform
local edits to the files, run a staging build with a local browser to confirm
them and then push the changes back to the repo during brief periods of
connectivity. Every five minutes my VPS would check for updates to the repo
and regenerate the site as required. Perfect.

There’s an app for that

Well, not quite perfect as it turns out. While travelling with a laptop it
was easy to find a Git client, SSH client and text editor, but sometimes I
travel with just my iPad and a small keyboard and things were a little trickier.

However, I’ve finally discovered a handful of apps that have streamlined this process:

Since I put Git at the heart of my workflow it was always disappointing
that it took so long for a decent Git client to arrive on iOS. Fortunately
we now have Working Copy and it was worth the wait. Whilst
unsurprisingly lacking some of the more advanced functionality of
command-line Git, it’s effective and quite polished and does the job
rather nicely. It has a basic text editor built in, but one of its main
benefits is that it exposes the working directory to other applications
which allows me to choose something a little more full-featured.

Textastic is the editor I currently use on both iOS and Mac. It’s packed with
features and can open files from Working Copy as well as supporting
direct SFTP access and other mechanisms. I won’t go through its myriad
features, just suffice to say it’s very capable. I should give an
honourable mention to Coda for iOS, Panic Inc.’s
extremely polished, beautifully crafted text editor for iOS, which I used
to use. Coda has a builtin SSH client and is really heavily optimised
for remote editing, so it’s a great alternative if you want to explore.
The original reason I switched was that, with my unreliable uplink,
Textastic‘s more explicit download/edit/upload model worked a little
better for me than Coda’s more implicit remote editing with caching.
Now the fact that Textastic supports local editing within the
Working Copy repo is also a factor. I’ll also be
totally honest and point out that I haven’t played with Coda since they
released a (free) major update awhile back. I’ve nothing but praise for
its presentation and overall quality, however.

If Coda for iOS didn’t quite tempt me as much as Textastic, another of
Panic’s offerings, Prompt 2, is exactly what I need. This is by far
the most accomplished SSH client I’ve used on iOS. It supports
all the functionality you need with credentials, plus you can layer Touch ID
on top if you want it to remember your passphrases. Its terminal emulation is
pretty much perfect - I’ve never had any issues with curses or anything else.
It runs multiple
connections effortlessly and keeps them open in the background without issue.
It can even pop up a notification reminder to swap it back to keep your
connections alive if it’s idle for too long. As with any remote access on a
less than perfect link I’d very strongly suggest using tmux, but
Prompt 2 does about all it can to maintain the stability of your connections.

Summary

That’s about the long and the short of it, then. I’ve been very happy with my
Git-driven workflow and found it flexible enough to cope with changes in my
demands and platforms. Any minor deficiencies I can work around with scripting
on the server side.

The nice thing about Git, of course, is that its branching support means that
if I ever wanted to set up, say, a staging area then I can do that with no
changes at all. I just create another checkout on the server which uses the staging
branch instead of master, and I’m good to go — no code changes required, except
perhaps some trivial configuration file updates.

Hopefully that’s provided a few useful pointers to someone interested in
optimising their workflow for sporadic remote access. I was in two minds whether
to even write this article since so much of it is fairly obvious stuff, but
sometimes it’s just useful to have the validation that someone else has made
something work before you embark on it — I’ve done so and can confirm it works
very well.

You may be more familiar with Jekyll, a tool written by Github co-founder Tom Preston-Werner which does the same job. The only reason I chose Pelican was the fact it was written in Python and hence I could easily extend it myself without needing to learn Ruby (not that I wouldn’t like to learn Ruby, given the spare time). ↩

Of course, one could quite reasonably make the point that the online editing is more or less the defining characteristic of a wiki, so perhaps instead of “just like a wiki” I should be saying “almost wholly unlike a wiki but sharing a few minor traits that I happened to find useful, such as generating readable output from a simple markup that’s easier to maintain”, but I prefer to keep my asides small enough to fit in a tweet. Except when they’re talking about asides too large to fit in a tweet — then it gets challenging. ↩

I voted against Brexit as I feel the UK is significantly better off within the EU. However, the looming uncertainty over whether the UK will follow through is much worse than either option.

On Thursday 23 June the United Kingdom
held a referendum to decide whether
to remain within the European Union, of which it has been a member since 1973.
The vote was to leave by a majority of 52% with a fairly substantial turnout
of almost 72%. Not the largest majority, but a difference of over a million
people can’t be called ambiguous.

So that was it then — we were out. Time to start comparing people’s plans for
making it happen to decide which was the best.

Except, of course, it turned out
nobody really had any plans. The result
seemed to have been a bit of a shock to everyone, including all the politicians
who were campaigning for it. Nobody really seemed to know what to do next.
Disappointing, but hardly surprising — we’re a rather impulsive nation,
always jumping into things without really figuring out what our end game
should be. Just look at
the shambles that followed the Iraq war.

Fortunately for the Brexiteers there was a bit of a distraction in the form
of David Cameron’s resignation — having campaigned
to remain within the EU he felt that remaining as leader was untenable. Well,
let’s face it, that’s probably disingenuous — what he most likely really felt
was he didn’t want to go down in history as
the Prime Minister who took the country out of the EU, just in case (as many
people think quite likely) it’s a bit of a disaster, quite possibly
resented by generations to come.

This triggered an immediate leadership contest within the Tory party which
drew all eyes for a time, until former Home Secretary Theresa May was
left as the only candidate and assumed leadership
of the party. At this point everyone’s attention seems to be meandering its
way back to thoughts of Brexit and all the questions it raises.

To my mind, however, there’s still one question that supersedes all these when
talking about Brexit — namely, will Br actually exit?

You’d think this was a done deal — I mean, we had a referendum on it and
everything. Usually clears these things right up. But in this case, even well
over a month after the vote, there’s still talk about whether we’re going to
go through with it.

Apparently the legal situation seems quite muddy but
there are possible grounds for a second referendum — although Theresa May
is on record as rejecting that possibility.
I must say I can see her point —
to reject the clearly stated opinion of the British public would need some
pretty robust justification and
“the leave campaigners lied through their teeth” probably
doesn’t really cut it. It’s not like people aren’t used to dealing with
politicians being economical with the truth in general elections.

Then we hear that the House of Lords might try to block
the progress of Brexit — or at least delay it. Once again, it’s not yet
at all clear to what extent this will happen; and if it happens, how
effective it will be; and if it’s effective, how easily the government can
bypass it. For example, the government could try to force it through with
the Parliament Act.

What this all adds up to is very little clarity right now. We have a flurry
of mixed messages coming out of government where they tell us that the one
thing they are 100% certain of is that they’re definitely going to leave the
EU, but not only can’t they give us a plan, they can’t even give us a rough
approximation of when they’ll have a plan;
we have a motley crew of different groups clamouring for
increasingly desperate ways to delay, defer or cancel the whole thing,
but very little certainty on whether they even have the theoretical
grounds to do so let alone the public support to push it through;
and we have an increasingly grumpy EU who are telling
us that if we’re really going to leave then we should jolly well get on
with it and don’t let Article 50 hit us in the bum on the way out.

Meanwhile the rest of the world doesn’t seem to know what to make of it,
so it’s not clear that we’ve seen much of the possible impact, even assuming
we do go ahead. But to think there hasn’t been any impact is misleading
— even when things are uncertain we’ve already seen negative impacts on
academics,
education and
morale in the public sector. Let’s be clear
here, it hasn’t happened yet and it isn’t even a certainty that it will,
and we’re already seeing a torrent of negative sentiment.

To be fair, though, we haven’t yet really had the chance to see any possible
positive aspects of the decision filtering through. In fact, we probably
won’t see any of those until the decision is finalised — or at least
until Article 50 is triggered and there’s a deadline to chivvy everyone along.

That’s a big problem.

I think that the longer this “will they/won’t they” period of
uncertainty carries on, the more we’ll start to see these negative
impacts. Nobody wants to bank on the unlikely event that the UK will
change course and remain in the EU, but neither can anyone count on the
fact that we won’t. We’re stuck in an increasingly acrimonious relationship
that we can’t quite bring ourselves to end yet. If they could find an actor
with a sufficient lack of charisma to play Nigel Farage, they could turn it
into a low budget BBC Three sitcom.

Don’t get me wrong, I voted firmly to remain in the EU. But whatever we
do, I feel like we, as a nation — and by that I mean they as a government
that we, as a nation, were daft enough to elect2 — need to make
a decision and act on it. This wasteland of uncertainty is worse than either
option, and doesn’t benefit anyone except
the lawyers and the journalists — frankly they can both find more
worthwhile ways to earn their keep.

So come on Theresa, stop messing about. Stick on a Spotify3
playlist called something like “100 Best Break Up Songs”, mutter some
consoling nonsense to yourself about how there are plenty more nation
states in the sea and pick up the phone. Then we can get on with making
the best of wherever we find ourselves.

Although they’re asked by Michael Gove so I don’t know if they count — given his behaviour during the Tory leadership election I’m not sure he’s been allowed off the naughty step yet. ↩

In the interests of balance I should point out that, in my opinion, more or less every party this country elected since 1950 has been a daft decision. Probably before that, too, but my history gets a little too rusty to be certain. The main problem is that the people elected have an unpleasant tendency to be politicians, and if there’s one group of people to whom the business of politics should never be entrusted, it’s politicians. ↩

My website now looks hopefully very slightly less terrible on mobile devices, and I learned a few things getting there.

This website is more or less wholly my own work. I use a
content generator to create the HTML from
Markdown source files, but the credit (or blame) for the
layout and design lies with me alone1.

Actually, I should hastily add that the one exception to this is that
the current colour scheme comes straight from Ethan Schoonover’s excellent
Solarized palette. But that’s not really germane to the
topic of this post, which I note with dismay I haven’t yet broached despite
this already being the end of the second paragraph.

Now I’m not going to try and claim that this site is brilliantly designed —
I’m a software engineer not a graphic designer. But I do strongly believe that
developers should strive to be generalists where practical
and this site is my opportunity to tinker. Most recently my tinkering has
been to try to improve the experience on mobile devices, and I thought I’d
share my meagre discoveries for anyone else who’s looking for some basic
discussion on the topic to get them started.

The first thing I’d like to make clear is that this stuff is surprisingly simple.
I’ve known for awhile that the experience of reading this site on my iPhone
was not a hugely pleasant one compared to the desktop, but I’d put off
dealing with it on the assumption that fixing it would require all sorts of
hacky Javascript or browser-specific hacks — it turns out that it doesn’t, really.

However, there are some gotchas and quirks due to the
way that font rendering works and other odd issues.
Without further ado let’s jump head-first into them.

Any viewport in a storm

No matter how simple your site layout or styling, you might find that it
doesn’t render optimally on mobile devices. One reason for this is that mobile
browsers render to a much larger resolution than the actual screen, and then
they zoom font sizes up to remain legible on the screen.

The reason they do this is because they’ve grown up having to deal with all
sorts of awful web designers to whom it never occurred that someone might
want to view their page at anything less than a maximised web browser on
a 1600x1200 display. As a result they use all sorts of absolute positioning
and pixel widths, and these cope horribly when rendered on a low resolution
screen. Thus the browsers pretend that their screen is high resolution and
render accordingly — but to keep the text readable they have to boost the
font sizes. The net result is typically something that’s not necessarily all
that pretty, but is often surprisingly usable.

This is quite annoying if you’ve gone to the trouble of making sure your
site renders correctly at lower resolutions, however, and that’s pretty much
the topic of this post. So how do you go about disabling this behaviour?

The short answer is to include a tag like this:

<meta name="viewport" content="initial-scale=1.0" />

The slightly longer answer is that the viewport meta tag
specifies the viewport area to which you expect the browser to render.
If your layout assumes a certain minimal resolution
for desirable results then you can specify that width in pixels like this:

<meta name="viewport" content="width=500" />

This instructs the browser to render to a viewport of width 500 pixels and
then perform any scaling required to fill the screen with this. It’s also
possible to specify a height attribute but most web layouts don’t make
too many assumptions about viewport height.

The first line I showed above didn’t set width, however, but instead the
initial-scale. As you might expect, this instructs the browser to render
the full width of the page at the pixel width of the window without
applying any scaling. Note that this is probably the only value you’ll need
if you’re confident you’ve designed your styles to adapt well to all device screens.

You can also use this tag to constrain user scaling to prevent the user
zooming in or out, but personally I find this a rather gross affront to
accessibility so I’m not going to discuss it further — you should never
presume to know how your users want to access your site better than your
users, in my opinion.

When is a pixel not a pixel?

This is all fine and dandy, but pixels aren’t everything. If I render to
a viewport of 1000 pixels the resultant physical size is significantly
different on a desktop monitor than on a retina display iPhone.

The correct solution to this would probably be some sort of way to query the
physical size or dot pitch of the display, but that would require all web
designers to do The Right Thing™ and that’s always been a bit too much to ask.

Instead, therefore,
browser vendors have invented a layer of “virtual pixels” which is designed
to be approximately the same physical size on all devices. When you specify a
size in pixels anywhere within your CSS then these “virtual pixels” are
used — the browser will have some mapping to real physical pixels according
to the screen size and resolution, and also no doubt to some extent the whims
of the browser writers and consistency with other browsers.

The CSS specifications even have a definition of a reference pixel which
links pixel measurements to physical distances — broadly speaking it says
that a pixel means a pixel on a 96 DPI standard monitor. I wouldn’t want to
make any assumptions about this, however — I’m guessing these things still
vary from browser to browser and it’ll be some time, if ever, before everyone
converges on the same standard.
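As a back-of-envelope check of what that nominal 96 DPI definition implies:

```python
# Physical size of one CSS reference pixel at the nominal 96 DPI
inch_per_px = 1.0 / 96.0
mm_per_px = inch_per_px * 25.4  # 25.4 mm to the inch
print(round(mm_per_px, 3))  # roughly 0.265 mm
```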

As a result of all this you
might find your page being stretched around despite your best efforts — there
doesn’t seem to be a good way around this except have a fluid layout and use
high resolution images in case they’re ever stretched (few things look more
shoddy than a pixelated image).

Responsive design

Now our site renders at the correct size and scale on the device. Great. If
your layout is anything like mine, however, now you’re feeling the loss of all
those grotty scaling hacks and things don’t look all that good at all. What
you need to do is restyle your site so it degrades gracefully at smaller
screen sizes.

As much as possible I really recommend doing this through fluid layouts — the
browsers are very good at laying out text and inline elements in an appropriate
manner for a given width so where possible just use that. However, there are
things that aren’t possible with this approach — for example, if you’re using
a mobile device to view this site, or you shrink your browser window enough,
you’ll see that the side bar disappears and jumps to the top to free up some
additional width to stop the text getting too squashed. There’s no way that a
browser will be able to make this sort of decision automatically; it needs a
little help from the designer.

Media queries

These offer a way to conditionally include bits of CSS based on your current
media type2 and resolution. They don’t make it any easier
to design a good mobile layout — making pretty websites is left as an exercise
for the reader — but they do at least give you a simple way to swap between
the styles you’ve created for different viewports.

There are two ways to use them — you can put conditional blocks in your CSS:

@media screen and (max-width: 450px) { body { font-size: 0.9em; } }

Or alternatively you can switch to an entirely different stylesheet in your
HTML:
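For illustration, such a tag might look something like this (the filename and width threshold here are my own placeholders, not taken from this site):

```html
<link rel="stylesheet" media="screen and (max-width: 450px)" href="mobile.css" />
```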

I use both of these approaches on my site — I find it easier to write a wholly
different stylesheet for the mobile layout since it has so many small changes3,
and then I make a few small tweaks within the CSS itself for more fine-grained
viewport widths within that range.

For example, switching to the mobile stylesheet on my site
converts the sidebar sections to inline-block and renders them above the
content. Within that same stylesheet, however, there are some small tweaks to
make these render centered and spaced out where there’s room to do so,
but left-aligned for narrow displays where they’re likely to flow on to
multiple lines.
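As a rough sketch of the kind of rules involved (the selectors and breakpoint are illustrative, not lifted from this site’s actual stylesheet):

```css
/* mobile.css: sidebar sections sit above the content as inline blocks */
div.sidebar div.section {
  display: inline-block;
  vertical-align: top;
}

/* Where there's room, centre the sections and space them out */
@media screen and (min-width: 350px) {
  div.sidebar { text-align: center; }
  div.sidebar div.section { margin: 0 1em; }
}

/* On narrow displays they flow onto multiple lines, so left-align */
@media screen and (max-width: 349px) {
  div.sidebar { text-align: left; }
}
```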

However, it’s important to note that you could use either approach
equally well on its own — I don’t believe there’s anything that can be achieved
with one but not the other.

As an aside if you’re tempted to go for the <link> approach to save network
traffic on the assumption that the browser will only fetch the sheets it needs,
think again. The browser can’t really have any special foresight about how
you’re going to resize the viewport so it fetches all CSS resources. The media
queries then just control which rules get applied.

You can find a few more details in
this Stack Overflow question. It does turn out
that some browsers will avoid downloading
image files they don’t need, but that would presumably apply regardless of
whether the media query was located in a <link> tag or in a CSS file.

Font scaling

One additional issue to be aware of when designing mobile layouts is that if you
render to the device width you can get odd results when you turn a mobile device
into landscape orientation. Intuitively I’d expect a wider, shorter viewport
with text rendered to the same font size, but in actual fact what you can end
up with is a zoomed page instead. I think what’s going on here is that the
scale factor is determined for portrait orientation and then applied in both
cases, but I must admit I’m not confident I fully appreciate the details.

Whatever the cause, one fix appears to be to disable the mobile browser font
size adjustments — this should no longer be required, after all, because the
layout is now designed to work equally well in any viewport.

Because this is a somewhat new property and not necessarily supported in its
standard form by all browsers it’s wise to specify all the browser-specific
versions as well:
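As an example of what I mean, assuming the property in question is text-size-adjust, the rule with its vendor-prefixed forms might look like this (some guides suggest a value of 100% rather than none, since none has historically disabled user zoom in some browsers):

```css
html {
  -webkit-text-size-adjust: none;
  -moz-text-size-adjust: none;
  -ms-text-size-adjust: none;
  text-size-adjust: none;
}
```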

You might have better luck playing around with different values of this setting
than I did — I must confess I didn’t experiment much once I realised that
disabling the adjustment seemed to fix my issues.

Hard copy

The discussion so far has been all about screens, primarily — the same media
selectors can also be used to specify an alternative stylesheet for printing.

This all works in exactly the same way except the sorts of things you do in
these stylesheets is likely quite different. For example, I force all my
colours to greyscale4 and remove the sidebar and other
navigation elements entirely. I also remove decorations around links since
these of course have no significance in a printed document.

If you want to remove something from the flow of the document you can do
this in CSS by setting display to none:

@media print { div.sidebar { display: none; } }

You might wonder how this differs from setting visibility: hidden — the
difference is that display: none removes the element from the document
completely, changing the layout; whereas visibility: hidden still reserves
space for the element in the layout, it just doesn’t actually render it.
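A quick illustration of the difference (markup invented for the example):

```html
<p>Before <span style="display: none">invisible</span> after (no space is left).</p>
<p>Before <span style="visibility: hidden">invisible</span> after (a gap remains).</p>
```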

If you want to test out the print stylesheet of this or any site without
wasting paper, you can do so with Google Chrome Developer Tools5.
Open up the developer tools with the More Tools » Developer Tools menu
option and then click the vertical ellipsis (⋮) and select
More Tools » Rendering Settings to show a new window. Now you can tick
the Emulate Media option and choose Print from the dropdown.

Layout in CSS

One important thing to note about all the techniques discussed on this page is
that they only allow you to swap in or out CSS in response to viewport
width — the HTML document structure itself is unchanged. This isn’t generally
much of a limitation in today’s browsers since modern CSS offers a huge degree
of flexibility, but it’s certainly something to bear in mind when you’re
writing your HTML. In general the closer you are to a clean structural HTML
document where CSS supplies all of the layout controls, the easier you’re
likely to find adapting your site for multiple devices.

If you really need to redirect users to a different page based on width then of
course it’s possible with Javascript, but this is a
pretty ugly solution — it’s the sort of thing that leads to a world of pain
where you have a version of your site for every device under the sun. If you’re
the sort of masochist to whom that appeals, go right ahead.

Responsive vs. adaptive design

One final point that I should mention is that there are two schools of thought
about laying things out for multiple devices — these are responsive and
adaptive design, although it’s important to note that they’re not actually
mutually exclusive.

Responsive design

This describes layouts that change continuously with the width of the
viewport as it changes.

Adaptive design

This describes layouts that have several distinct options and “snap”
between them as the viewport changes.

Responsive layouts are generally regarded as more graceful, I think, but
adaptive layouts may be easier for more complex sites where it’s easier
to implement and test a small number of distinct options. Personally I’ve
used aspects of both, but I think I’d be comfortable describing my design
as responsive overall since I try to use the full viewport width where I
can, except in very wide cases where
overlong text harms readability.

This is probably better illustrated than explained so I suggest checking
out this set of GIFs that demonstrate
the differences.

Conclusions

Overall I found the whole process of making my site more mobile-friendly
very painless. I have quite a simple layout (quite intentionally) which made
it a lot less hassle than I can imagine a lot of image-heavy sites might
find. Frankly, though, that’s the modern web for you — bitmap images are so
passé, and for good reason.

Anyone who’s been curious enough to poke around in the HTML or CSS (as if anyone would do that…) might notice some references to Sphinx in the naming. This isn’t because I’ve pilfered anything from the Python documentation generator, but simply that this website theme started life as an ill-fated attempt to re-style Sphinx generated documentation — I soon realised, however, that it was significantly inferior in every respect to others such as the Read The Docs style, so I gave up on that idea and used it solely for this site. ↩

Where media type is essentially either screen or print in the vast majority of cases. ↩

I do factor out some commonality on the backend with sass, however, so arguably I’d save a little network bandwidth by putting those into a common CSS file and using only the CSS-style media queries. However, I feel that such minute fiddling would be somewhat against the spirit of the title of this blog. ↩

This might be a little irksome to someone with a colour printer but due to the way I’ve factored out my palette choices from the rest of the CSS it makes life a good deal easier for me. For example, if I change my colour scheme to light-on-dark then there’s no guarantee that the text colours will render legibly in the dark-on-light world of hard copy, whereas greyscale should always be consistently readable on any printer. ↩

Other browsers are available — some of them may even have equivalent functionality. As well as a browser, Google are good enough to provide you a search engine to find out just as easily as I could. ↩

I recently spotted that Python 3.5 has added yet more features to make coroutines more straightforward to implement and use. Since I’m well behind the curve I thought I’d bring myself back up to date over a series of blog posts, each going over some functionality added in successive Python versions — this one covers additional syntax that was added in Python 3.5.

In the previous post in this series I went over an example of
using coroutines to handle IO with asyncio and how it compared with the same
example implemented using callbacks. This almost brings us up to date with
coroutines in Python but there’s one more change yet to discuss — Python 3.5
contains some new keywords to make defining and using coroutines more convenient.

As usual for a Python release, 3.5 contains
quite a few changes but probably the biggest, and certainly
the most relevant to this article, are those proposed by PEP-492.
These changes aim to raise coroutines from something that’s supported by
libraries to the status of a core language feature supported by proper
reserved keywords.

Sounds great — let’s run through the new features.

Declaring and awaiting coroutines

To declare a coroutine the syntax is the same as a normal function but where
async def is used instead of the def keyword. This serves
approximately the same function as the
@asyncio.coroutine decorator did previously —
indeed, I believe one purpose of the decorator, aside from documentation,
was to allow generator-based coroutines to be awaited from async def routines. Since coroutines are now a language
mechanism and shouldn’t be intrinsically tied to a specific library, there’s
now also a new decorator @types.coroutine that can be
used for this purpose.
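As a minimal illustration of how the two relate (a toy example of my own):

```python
import asyncio
import types

@types.coroutine
def generator_based():
    # Old-style generator-based coroutine; the decorator marks it as
    # something that async def coroutines are allowed to await.
    yield from asyncio.sleep(0.01)
    return 21

async def native():
    # New-style coroutine declared with async def.
    return await generator_based() * 2

loop = asyncio.new_event_loop()
result = loop.run_until_complete(native())
loop.close()
print(result)  # 42
```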

Previously coroutines were essentially a special case of generators — it’s
important to note that this is no longer the case, they are a wholly separate
language construct. They do still use the generator mechanisms under the hood,
but my understanding is that’s primarily an implementation detail with which
programmers shouldn’t need to concern themselves most of the time.

The distinction between a function and a generator is
whether the yield keyword appears in its body, but the distinction between a
function and a coroutine is whether it’s declared with async def. If you try
to use yield in a coroutine declared with async def you’ll get
SyntaxError (i.e. a routine cannot be both a generator
and a coroutine).

So far so simple, but coroutines aren’t particularly useful until they can
yield control to other code — that’s more or less the whole point. With
generator-based coroutines this was achieved with yield from and with new
syntax it’s achieved with the await keyword. This can be used to wait for
the result from any object which is awaitable. Broadly speaking, awaitable
objects are:

Coroutines declared with async def, and generator-based coroutines decorated
with @types.coroutine.

Objects written in pure Python which implement an __await__() method.

Objects defined in C/C++ extensions with a tp_as_async.am_await method —
this is more or less equivalent to __await__() in pure Python objects.

The last option is perhaps simpler than it sounds — any object which wishes
to be awaitable needs to return an iterator from its __await__() method. This
iterator is used to implement the fundamental wait operation — the iterator’s
__next__() method is invoked and the value it yields
is used as the value of the await expression.
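For instance, a contrived object can make itself awaitable like this (entirely my own illustration of the protocol):

```python
import asyncio

class Ready:
    # A contrived awaitable: __await__() returns an iterator (here a
    # generator that never actually yields, so awaiting the object
    # completes immediately with the stored value).
    def __init__(self, value):
        self.value = value

    def __await__(self):
        if False:
            yield  # makes this method a generator function
        return self.value

async def demo():
    return await Ready(42)

loop = asyncio.new_event_loop()
result = loop.run_until_complete(demo())
loop.close()
print(result)  # 42
```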

It’s important to note that this definition of awaitable is what’s required of
the argument to await, but the same conditions don’t apply to yield from.
There are some things that both will accept (i.e. coroutines) but await
won’t accept generic generators and yield from won’t accept the other forms
of awaitable (e.g. an object with __await__()).

It’s also equally important to note that a coroutine defined with async def
can’t ever directly return control to the event loop — there simply isn’t the
machinery to do so. Typically this isn’t much of a problem since most of the
time you’ll be using asyncio functions to do this, such as asyncio.sleep()
— however, if you wanted to implement something like asyncio.sleep() yourself
then as far as I can tell you could only do so with generator-based coroutines.

OK, so let me be pedantic and contradict myself for a moment — you can indeed
implement something like asyncio.sleep() yourself. Indeed, here’s a simple implementation:
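A sketch of what such an implementation might look like, as I understand it (my own reconstruction using asyncio.Future and call_later):

```python
import asyncio

async def my_sleep(delay, result=None):
    # Create a future and ask the event loop to complete it after the
    # requested delay has elapsed.
    future = asyncio.Future()
    loop = asyncio.get_event_loop()
    loop.call_later(delay, future.set_result, result)
    # Awaiting the future yields control to the event loop until
    # set_result() is called.
    return await future
```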

This has a lot of deficiencies as it doesn’t handle being cancelled or other
corner cases, but you get the idea. However the key point here is that this
depends on asyncio.Future and if you go look at the
implementation for that then you’ll see that __await__()
is just an alias for __iter__() and that method uses yield to return control
to the event loop. As I said earlier, it’s all built on generators under the
hood, and since yield isn’t permitted in an async def coroutine, there’s no
way to achieve that (at least as far as I can tell).

In general, however, the number of times you would be returning control to the
event loop is very low — the vast majority of cases where you’re likely to do
that are for a fixed delay or for IO and asyncio already has you covered in
both cases.

Coroutines example

As a quick example of await in action consider the script below which is
used to ping several hosts in parallel to determine whether
they’re alive. This example is quite contrived, but it illustrates the new
syntax — it’s also an example of how to use the asyncio subprocess support.

import asyncio
import os
import sys

PING_PATH = "/sbin/ping"

async def ping(server, results):
    with open(os.devnull, "w") as fd:
        # -c1 -> perform a single ping request only
        # -t3 -> timeout of three seconds on response
        # -q  -> generate less output
        proc = await asyncio.create_subprocess_exec(
            PING_PATH, '-c1', '-q', '-t3', server, stdout=fd)
        # Wait for the ping process to exit and check exit code
        returncode = await proc.wait()
    results[server] = not bool(returncode)

async def progress_ticker(results, num_servers):
    while len(results) < num_servers:
        waiting = num_servers - len(results)
        msg = "Waiting for {0} response(s)".format(waiting)
        sys.stderr.write(msg)
        sys.stderr.flush()
        await asyncio.sleep(0.5)
        sys.stderr.write("\r" + " " * len(msg) + "\r")

def main(argv):
    results = {}
    tasks = [ping(server, results) for server in argv[1:]]
    tasks.append(progress_ticker(results, len(tasks)))
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
    for server, pingable in sorted(results.items()):
        status = "alive" if pingable else "dead"
        print("{0} is {1}".format(server, status))

if __name__ == "__main__":
    sys.exit(main(sys.argv))

One point that’s worth noting is that since we’re using coroutines as opposed
to threads to achieve concurrency within the script1,
we can safely access the results dictionary without any form of locking and
be confident that only one coroutine will be accessing it at any one time.

Asynchronous context manager and iterators

As well as the simple await demonstrated above there’s also a new syntax for
allowing context managers to be used in coroutines.

The issue with a standard context manager is that the __enter__() and
__exit__() methods could take some time or perform blocking operations -
how then can a coroutine use them whilst still yielding to the event loop
during these operations?

The answer is support for
asynchronous context managers. These work
in a very similar manner but provide two new methods __aenter__() and
__aexit__() — these are called instead of the regular versions when the
caller invokes async with instead of the plain with
statement. In both cases they are expected to return an awaitable object that
does the actual work.

These are a natural extension to the syntax already described and allow
coroutines to make use of any constructions which may perform blocking
IO in their enter/exit routines — this could be database connections,
distributed locks, socket connections, etc.
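To make that concrete, here’s a minimal sketch of my own: an asynchronous context manager wrapping an asyncio.Lock, where acquiring the lock may block and so belongs in __aenter__ (note that asyncio.Lock itself already supports async with, so in real code you’d just use it directly):

```python
import asyncio

class AsyncLock:
    # Purely illustrative wrapper around asyncio.Lock.
    def __init__(self):
        self._lock = asyncio.Lock()

    async def __aenter__(self):
        # Acquiring may block, so the coroutine can yield to the event
        # loop here rather than stalling it.
        await self._lock.acquire()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self._lock.release()
        return False  # don't suppress exceptions

async def demo():
    lock = AsyncLock()
    async with lock:
        return lock._lock.locked()

loop = asyncio.new_event_loop()
result = loop.run_until_complete(demo())
loop.close()
print(result)  # True
```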

Another natural extension are
asynchronous iterators. In this case objects
that wish to be iterable implement an __aiter__() method which returns
an asynchronous iterator which
implements an __anext__() method. These two are directly analogous
to __iter__() and __next__() for standard iterators, the difference
being that __anext__() must return an awaitable object to obtain the
value instead of the value directly.

Note that in Python 3.5.x prior to 3.5.2 the __aiter__() method was
also expected to return an awaitable, but this changed in 3.5.2 so that
it should return the iterator object directly. This makes it a little fiddly
to write compatible code because earlier versions still expect an awaitable,
but I strongly recommend writing code which caters for the later versions —
the Python documentation has a workaround if necessary.

To wrap up this section let’s see an example of async for — with apologies
in advance to anyone who cares even the slightest bit about the correctness
of HTTP implementations I present an HTTP version of the cat utility.
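A minimal sketch along those lines (the names LineReader and http_cat are my own, and the HTTP handling is deliberately naive):

```python
import asyncio

class LineReader:
    # Asynchronous iterable yielding decoded lines from a StreamReader.
    def __init__(self, reader):
        self.reader = reader

    def __aiter__(self):
        # Python 3.5.2+ style: return the iterator directly.
        return self

    async def __anext__(self):
        line = await self.reader.readline()
        if not line:
            raise StopAsyncIteration
        return line.decode("utf-8", "replace").rstrip("\r\n")

async def http_cat(host, path="/"):
    reader, writer = await asyncio.open_connection(host, 80)
    request = "GET {0} HTTP/1.0\r\nHost: {1}\r\n\r\n".format(path, host)
    writer.write(request.encode("ascii"))
    # Lines are printed as soon as they arrive from the socket.
    async for line in LineReader(reader):
        print(line)
    writer.close()
```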

This is a heavily over-simplified example with many shortcomings (e.g. it
doesn’t even support redirections or chunked encoding) but it shows how the
__aiter__() and __anext__() methods can be used to wrap up operations
which may block for significant periods.

One nice property of this construction is that lines of output will flow down
as soon as they arrive from the socket — many HTTP clients seem to want to
block until the whole document is retrieved and return it as a string. This
is terribly inconvenient if you’re fetching a file of many GB.

Coroutines make streaming the document back in chunks a much more natural
affair, however, and I really like the ease of use for the client. Of course,
in reality you’d use a library like aiohttp to avoid
messing around with HTTP yourself.

Conclusions

That’s the end of this sequence of articles, which brings us about bang
up to date. Overall I really like the fact that the Python developers have
focused on making coroutines a proper first-class concept within the
language. The implementation is somewhat different to other
languages, which often seem to try to hide the coroutines themselves and
offer only futures as the language interface, but I do like knowing
when my context switches are constrained to be — especially if I’m relying
on this mechanism to avoid locking that would otherwise be required.

The syntax is nice and the paradigm is pleasant to work with — but are
there any downsides? Well, because the implementation is based on generators
under the hood I do have my concerns around raw performance. One of the
benefits of asynchronous IO should really be the performance boost and
scalability vs. threads for dominantly IO-bound applications — while the
scalability is probably there, I’m a little unconvinced about the performance
for real-world cases.

I hunted around for some proper benchmarks but they seem to be few and far
between. There’s this page which has a useful collection
of links, although it hasn’t been updated for almost a year — I guess things
are unlikely to have moved on significantly in that time. From looking over
these results it’s clear that asyncio and aiohttp aren’t the cutting edge
of performance, but then again they’re not terrible either.

When all’s said and done, if performance is the all-consuming overriding
concern then you’re unlikely to be using Python anyway. If it’s important
enough to warrant an impact on readability then you might want to at least
investigate threads or gevent before making a decision. But if
you’ve got what I would regard as a pretty typical set of concerns, where
readability and maintainability are the top priority, even though
you don’t want performance to suffer too much, then take a serious look
at coroutines — with a bit of practice I think you might learn to love them.

Or maybe at least dislike them less than the other options.

I’m ignoring the fact that we’re also using subprocesses for concurrency in this example since it’s just an implementation detail of this particular case and not relevant to the point of safe access to data structures within the script. ↩

I recently spotted that Python 3.5 has added yet more features to make coroutines more straightforward to implement and use. Since I’m well behind the curve I thought I’d bring myself back up to date over a series of blog posts, each going over some functionality added in successive Python versions — this one covers more of the asyncio module that was added in Python 3.4.

In the preceding post in this series I introduced the asyncio
module and its utility as an event loop for coroutines. However, this isn’t
the only use of the module — its primary purpose is to act as an event loop
for various forms of I/O such as network sockets and pipes to child processes.
In this post, then, I’d like to compare the two main approaches to doing this:
using callbacks and using coroutines.

A brief digression: handling multiple connections

Anyone that’s done a decent amount of non-blocking I/O can probably skim or
skip this section — for anyone who’s not come across this problem in their
coding experience, this might be useful.

There are quite a few occasions where you end up needing to handle multiple
I/O streams simultaneously. An obvious one is something like a webserver,
where you want to handle multiple network connections concurrently. There are
other examples, though — one thing that crops up quite often for me is
managing multiple child processes, where I want to stream output from them as
soon as it’s generated. Another possibility is where you’re making multiple
HTTP requests that you want to be fetched in parallel.

In all these cases you want your application to respond immediately to input
received on any stream, but at the same time it’s clear you need to block
and wait for input — endlessly looping and polling each stream would be
a massive waste of system resources. Typically there are two main approaches
to this: threads and non-blocking I/O1.

These days threads seem to be the more popular solution — each I/O stream has
a new thread allocated for it and the stack of this thread encapsulates its
complete state. This makes it easy for programmers who aren’t used to dealing
with event loops — they can continue to write simple sequential code that
uses standard blocking I/O calls to yield as required. It has some downsides,
however — cooperating with other threads requires the overhead of
synchronisation and if the turnover of connections is high (consider, say, a
busy DNS server) then it’s slightly wasteful to be continually creating and
destroying thread stacks. If you want to solve the C10k problem,
for example, I think you’d struggle to do it using a thread per connection.

The other alternative is to use a single thread and have it wait for activity
on any stream, then process that input and go back to sleep again until another
stream is ready. This is typically simpler in some ways — for example, you
don’t need any locking between connections because you’re only processing one
at any given time. It’s also perfectly performant in cases where you expect
to be primarily IO-bound (i.e. handling connections won’t require significant
CPU time) — indeed, depending on how the data structures associated with
your connections are allocated this approach could improve performance by
avoiding false sharing issues.

The downside to this method is that it’s rather less intuitive for many
programmers. In general you’d like to write some straight-line code to handle
a single connection, then have some magical means to extend that to multiple
connections in parallel — that’s the lure of threading. But there is a way
we can achieve, to some extent, the best of both worlds (spoiler alert: it’s coroutines).

The mainstays for implementing non-blocking I/O loops in the Unix world have
long been select(), introduced by BSD in 1983, and the
slightly later poll(), added to System V in 1986. There are
some minor differences but in both cases the model is very similar:

Register a list of file descriptors to watch for activity.

Call the function to wait for activity on any of them.

Examine the returned value to discover which descriptors are active and process them.

Loop around to the beginning and wait again.
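Those four steps can be sketched as a toy echo server (my own illustration; select() is from the standard library, everything else is invented for the example):

```python
import select
import socket

def run_echo_loop(listener, iterations):
    # Watch the listening socket plus any accepted connections,
    # echoing back whatever each client sends.
    watched = [listener]
    for _ in range(iterations):
        # Steps 1-2: register descriptors and wait for activity.
        readable, _, _ = select.select(watched, [], [], 1.0)
        # Step 3: examine which descriptors are active and process them.
        for sock in readable:
            if sock is listener:
                conn, _ = listener.accept()
                watched.append(conn)
            else:
                data = sock.recv(4096)
                if data:
                    sock.sendall(data)    # echo it straight back
                else:
                    watched.remove(sock)  # client closed the connection
                    sock.close()
        # Step 4: loop around and wait again.
```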

This is often known as the event loop — it’s a loop, it handles events.
Implementing an event loop is quite straightforward, but the downside is that
the programmer essentially has to find their own way to maintain the state
associated with each connection. This often isn’t too tricky, but sometimes
when the connection handling is very context-dependent it can make the code
rather hard to follow. It often feels like scrabbling to implement some
half-arsed version of closures and it would be preferable
to let language designers worry about that sort of thing.

The rest of this article will focus on how we can use asyncio to stop
worrying so much about some of these details and write more natural code
whilst still getting the benefits of the non-blocking I/O approach.

asyncio with callbacks

One problem with using the likes of select() is that it can encourage you to
drive all your coding from one big loop. Without a bit of work, this tends to
run counter to the design principle of separating concerns,
so we’d like to move as much as possible out of this big loop. Ideally we’d
also like to abstract it, implement in a library somewhere and get benefits of
reusing well-tested code. This is particularly important for event loops where
the potential for serious issues (such as getting into a busy loop) is rather
higher than in a lot of areas of code.

The most common way to hook into a generic event loop is with callbacks.
The application registers callback functions which are to be invoked when
particular events occur, and then the application jumps into a wait function
whose purpose is to simply loop until there are events and invoke the
appropriate callbacks.

It’s unsurprising, then, that asyncio is designed to support the callback
approach. To illustrate this I’ve turned to my usual example of a chat server
— this is a really simple daemon that waits for socket connections (e.g. using
netcat or telnet) then prompts for a
username and allows connected users to talk to each other.

This implementation is, of course, exceedingly basic — it’s meant to be an
example, not a fully-featured application. Here’s the code, I’ll touch on the
highlights afterwards.

import asyncio
import sys


class ChatServer:

    class ChatProtocol(asyncio.Protocol):

        def __init__(self, chat_server):
            self.chat_server = chat_server
            self.username = None
            self.buffer = ""
            self.transport = None

        def connection_made(self, transport):
            # Callback: when connection is established, pass in transport.
            self.transport = transport
            welcome = "Welcome to " + self.chat_server.server_name
            self.send_msg(welcome + "\nUsername: ")

        def data_received(self, data):
            # Callback: whenever data is received - not necessarily buffered.
            data = data.decode("utf-8")
            self.buffer += data
            self.handle_lines()

        def connection_lost(self, exc):
            # Callback: client disconnected.
            if self.username is not None:
                self.chat_server.remove_user(self.username)

        def send_msg(self, msg):
            self.transport.write(msg.encode("utf-8"))

        def handle_lines(self):
            while "\n" in self.buffer:
                line, self.buffer = self.buffer.split("\n", 1)
                if self.username is None:
                    if self.chat_server.add_user(line, self.transport):
                        self.username = line
                    else:
                        self.send_msg("Sorry, that name is taken\nUsername: ")
                else:
                    self.chat_server.user_message(self.username, line)

    def __init__(self, server_name, port, loop):
        self.server_name = server_name
        self.connections = {}
        self.server = loop.create_server(lambda: self.ChatProtocol(self),
                                         host="", port=port)

    def broadcast(self, message):
        for transport in self.connections.values():
            transport.write((message + "\n").encode("utf-8"))

    def add_user(self, username, transport):
        if username in self.connections:
            return False
        self.connections[username] = transport
        self.broadcast("User " + username + " joined the room")
        return True

    def remove_user(self, username):
        del self.connections[username]
        self.broadcast("User " + username + " left the room")

    def get_users(self):
        return self.connections.keys()

    def user_message(self, username, msg):
        self.broadcast(username + ": " + msg)


def main(argv):
    loop = asyncio.get_event_loop()
    chat_server = ChatServer("Test Server", 4455, loop)
    loop.run_until_complete(chat_server.server)
    try:
        loop.run_forever()
    finally:
        loop.close()


if __name__ == "__main__":
    sys.exit(main(sys.argv))

The ChatServer class provides the main functionality of the application,
tracking the users that are connected and providing methods to send messages.
The interaction with asyncio, however, is provided by the nested
ChatProtocol class. To explain what this is doing, I’ll summarise a little terminology.

The asyncio module splits IO handling into two areas of responsibility —
transports take care of getting raw bytes from one place to another and
protocols are responsible for interpreting those bytes into some more
meaningful form. In the case of an HTTP request, for example, the transport
would read and write from the TCP socket and the protocol would marshal up
the request and parse the response to extract the headers and body.

This is something asyncio took from the Twisted networking
framework and it’s one of the aspects I really appreciate. All too many
HTTP client libraries, for example, jumble up the transport and protocol
handling into one big mess such that changing one aspect but still making use
of the rest is far too difficult.

The transports that asyncio provides cover TCP, UDP, SSL and pipes to a
subprocess, which means that most people won’t need to roll their own. The
interesting part, then, is asyncio.Protocol, and that’s what ChatProtocol
implements in the example above.

The first thing that happens is that the main() function instantiates the
event loop — this occurs before anything else as it’s required for all the
other operations. We then create a ChatServer instance whose constructor
calls create_server() on the event loop. This opens
a listening TCP socket on the specified port2 and takes a protocol
factory as a parameter. Every time there is a connection on the listening
socket, the factory will be used to manufacture a protocol instance to handle it.

The main loop then calls run_until_complete(),
passing the server that was returned by create_server() — this will block
until the listening socket is fully open and ready to accept connections.
This probably isn’t strictly required because the next thing it does is
call run_forever(), which causes the event loop to
process IO endlessly until explicitly terminated.

The meat of the application is then how ChatProtocol is implemented. This
implements several callback methods which are invoked by the asyncio
framework in response to different events:

A ChatProtocol instance is constructed in response to an incoming
connection on the listening socket. No parameters are passed by asyncio —
because the protocol needs a reference to the ChatServer instance, this is
passed via a closure by the lambda in the create_server() call.

Once the connection is ready, the connection_made()
method is invoked which passes the transport that asyncio has allocated
for the connection. This allows the protocol to store a reference to it
for future writes, and also to trigger any actions required on a new
connection — in this example, prompting the user for a username.

As data is received on the socket, data_received()
is invoked to pass this to the protocol. In our example we only want
line-oriented data (we don’t want to send a message to the chat room until
the user presses return) so we buffer up data in a string and then process
any complete lines found in it. Note that we also should take care of
character encoding here — in our simplistic example we blindly assume
UTF-8.

When we want to send data back to the user we invoke the
write() method of the transport. Again, the
transport expects raw bytes so we handle encoding to UTF-8 ourselves.

Finally, when the user terminates their connection then our
connection_lost() method is invoked — in our
example we use this to remove the user from the chatroom. Note that this is
subtly different to the eof_received() callback which
represents TCP half-close (i.e. the remote end called
shutdown() with SHUT_WR) — this is important if you want
to support protocols that indicate the end of a request in this manner.

That’s about all there is to it — with this in mind, the rest of the example
should be quite straightforward to follow. The only other aspect to mention is
that once the loop has been terminated, we go ahead and call its
close() method — this clears out any queued data, closes
listening sockets, etc.

asyncio with coroutines

Since we’ve seen how to implement the chat server with callbacks, I think it’s
high time we got back to the theme of this post and now compare that with an
implementation of the same server with coroutines. In usual fashion, let’s jump
in and look at the code first:

import asyncio
import sys


class ChatServer:

    def __init__(self, server_name, port, loop):
        self.server_name = server_name
        self.connections = {}
        self.server = loop.run_until_complete(
            asyncio.start_server(self.accept_connection, "", port, loop=loop))

    def broadcast(self, message):
        for reader, writer in self.connections.values():
            writer.write((message + "\n").encode("utf-8"))

    @asyncio.coroutine
    def prompt_username(self, reader, writer):
        while True:
            writer.write("Enter username: ".encode("utf-8"))
            data = (yield from reader.readline()).decode("utf-8")
            if not data:
                return None
            username = data.strip()
            if username and username not in self.connections:
                self.connections[username] = (reader, writer)
                return username
            writer.write("Sorry, that username is taken.\n".encode("utf-8"))

    @asyncio.coroutine
    def handle_connection(self, username, reader):
        while True:
            data = (yield from reader.readline()).decode("utf-8")
            if not data:
                del self.connections[username]
                return None
            self.broadcast(username + ": " + data.strip())

    @asyncio.coroutine
    def accept_connection(self, reader, writer):
        writer.write(("Welcome to " + self.server_name + "\n").encode("utf-8"))
        username = (yield from self.prompt_username(reader, writer))
        if username is not None:
            self.broadcast("User %r has joined the room" % (username,))
            yield from self.handle_connection(username, reader)
            self.broadcast("User %r has left the room" % (username,))
        yield from writer.drain()


def main(argv):
    loop = asyncio.get_event_loop()
    server = ChatServer("Test Server", 4455, loop)
    try:
        loop.run_forever()
    finally:
        loop.close()


if __name__ == "__main__":
    sys.exit(main(sys.argv))

As you can see, this version is written in quite a different style to the
callback variant. This is because it’s using the streams API,
which is essentially a set of wrappers around the callbacks version which
adapt it for use with coroutines.

To use this API we call start_server() instead of
create_server() — this wrapper changes the way the supplied callback is
invoked and instead passes it two streams: StreamReader
and StreamWriter instances. These represent the input
and output sides of the socket but, importantly, methods such as
readline() and drain() are coroutines, so we can delegate to them with yield from.

On the subject of coroutines, you’ll notice that some of the methods have an
@asyncio.coroutine decorator — this serves a practical
function in Python 3.5 in that it enables you to delegate to the new style
of coroutine that it defines. Pre-3.5 it’s therefore useful for future
compatibility, but also serves as documentation that this method is being
treated as a coroutine. You should always use it to decorate your coroutines,
but this isn’t enforced anywhere.

Back to the code. Our accept_connection() method is the callback that we
provided to the start_server() method and the lifetime of this method call
is the same as the lifetime of the connection. We could implement the handling
of a connection in a strictly linear fashion within this method — such is the
flexibility of coroutines — but of course being good little software engineers
we like to break things out into smaller functions.

In this case I’ve chosen to use a separate coroutine to handle prompting the
user for their username, so accept_connection() delegates to
prompt_username() with this line:

username = (yield from self.prompt_username(reader, writer))

Once delegated, this coroutine takes control for as long as it takes to obtain
a unique username and then returns this value to the caller. It also handles
storing the username and the writer in the connections member of the class —
this is used by the broadcast() method to send messages to all users in the room.

The handle_connection() method is also implemented in quite a straightforward
fashion, reading input and broadcasting it until it detects that the connection
has been closed by an empty read. At this point it removes the user from the
connections dictionary and returns control to accept_connection(). We
finally call writer.drain() to send any last buffered output — this is
rather pointless if the user’s connection was cut, but could still serve a
purpose if they only half-closed or if the server is shutting down instead.
After this we simply return and everything is cleaned up for us.

How does this version compare, then? It’s a little shorter for one thing — OK,
that’s a little facile, what else? We’ve managed to lose the nested class,
which seems to simplify the job somewhat — there’s less confusion about the
division of responsibilities. We also don’t need to worry so much about where
we store things — there’s no transport that we have to squirrel away somewhere
while we wait for further callbacks. The reader and writer streams are just
passed naturally through the call chain in an intuitive manner. Finally, we
don’t have to engage in any messy buffering of data to obtain line-oriented
input — the reader stream handles all that for us.

Conclusions

That about wraps it up for this post. Hopefully it’s been an interesting
comparison — I know that I certainly feel like I understand the various
layers of asyncio a little better having gone through this exercise.

It takes a bit of a shift in one’s thinking to use the coroutine approach, and
I think it’s helpful to have a bit of a handle on both mechanisms to better
understand what’s going on under the hood, but overall the more I use the
coroutine style for IO the more I like it. It feels like a good compromise
between the intuitive straight-line style of the thread-per-connection
approach and the lock-free simplicity of non-blocking IO with callbacks.

In the next post I’m going to look at the new syntax for coroutines introduced
in Python 3.5, which was the inspiration for writing this series of posts in
the first place.

Some people use the term asynchronous IO for what I’m discussing here, which is certainly the more general term, but I prefer to avoid it due to risk of confusion with the POSIX asynchronous IO interface. ↩

I recently spotted that Python 3.5 has added yet more features to make coroutines more straightforward to implement and use. Since I’m well behind the curve I thought I’d bring myself back up to date over a series of blog posts, each going over some functionality added in successive Python versions — this one covers parts of the asyncio module that was added in Python 3.4.

In the previous post I discussed the state of coroutines in
Python 2.x and then the yield from enhancement added in Python 3.3. Since
that release there’s been a succession of improvements for coroutines and in
this post I’m going to discuss those that were added as part of the
asyncio module.

It’s a pretty large module and covers quite a wide variety of functionality,
so covering all that with in-depth discussion and examples is outside the
scope of this series of articles. I’ll try to touch on the finer points,
however — in this article I’ll discuss the elements that are relevant
to coroutines directly and then in the following post I’ll talk about
the IO aspects.

History of asyncio

Python 2 programmers may recall the venerable asyncore
module, which was added way back in the prehistory of Python 1.5.2. Its
purpose was to assist in writing endpoints that handle IO from sources
such as sockets asynchronously. To create clients you derive your own class
from asyncore.dispatcher and override methods to handle events.

This was a helpful module for basic use-cases but it wasn’t particularly
flexible if what you wanted didn’t quite match its structure. Generally
I found I just ended up rolling my own polling loop based on things from
the select module as I needed them (although if I were using
Python 3.4 or above then I’d prefer the selectors module).

If you’re wondering why talk of an old asynchronous IO module is relevant
to a series on coroutines, bear with me.

The limitations of asyncore were well understood and several third party
libraries sprang up as alternatives, one of the most popular being
Twisted. However, it was always a little annoying that such a
common use-case wasn’t well catered for within the standard library.

Back in 2011 PEP 3153 was created to address this deficiency.
It didn’t really make a concrete proposal, however; it just defined the
requirements. Guido addressed this in 2012 with PEP 3156 and
the fledgling asyncio library was born.

The library went through some iterations under the codename Tulip and a
couple of years later it was included in the standard library of Python 3.4.
This was on a provisional basis — this means that it’s there, it’s not going
away, but the core developers reserve the right to make incompatible changes
prior to it being finalised.

OK, still not seeing the link with coroutines? Well, as well as handling IO
asynchronously, asyncio also has a handy event loop for scheduling
coroutines. This is because the entire library is designed for use in two
different ways depending on your preferences — either a more traditional
callback-based scheme, where callbacks are invoked on events; or with a set
of coroutines which can each block until there’s IO activity for them to
process. Even if you don’t need to do IO, the coroutine scheduler is a useful
piece that you don’t need to build yourself.

asyncio as a scheduler

At this point it would be helpful to consider a quick example of what asyncio
can do on the scheduling front without worrying too much about IO — we’ll
cover that in the next post.

In the example below, therefore, I’ve implemented something like
logrotate — mine is extremely simple1 and doesn’t run off a
configuration file, of course, because it’s just for demonstration purposes.

First here’s the code — see if you can work out what it does, then I’ll
explain the finer points below.
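A minimal sketch along the lines described in the following paragraphs: rotate_file() and rotate_daily() are named in the text, while the remaining names, filenames and intervals are my own illustrative choices. For brevity the policies are shown as plain generators, though in the 3.4-era style each would carry the @asyncio.coroutine decorator:

```python
import asyncio
import os

def rotate_file(filename, max_files=5):
    # The common rotation step: app.log.2 -> app.log.3, app.log.1 -> app.log.2,
    # then the live file becomes app.log.1.
    for i in range(max_files - 1, 0, -1):
        src = "%s.%d" % (filename, i)
        if os.path.exists(src):
            os.rename(src, "%s.%d" % (filename, i + 1))
    if os.path.exists(filename):
        os.rename(filename, "%s.1" % (filename,))

# Each rotation policy is its own coroutine, delegating its waiting
# to asyncio.sleep() for convenience.
def rotate_hourly(filename):
    while True:
        yield from asyncio.sleep(60 * 60)
        rotate_file(filename)

def rotate_daily(filename):
    while True:
        yield from asyncio.sleep(60 * 60 * 24)
        rotate_file(filename)

def main():
    # Schedule the policies as independent tasks on the event loop.
    loop = asyncio.get_event_loop()
    loop.create_task(rotate_hourly("hourly.log"))
    loop.create_task(rotate_daily("daily.log"))
    try:
        loop.run_forever()
    finally:
        loop.close()
```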

Each file rotation policy that I’ve implemented is its own coroutine. Each one
operates independently of the others and the underlying rotate_file()
function is just to refactor out the common task of actually rotating the files.
In this case they all delegate their waiting to the asyncio.sleep() function
as a convenience, but it would be equally possible to write a coroutine which
does something more clever, like hook into inotify,
for example.

Under the hood the @asyncio.coroutine decorator
marks the function as a coroutine such that
asyncio.iscoroutinefunction() returns True —
this may be required for disambiguation in parts of asyncio where the
code needs to handle coroutines differently from regular callback functions.
The create_task() call then wraps the coroutine instance
in a Task class — Task is a subclass of
Future and this is where the coroutine and callback worlds meet.

An asyncio.Future represents the future result of an asynchronous
process. Completion callbacks can be registered with it using the
add_done_callback(). When the
asynchronous result is ready then it’s passed to the Future with the
set_result() method — at this point any registered
completion callbacks are invoked. It’s easy to see, then, how the
Task class is a simple wrapper which waits for the result of its
wrapped coroutine to be ready and passes it to the parent Future class
for invocation of callbacks. In this way, the coroutine and callback
worlds can coexist quite happily — in fact in many ways the coroutine
interface is a layer implemented on top of the callbacks. It’s a pretty
crucial layer in making the whole thing cleaner and more manageable
for the programmer, however.
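This lifecycle is easy to see with a bare Future and no coroutines at all. Here is a small sketch using only the public asyncio API:

```python
import asyncio

results = []

def on_done(future):
    # Completion callback: invoked by the event loop once a result is set.
    results.append(future.result())

loop = asyncio.new_event_loop()
future = asyncio.Future(loop=loop)
future.add_done_callback(on_done)
# Simulate an asynchronous producer delivering its value later.
loop.call_soon(future.set_result, 42)
loop.run_until_complete(future)
loop.close()
```

Once set_result() runs, the loop invokes on_done() and run_until_complete() returns with the future’s result, which is exactly the plumbing that Task performs on behalf of a wrapped coroutine.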

The part that links it all together is the event loop, which asyncio just
gives you for free. There are a few details I’ve glossed over, however,
since it’s not too important for a basic understanding. One thing to be aware
of is that there are currently two event loop implementations — most
people will be using SelectorEventLoop, but
on Windows there’s also the ProactorEventLoop
which uses different underlying primitives and has different tradeoffs.

This scheduling may all seem simplistic, and it’s true that in this
example asyncio isn’t
doing anything hugely difficult. But building your own event loop isn’t
quite as trivial as it sounds — there are quite a few gotchas that can trip
you up and leave your code locked up or sleeping forever. This is particularly
acute when you introduce IO into the equation, where there are some slightly
surprising edge cases that people often miss such as handling sockets which
have performed a remote shutdown. Also, this approach is quite modular and
manages to produce single-threaded code where different asynchronous
operations interoperate with little or no awareness of each other. This
can also be achieved with threading, of course, but this way we don’t
need locks and we can more or less rule out issues such as race conditions
and deadlocks.

That wraps it up for this article. I’ll cover the IO aspects of asyncio in
my next post, covering and comparing both the callback and coroutine
based approaches to using it. This is particularly important because one
area where coroutines really shine (vs threads) is where your application is
primarily IO-bound and so there’s no need to explode over multiple cores.

In just one example of many issues, for extra credit2 you might like to consider what happens to the rotate_daily() implementation when it spans a DST change. ↩

Where the only credit to which I’m referring are SmugPoints(tm): a currency that sadly only really has any traction inside the privacy of your own skull. ↩

I recently spotted that Python 3.5 has added yet more features to make coroutines more straightforward to implement and use. Since I’m well behind the curve I thought I’d bring myself back up to date over a series of blog posts, each going over some functionality added in successive Python versions — this one covers the facilities up to and including the yield from syntax added in Python 3.3.

I’ve always thought that coroutines are an underused paradigm.

Multithreading is great for easily expanding single threaded approaches to
make better use of modern hardware with minimal changes; multiprocess is great
for enforcement of interfaces and also extending across multiple machines. In
both cases, however, the emphasis is on performance at the expense of simplicity.

To my mind, coroutines offer the flip side of the coin —
perhaps performance isn’t critical, but your approach is just more naturally
expressed as a series of cooperative processes. You don’t want to wade through
a sea of memory barriers to implement such things, you just want to divide up
your responsibilities and let the data flow through.

In this short series of posts I’m going to explore what facilities we have
available for implementing coroutines in Python 3, and in the process catch
myself up on developments in that area.

Coroutines in Python 2

Before looking at Python 3 it’s worth having a quick refresher on the options
for implementing coroutines in Python 2, not least because many programmers
will still be constrained to use this version in many commercial environments.

The genesis of coroutines was when generators were added to
the language in Python 2.2 — these are essentially lazily-evaluated lists.
One defines what looks like a
normal function but instead of a return statement yield is used. This
has the effect of emitting a value from your generator but — and this is
crucial — it also suspends execution of your generator in place and returns
the flow of execution back to the calling code. This continues until the caller
requests the next value from the generator at which point it resumes execution
just after the yield statement.
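For example, a trivial generator that demonstrates this suspend/resume behaviour:

```python
def countdown(n):
    # Execution suspends at the yield and resumes just after it
    # when the caller requests the next value.
    while n > 0:
        yield n
        n -= 1

print(list(countdown(3)))  # [3, 2, 1]
```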

Generators are, of course, fantastically useful on their own. In terms of
coroutines, however, they’re only half the story — they can yield outputs
but they can only take their initial inputs, they can’t be updated during
their execution.

To address this Python 2.5 extended generators in several
ways which allow them to be turned into general purpose coroutines. A quick
summary of these enhancements is:

yield, which was previously a statement, was redefined to be an expression.

Added a send() method to inject inputs during execution.

Added a throw() method to inject exceptions.

Added a close() method to allow the caller to terminate a generator early.

There are a few other tweaks, but those are the main points. The net result
of these changes is that one could now write a generator where new values can
be injected, via the send() method, and these are returned within the
generator as the value of the yield expression.

As a simple example of this, consider the code below which implements a
coroutine that accepts a number as a parameter and returns back the average
of all the numbers up to that point.
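A minimal version of such a coroutine might look like this; note the priming next() call needed to advance it to the first yield:

```python
def averager():
    # Each send() resumes us at the yield expression with the new
    # number; we yield back the running average so far.
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average
        total += value
        count += 1
        average = total / count

avg = averager()
next(avg)            # prime the coroutine up to its first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0
```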

Python 3.3 adds “yield from”

The conversion of generators to true coroutines was the final development
in this story in Python 2, as development of the language long ago moved
on to Python 3. In that vein, the next advancement for coroutines
came in Python 3.3 with the yield from construction.

This stemmed from the observation that it was quite cumbersome to refactor
generators into several smaller units. The complication is that a generator
can only yield to its immediate caller — if you want to split generators
up for reasons of code reuse and modularity, the calling generator would
have to manually iterate the sub-generator and re-yield all the results.
This is tedious and inefficient.

The solution was to add a yield from statement to delegate control entirely
to another generator. The subgenerator is run to completion, with results
being passed directly to the original caller without involvement from the
calling generator. In the case of coroutines, sent values and thrown exceptions
are also propagated directly to the currently executing subgenerator.

At its simplest this allows a more natural way to express solutions where
generators are delegated. For a really simple example, compare these two
sample1 implementations of
itertools.chain():
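The comparison looks something like this; both versions are illustrative stand-ins rather than the real itertools.chain():

```python
# Without delegation, the outer generator must iterate each
# sub-iterable and re-yield every item by hand...
def chain_manual(*iterables):
    for iterable in iterables:
        for item in iterable:
            yield item

# ...whereas yield from hands control to each sub-iterable directly.
def chain_delegating(*iterables):
    for iterable in iterables:
        yield from iterable
```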

Right now, of course, this looks somewhat handy but only a fairly minor improvement.
But when you consider general coroutines, it becomes a great mechanism for
transferring control. I think of them a bit like a state machine where each
state can have its own coroutine, so the concerns are kept separate, and where
the whole thing just flows data through only as fast as required by the caller.

I’ve illustrated this below by writing a fairly simple parser for expressions
in Polish Notation — this is just like
Reverse Polish Notation only backwards. Or perhaps I mean
forwards. Well, whichever way round it is, it
really lends itself to simple parsing because the operators precede their
arguments which keeps the state machine nice and simple. As long as the
arity of the operators is fixed, no brackets are required
for an unambiguous representation.
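A sketch of such a parser in the spirit of the description that follows: parse_expression(), parse_argument() and the operator subgenerators are named in the text, while the operator tables and the evaluate() driver (a plain function here, rather than the wrapper generator mentioned later) are my own choices:

```python
import operator

UNARY_OPERATORS = {"neg": operator.neg}
BINARY_OPERATORS = {"+": operator.add, "-": operator.sub,
                    "*": operator.mul, "/": operator.truediv}

def parse_argument(token=None):
    # Obtain a token (unless our caller already has one), then either
    # delegate to an operator subgenerator or treat it as a literal.
    if token is None:
        token = yield
    if token in UNARY_OPERATORS:
        return (yield from parse_unary(token))
    elif token in BINARY_OPERATORS:
        return (yield from parse_binary(token))
    return float(token)

def parse_unary(op):
    arg = (yield from parse_argument())
    return UNARY_OPERATORS[op](arg)

def parse_binary(op):
    left = (yield from parse_argument())
    right = (yield from parse_argument())
    return BINARY_OPERATORS[op](left, right)

def parse_expression():
    # Top-level generator: yields None while consuming tokens, then
    # yields the result of each complete top-level expression.
    token = None
    while True:
        result = (yield from parse_argument(token))
        token = yield result

def evaluate(expression):
    # Drive the parser with whitespace-delimited tokens, discarding
    # the intermediate None values.
    parser = parse_expression()
    next(parser)  # prime the generator up to its first yield
    results = []
    for tok in expression.split():
        value = parser.send(tok)
        if value is not None:
            results.append(value)
    return results
```

Here evaluate("* + 1 2 neg 3") parses a binary multiply whose arguments are themselves sub-expressions, with each subgenerator deciding for itself how many tokens to consume before relinquishing control.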

The main entrypoint is the parse_expression() generator. In this case it’s
necessary to have a single parent because we want the behaviour of the top-level
expressions to be fundamentally different — in this case, we want it to yield
the result of the expression, whereas intermediate values are instead consumed
internally within the set of generators and not exposed to calling code.

We use the parse_argument() generator to calculate the result of an expression
and return it — it can use a return value since it’s called as a subgenerator
of parse_expression() (and others). This determines whether each token is
an operator or numeric literal — in the latter case it just returns the
literal as a float. In the former case it delegates to a subgenerator based
on the operator type — here I just have unary and binary operators as simple
illustrative cases. Note that one could easily implement an operator of
variable arity here, however, since the delegate generator makes its own
decision of when to relinquish control back to the caller — this is an
important property when modularising code.

Hopefully this example is otherwise quite clear — the parse_expression()
generator simply loops and yields the values of all the top-level expressions
that it encounters. Note that because there’s no filtering of the results by
the calling generator (since it’s just delegating) then it will yield lots
of None results as it consumes inputs until the result of a top-level
expression can be yielded — it’ll be up to the calling code to ignore these.
This is just a consequence of the way send() on a generator always yields
a value even if there isn’t a meaningful value.

The only other slight wrinkle is that you might see some
excessive bracketing around the yield operators — this is typically a good
idea. PEP 342 describes the parsing rules, but if you just
remember to always bracket the expression then that’s one less thing to worry about.

One thing that’s worth noting is that this particular example is quite wasteful
for deeply nested expressions in the same way that recursive functions can be.
This is because it constructs two new generators for each nested expression —
one for parse_argument() and one for whichever operator-specific subgenerator
this delegates to.
Whether this is acceptable depends on your use-cases and the extent which you
want to trade off the code expressiveness against space and time complexity.

Here I’ve defined a convenience wrapper generator which accepts the expression
as a whitespace-delimited string and strips out the intermediate None values
that are yielded. If you run that you should see there are two top-level
expressions which yield the same result.

Coming up

That wraps it up for this post — I hope it’s been a useful summary of where
things stand in terms of coroutines as far as Python 3.3. In future posts I’ll
discuss the asyncio library that was added in Python 3.4, and the additional
async keyword that was added in Python 3.5.

Neither of these are anything to do with the official Python library, of course — they’re just implementations off the top of my head. I chose itertools.chain() purely because it’s very simple. ↩