Snakes on the Web

September 04, 2009

A talk given at PyCon Argentina and PyCon Brazil, 2009.

Web development sucks.

It’s true: web development, at its worst, is difficult, repetitive, and boring.
The tools we have suck. At best, they make web development slightly less
painful, but we’re a long way from making web development awesome.

The history of web development tools is a history of trying to solve this
problem. It’s a history of asking, “how can we make this suck less?” It’s
important to understand this history, because we can look at past trends and use
them to predict the future.

A brief, opinionated history of web development

I like to think of the web’s earliest days as the “Stone Age” of web development, an age
characterized by clumsy, difficult tools. We wrote HTML, by hand, usually in a
text editor. The concept of dynamic web pages didn’t exist: if you wanted to
publish a thousand stories online, you wrote a thousand .html documents.

This sucked. Obviously.

And it led to an obvious question: what if we could generate HTML
programmatically, like from a database or something?

And thus was CGI born, and the Bronze Age of the web began. We had tools,
finally, to automate some of our repetitive tasks. We could handle user input!
Connect web pages to databases! Flow content through templates!

But CGI sucks. It’s horribly slow, for one. Worse, it encourages incredibly bad
development style. Either you have dozens (hundreds!) of .cgi programs, with
all sorts of repetitive code… or you end up with a single, gigantic,
monolithic go.cgi. CGI usually proves horrifically hard to maintain. The
fact that most CGI programs of this era were written in Perl – or, worse, C – doesn’t help.

So, again, smart people started asking questions. At first, we asked, “how can
we make CGI suck less?” Note that this isn’t a very big question – it’s not
really a rethinking of how web development should work – but it still led to a
leap forward: the first generation of application servers.

I’m talking here about things like mod_perl, mod_python, and especially PHP.
PHP, which is essentially “CGI done right,” quickly became overwhelmingly
dominant, and the Iron Age of the web began [2].

Now, I mentioned before that the questions that led to PHP weren’t very good
questions, and so the leap forward wasn’t all that dramatic. The Iron Age is
very similar to the Bronze Age: the tools all look mostly the same, they just
use slightly better tech. That is, PHP suffers from most of the same problems as
CGI, but to a lesser extent.

The biggest problem, at least to my mind, with the first few ages of web
development is that the mindset is essentially page-oriented. All we did in
these early years was trade page.html for page.cgi for page.php. We
still represented web sites as a collection of pages, written in a string of
improving languages.

So the real revolution came when we started to question this basic assumption.
“What if,” we asked, “we could think of these things as applications, not pages?”

This question led directly to the creation of Django, and to the Industrial
Revolution: Web Frameworks.

Now, technical revolutions happen organically. Take the printing press: though
Gutenberg is credited with its invention, in fact the press was simultaneously
discovered by at least two other inventors [3] in Europe around the same
time. Indeed, it’s actually inaccurate to credit any of these men with
building the first press: press-printed material dating back to the 7th century
has been found in China and Korea [4].

Like the printing press, then, frameworks existed long before the current crop
(WebObjects is just one example). Like the Industrial Revolution, the
Framework Revolution happened in many places, and in many different ways. I
don’t want for a minute to pretend that Django was the first framework – or
among the first – nor was Django born in a vacuum. Django is, like the other
frameworks of our time, a product of the age and of these questions about web development.

The Industrial Age: Web Frameworks

Now we find ourselves in the Industrial Age, the Age of the Framework. Since I’m
talking about “frameworks” quite a bit, I think it’s worth a bit of time to
clarify what I mean. As I see it, the main characteristics of a modern web
framework are:

They operate at a high conceptual level. Instead of thinking about HTTP,
HTML, and web “pages”, frameworks allow us to think and to operate at the
level of web application. This means less code, and it also allows us to
be much more ambitious about what we’re building.

Frameworks provide much larger building blocks.

I like to use a construction analogy: traditionally, most houses are
stick-built: you just nail together a whole bunch of lumber, one stick
at a time, until the house finally appears. The raw materials are simple:
wood, nails, glue, shingles, bricks. You’ve got all sorts of flexibility,
but construction takes a long, long time.

Today there’s another option: factory-built homes. Here, the house is
built in huge sections in a factory, mostly automatically. Each room,
pre-built, is loaded on a truck and then a crane puts the house together
on-site. Architecture is more constrained – you have to put the house
together from the array of room options supported by the factory – but
you can literally be moving into the house 30 days after signing off on
the final blueprints.

Frameworks encourage rapid development. It’s no coincidence that the Age
of the Framework is also the Age of Agile. Agile, XP, Scrum, etc. –
frameworks are at their best when used in a rapid-iteration style.

Good frameworks are open source. I don’t think I need to justify this
point to this particular crowd, so suffice to say that it’s no accident
that there aren’t any proprietary frameworks with any real following to
speak of [5].

Finally, good frameworks make development fun. Business folk like
to think this is a silly requirement. It’s not. The best thing
about the web framework world is our sense of fun: fun motivates,
leads to experimentation, and hence to innovation.

What’s next?

I’ve described where we are now… so what’s next?

The best way to predict the future of web development, I think, is to keep
asking ourselves the question that led to all the past advances: what sucks, and
how can we fix it?

So: what sucks about web development?

Inter-operability

Modern web frameworks suck at inter-op.

Frameworks are good. But frameworks inevitably lead to lock-in. Lock-in is bad.

It’s important to realize that the most important kind of inter-operability is
with the user’s code, and frankly web frameworks often suck here. A basic truth
of software is that as it grows and matures it becomes more and more
domain-specific, and less and less generic. I’ll talk more about this below;
the important part for now is to realize that general frameworks should be able
to cede control to domain-specific replacements as the stack grows. For the most
part, frameworks don’t.

Of course, most people think of inter-op in terms of inter-operability between
multiple frameworks. Nobody’s doing very well here, but unfortunately the Python
web world’s worse than average. There’s a great deal of fragmentation in the
Python web community, and frankly Django’s not helping. That’s a bug in the
Django community, and there are similar bugs all over the Python web world. We
need to fix these.

WSGI is helping here; WSGI’s the best thing ever to happen to Python web
development. We can’t rest on our laurels, however: WSGI’s got some serious
problems. They’re off-topic here, so I’ll simply point you to James Bennett’s
Let’s talk about WSGI and say, “ditto.”
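
For anyone who hasn’t looked at WSGI directly, here is a minimal sketch of what the gateway asks of you: an application is just a callable that takes an environ dict and a start_response function and returns an iterable of bytes. The names hello_app and call_app are made up for illustration; only wsgiref comes from the standard library.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A complete WSGI application: a callable taking environ and start_response."""
    body = f"Hello from {environ['PATH_INFO']}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app in-process, without a server (hypothetical helper)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal, valid WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record what the app reports instead of writing it to a socket.
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    print(call_app(hello_app, "/snakes"))
```

Any WSGI server can host hello_app, which is exactly why the gateway has been so useful.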

I should also mention Rack. Rack, in a nutshell, came about when the Ruby
world, facing problems similar to those we’d faced in Pythonland, created a
WSGI-inspired web gateway tool. It’s been a resounding success: Rails 3 is being
rewritten on top of Rack. Rack is frankly a bit better than WSGI; we Pythonistas
should be embarrassed by that.

The big problem, though – the elephant in the room – is that gateways suck,
too. Gateways aren’t APIs. There’s a limit – and it’s a low one – to the
level of inter-op you can obtain when the only interface you have is a gateway.
Even if we improve WSGI – and we should – it’ll only take us so far.
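
To make the “gateways aren’t APIs” point concrete, here is a rough sketch of the kind of inter-op a gateway does give you. Everything named here (blog_app, wiki_app, make_dispatcher, run) is hypothetical; imagine the two apps were written with two different frameworks.

```python
from wsgiref.util import setup_testing_defaults, shift_path_info


def blog_app(environ, start_response):
    # Hypothetical app, imagined as written with framework A.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"blog"]


def wiki_app(environ, start_response):
    # Hypothetical app, imagined as written with framework B.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"wiki"]


def make_dispatcher(mounts):
    """Route on the first path segment to one of several WSGI apps."""
    def dispatcher(environ, start_response):
        segment = environ.get("PATH_INFO", "/").lstrip("/").split("/", 1)[0]
        app = mounts.get(segment)
        if app is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        # Move the matched segment from PATH_INFO to SCRIPT_NAME,
        # so the mounted app sees itself as living at the root.
        shift_path_info(environ)
        return app(environ, start_response)
    return dispatcher


def run(app, path):
    """Call a WSGI app in-process and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status

    body = b"".join(app(environ, start_response))
    return seen["status"], body


if __name__ == "__main__":
    site = make_dispatcher({"blog": blog_app, "wiki": wiki_app})
    print(run(site, "/blog/2009"))
    print(run(site, "/wiki"))
```

The dispatcher can put both apps on one site, but that’s about all it can do. Nothing in environ and start_response lets the wiki reuse the blog’s users, models, or templates; for that you need real APIs.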

Even worse, tools like WSGI and Rack do nothing to help inter-language
inter-operability. I’d really like to write parts of my application in Python,
parts in Clojure, parts in Ruby, and even parts in Perl. Things like web
proxies, SOA, ROA, and language VMs help, here, but since gateways aren’t
APIs there’s only so far we can go.

This is going to be a hard problem to fix, even if we only focus on Python.
We’ve got a bunch of disparate communities, all composed of volunteers. Very
few people have overlapping knowledge, few know how to navigate multiple
community standards, and fewer still have the impetus to work on inter-op.
Nearly nobody is thinking about multi-language inter-op.

But this stuff is incredibly important. If Django fizzles, I’ll be sad. But if
Python fails as a web language I’ll be devastated.

Rich web applications

I’m extremely excited about HTML 5. In fact, I think it could be the best
thing to ever happen to web frameworks. If web apps can truly replace desktop
apps then frameworks are going to be the place to be, and Python could kick some
serious ass here.

Right now, though, the current crop of tools sucks at creating rich applications.
The current state-of-the-art is pitiful. The two approaches I’ve seen seem to be
either building parallel MVC layers on the client and the server and then
mashing them together somehow, or else inventing a tightly coupled
back-end-with-generated-front-end framework like GWT or SproutCore. Neither
approach makes me all that happy.

For example, take a look at 280Slides. It’s an amazing piece of web tech –
the browser truly disappears; it’s hard to tell that you’re not in a native
desktop app. It’s amazing.

However, the developers believed that 280Slides would be literally impossible to
write using any of the current web tools. They not only built their own
framework, Cappuccino; they actually invented a new language, Objective-J!
If this is a trend, it’s worrying.

Handling complexity (a.k.a. the deployment problem)

It’s a well-recognized fact that web applications are getting more and more
complex, and the list of things you need to successfully develop, deploy, and
scale a web app is getting longer and longer.

It turns out that writing the app is now the easy part; managing the rest of
the stack you need for successful deployment can be nearly impossible. In other
words, we’re all ops people now.

Some time ago, Leonard Lin collected this list of all of this “other stuff”
you need to worry about after developing your app:

API Metering

Backups & Snapshots

Counters

Cloud/Cluster Management Tools

Instrumentation/Monitoring (Ganglia, Nagios)

Failover

Node addition/removal and hashing

Autoscaling for cloud resources

CSRF/XSS Protection

Data Retention/Archival

Deployment Tools

Multiple Devs, Staging, Prod

Data model upgrades

Rolling deployments

Multiple versions (selective beta)

Bucket Testing

Rollbacks

CDN Management

Distributed File Storage

Distributed Log storage, analysis

Graphing

HTTP Caching

Input/Output Filtering

Memory Caching

Non-relational Key Stores

Rate Limiting

Relational Storage

Queues

Real-time messaging (XMPP)

Search

Ranging

Geo

Sharding

Smart Caching

dirty-table management

Yes, a modern web developer really needs to understand this stuff. Yikes.

The good news is that there’s open source software to fill all of these needs.
The bad news is that they’re all immature, disparate pieces with no connections
to each other. Getting even half of this stuff up, running, and integrated is a
monumental task.

There’s a huge opportunity here for Python. Python’s historically been used as a
“glue” language, though recently we’ve tried to de-emphasize that aspect. It’s
nothing to be ashamed of: Python’s a very good glue language.

Python could easily be the glue that keeps this huge stack from toppling over.

Scale

Internet usage is growing explosively. Worldwide it’s doubled twice since
2000… and global penetration is only about 25% [6]. This number’s
just going to keep going up.

Meanwhile, web sites are getting a lot more complex. Think back to 2000 – could
you have even imagined a site like 280Slides then?

Meanwhile, traffic is growing. The average user is spending more and more time
on the web, and think about what’s going to happen as the mobile web explodes.

We’re going to have to learn to deal with more and more and more traffic. And
frameworks suck at scaling.

Frameworks are very good at generic tasks. They’re meant to be: they abstract
away common difficulties. But as applications grow in scale, they need to get
more and more domain specific to be able to deal with scale. There’s a direct
correlation between the size of the site and how specific it is.

This usually breaks down as follows:

You develop your first little toy app using Framework X. (In the Django
world this seems to be a blogging app – it seems like at least 75% of
Django developers have built their own blog app.) This usually goes great.

Happily successful, you develop a product with the framework, and launch
it. This usually goes well, too – sites at the initial-launch stage are
still very similar to each other, and frameworks are great here.

As your site grows, you start to feel a bit of pain, and need to replace
some bits of the framework with domain specific bits. This usually isn’t
too bad: most frameworks, Django included, are modular enough that
you can easily swap out the more common non-scalable bits.

Then one day you become Twitter, and all hell breaks loose. You end up
having to essentially ditch the framework and re-write everything, from
scratch, in very very specific ways, just to deal with the crazy,
mind-boggling amounts of traffic you’ve got.

Frameworks work incredibly well to get you off the ground quickly… and then
usually fail miserably when faced with the specific needs of big sites.

This is an impossible situation for framework developers: by optimizing for a
quick start, by focusing on common needs, we’re essentially guaranteeing future
failure. Remember the “Rails doesn’t scale” pseudo-controversy last year? I
guarantee it’s only a matter of time until there’s an angry “Django FAIL” moment.

Frameworks ought to gracefully fade away as you replace them, bit by bit, with
domain-specific code. (This is what I meant, above, when I said that general
frameworks should cede control as the stack grows: inter-op is also a scaling
issue.) Right now, they don’t.
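
As a rough illustration of what “fading away” could look like, here is a sketch in which the application codes against a narrow interface rather than against the framework’s own component. All of the names (Cache, GenericCache, BoundedCache, render_profile) are invented for this example and don’t correspond to any real framework’s API.

```python
from collections import OrderedDict
from typing import Optional, Protocol


class Cache(Protocol):
    """The narrow contract the application codes against (hypothetical)."""

    def get(self, key: str) -> Optional[str]: ...

    def set(self, key: str, value: str) -> None: ...


class GenericCache:
    """Stand-in for a framework's general-purpose default: keeps everything."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class BoundedCache:
    """Domain-specific replacement: keeps only the most recently set entries."""

    def __init__(self, max_entries):
        self._max = max_entries
        self._data = OrderedDict()

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the oldest entry


def render_profile(cache: Cache, user: str) -> str:
    """Application code: it only knows about the Cache contract."""
    page = cache.get(user)
    if page is None:
        page = f"<h1>{user}</h1>"
        cache.set(user, page)
    return page


if __name__ == "__main__":
    print(render_profile(GenericCache(), "ana"))
    print(render_profile(BoundedCache(max_entries=2), "ana"))
```

Because render_profile never touches the generic implementation directly, swapping in the domain-specific one requires no changes to application code. That’s the property I’d like frameworks to have everywhere, not just in the easy spots.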

Concurrency

Of course, if you’re talking about scale, then you need to talk about
concurrency. That’s right, I’m gonna go there. I’m gonna talk about the GIL.
Don’t worry, though, I won’t dwell or complain.

First let’s look at some processors, shall we?

Today, right now, you can buy a top-of-the-line Intel Nehalem for about
$2,000. It’s got 2 hardware threads per core, and it’s available in an 8-core
configuration. This means 16 hardware threads on a single slot, so you can
easily build a box with 64 hardware threads (4 CPUs, 8 cores per CPU, 2 threads
per core).

Of course, if you want to get really serious you could buy something with Sun’s
UltraSPARC T2 (a.k.a. Niagara). This chip has 8 cores, 8 threads per core,
and you get two of ‘em in a single box, so that’s a whopping 128 hardware
threads per machine. Yes, the future of this machine is in doubt [7],
but Sun’s been on the leading edge of concurrency for quite some time. It’s only
a matter of time until Intel and AMD catch up.

Obviously concurrency is going to be a Very Big Deal in the future. It already is.

Much of my thinking about this comes to me from Ted Leung. I look up to Ted,
and I’m sad to tell you that Ted says we’re screwed. I’m afraid that I’m
starting to agree. To some extent the “shared-nothing” architecture of most web
applications means that we can just StartServers 128 to deal with 128
threads, but as applications grow you’ll usually need to start giving up
“shared nothing.”

Most languages can really only saturate a single core, and if you can only use
a single core you’re in a lot of trouble.

Unfortunately, nearly all of the awesome work on concurrency is going on in
relatively obscure languages like Scala, Erlang, Clojure, or Haskell. There’s
almost no forward motion in the Python community. Yes, I know all about Twisted,
Kamaelia, Eventlet, etc.; these are all just twists on threading or IO-based
concurrency; there’s very little that’s really new going on in Python.

And though it’s sometimes considered taboo to say it, we have to be honest: this
is partially the GIL’s fault. It’s not clear to me whether the GIL would
preclude, say, STM, but it almost doesn’t matter: the existence of the GIL
basically sends anyone interested in concurrency running for greener pastures.

I have hopes for Unladen Swallow: the prospect of removing the GIL from
Python is a promising first step. However, really all we get from that are
better threads, and threads suck as a concurrency mechanism. I want my Actors, dammit!
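
For readers who haven’t met the idea, here is a toy sketch of the Actor programming model, built from nothing but the standard library’s threading and queue modules. CounterActor and demo are made-up names, and this is emphatically not a real actor runtime: with the GIL in place these threads won’t run Python code in parallel. It only shows the shape of the model, where state is private and all communication happens through messages.

```python
import queue
import threading


class CounterActor:
    """A tiny actor: private state, a mailbox, and one thread draining it in order."""

    _STOP = object()  # sentinel message that shuts the actor down

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # touched only by the actor's own thread, so no locks
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message, reply_to=None):
        """Asynchronously deliver a message; never touches the state directly."""
        self._mailbox.put((message, reply_to))

    def stop(self):
        self._mailbox.put((self._STOP, None))
        self._thread.join()

    def _run(self):
        while True:
            message, reply_to = self._mailbox.get()
            if message is self._STOP:
                break
            if message == "increment":
                self._count += 1
            elif message == "get" and reply_to is not None:
                reply_to.put(self._count)


def demo(senders=4, messages_each=1000):
    """Hammer one actor from several threads, then ask it for its count."""
    actor = CounterActor()

    def hammer():
        for _ in range(messages_each):
            actor.send("increment")

    threads = [threading.Thread(target=hammer) for _ in range(senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    reply = queue.Queue()
    actor.send("get", reply_to=reply)  # queued after every increment
    total = reply.get(timeout=5)
    actor.stop()
    return total


if __name__ == "__main__":
    print(demo())
```

The callers never share the counter and never take a lock; they just send messages. That’s the programming model I’m asking for, ideally with a runtime underneath that can actually spread actors across all those cores.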

This is where us web guys really need your help. We operate at a higher level of
abstraction so much of the time that we’re simply not qualified to figure out
how to make concurrency better in Python. At least, I’m not. I frankly barely
understand threads after a decade of using ‘em, and there’s no way I’ll be the
one to implement STM in Python.

Halp!

In the year 2020

By way of conclusion, I want you to try to imagine web development circa
2020. That’s no arbitrary year: it’s also Last Call for HTML 5, so it makes
sense to think about what the web’s going to be like when HTML 5 is mature. When
we’re finally developing the types of apps we’re just starting to dream of today.

I’m not sure I’ll be using Django in 2020. I hope I will, of course, but it may
be that Django simply can’t adapt in the next Age of web development.

However, if I’m not still using Python in 2020 I’m going to be seriously pissed off.

Joel tells us that good software takes ten years, so I think we need to
start right now. How can we work to make Python the language of choice for the
developers of 2020?

First, we need better inter-op. A better WSGI – WSGI 2? – will help, but we
need more communication and more APIs that work between frameworks.

The Django community needs to do a better job here, and I’m taking
responsibility for that. Keep complaining to me about Django’s lack
of inter-op, and I’ll keep working to fix it.

But more than that we need real leaders here. Someone who can show
us a way forward, and keep an eye on the bigger picture, not just
focus on a single framework.

We’ve got to get out in front of HTML 5. There’s a huge opportunity
for Python to be the backend language of choice for HTML 5 web
applications. We need to start thinking about this now.

We’ve already made a great transition from thinking about “web pages” to
thinking about “web applications.” It’s time for a new transition, for us to
start thinking about a holistic “web site,” and all its associated
tech. Again, there’s a huge opportunity for Python: it could be what binds our
stacks together and makes deployment pleasant again.

I dream about a complete stack deployment framework, all tied together with
Python, probably built around WSGI 2.

We need to be thinking about scale from day one. This means being incredibly
skeptical of our own work, and continually asking ourselves where it’s going to
fail. We need to plan for the day that our framework will be phased out.

[2] Much of the web, unfortunately, hasn’t progressed much beyond this
point. PHP is still by leaps and bounds the most popular and widely-used
web technology. The future may be here, but it’s certainly not evenly
distributed yet.

[3] At least two others, Procopius Waldfoghel of France and Laurens
Janszoon Coster of the Netherlands, may have been working on their own
presses around the same time as Gutenberg, and those are only the ones we
know about today.

[4] Wooden movable type in China dates to the 10th century, and
there’s good evidence that metal type printing presses were used in
Korea as early as the 13th century. If you want to know more, Wikipedia’s
History of typography in East Asia is as good a place to start as any.

[7] You have to wonder if Sun would have failed
if we’d been able to write software that made us actually need
this many cores…
