Saturday, April 11, 2015

While my previous post was cathartic to write, it was not useful. In the hours that followed I became aware of others who share those same feelings. Through some online conversation I found a few very good solutions, further distilled my thoughts, and found some great resources that deserve to be shared.

First, I would strongly urge everyone working in, around or near a Python application to watch this talk. All of it. It just keeps getting better and more specific the deeper Glyph gets into it. While given at Djangocon as a keynote, it is applicable broadly.

Your Python application should be a Python application, not a plugin for a web server.

Your web server should be something you can import. Your application should not be something imported by a web server. This is an important distinction. The difference here is familiar; it is the distinction between framework and library. Having a web server import your application turns it into a peg that must be properly shaped to fit into it's corresponding hole. Over time, the effects of various third-party libraries (e.g. something importing lxml) become harder to control and predict in relation to the peg's shape. Flip this over and force the web server to be a properly behaved unit of Python which may be used like any other unit of Python: imported, tested, etc.

Developers, this will demystify deployment. The magic that happens in production will suddenly be attainable inside your development virtual environment. There will be fewer (or no) surprises. Rather than fighting with some strange piece of software written in C, you will be doing what you've always been doing: installing a dependency and using it.

SREs, this will help you get out the door at 5PM and maybe sleep through a few more nights. What developers do locally in development will work in production. Re-read that a few times. This is the sad current state of affairs in so many deployment scenarios, and we've all sat back and accepted it! How many times have you issued a rollback because production and development behave completely different? It won't solve every single one of these issues, but it will help enough that it warrants attention. By allowing the development environment to closely parallel the production environment developers will be solving production problems for you, before making it into production and wreaking havoc.

There are several WSGI containers to choose from that are well-behaved Python modules.

There are several, including cherrypy, and twisted.web. I am currently swooning over twisted web's WSGI container. Now sure, I said above that your web server should be something you can import, and the docs for these show examples of running a WSGI application in a manner that is slightly different. However, these (and some others) WSGI containers are well-behaved Python applications backed by well-behaved (and directly usable) Python packages. You can write your own script that imports the WSGI container and starts serving your application. When push comes to shove, you can treat the container like any other library, like real Python. There's no mystical loader machinery to work around. Want to know what twistd is doing when you tell it to run your app? It's right here, in Python.

You will have to do a little bit of work, and you will have to understand what the web server is doing.

And that's a really good thing. You should know what your web server is doing. Application developers may have to look at some documentation or code for a few minutes before properly initializing a WSGI container and serving it. Someone will have to take the time to write something slightly more sophisticated than app.run() in your flask app, but it will only take a few lines and a few minutes to do so, and then you are developing on production infrastructure.

On the SRE side, there may be slightly more work as well. You might have to use monit or supervisord to run a process for each core. But this means you are explicitly in control of the process model of the web server. Rather than let declarative configuration options rigidly choose between a handful of ways to manage processes, you use a battle-tested tool you are comfortable with to precisely control the process model of the web application.

The entry-point into the application can be made to be the exact same whether I am running a development server on my laptop or behind a load-balancer in production. This will eliminate a whole class of unknowns.

This little bit of work is up-front and one-time only. As the saying goes, an ounce of prevention is worth a pound of cure.

Thursday, April 9, 2015

Years ago I built web applications in Python. The first one predated all of today's popular web frameworks. This was long before Flask or even Django. Pylons still didn't exist yet. We argued about Cheetah versus Mako templates. My team on my first python web app actually implemented paste.httpserver in its current, threadpool'd incarnation (approximately 10 years ago).

About six years ago I more or less walked away from Python. Not because I wanted to, but because Google required me to write C++, and I was happy enough to do so. I did write a tiny bit of Python from time to time, but my bread and butter for several years was C++. After Google, I found myself dabbling in a bit of C, Java, Go and Ruby.

Now I'm back working day-to-day with Python. I just had my first experience in almost 6 years with web application deployment, and all I can say is, how did it end up like this? Who thought this was a good idea?

What am I talking about? I'm talking mostly about Gunicorn and uWSGI. Having deployed dozens of web apps a decade ago, I knew then that mod_python and mod_wsgi were a bad idea. Gunicorn and uWSGI are the natural result of spelunking deeper into that same (or very similar) rabbit hole.

Now, what has been the driving force behind these monoliths? Why have people chosen the sweat, blood and tears of deploying an application on an application server despite the gotchas, the errors, the hundreds of configuration options?

THE NEED FOR SPEED!

There exists a depressingly huge segment of the population that makes decisions in the following manner:

You wrote an application in Python. It's not going to be fast. C is fast. Java is fast. C++ is fast. Go is pretty darn quick. Python is not fast. Think about why you are using Python, this is extremely important.

Because it's productive.

Performance still matters, but in choosing Python you made the decision that productivity is a higher priority than performance. When push comes to shove, you're actively, consciously sacrificing performance for productivity. You can buy more performance, but buying more productivity is markedly harder. And that's probably a really sensible decision. You should stick by it and be proud of it.

So why are people using uWSGI, Gunicorn, mod_wsgi and so on? Because it's snake oil. Because pretty graphs proved to you that it was twice as fast. Because pretty graphs showed it could handle three times as many concurrent users.

But these numbers were derived in one of two ways. Either from an application that is little more than return "hello world", or on some crazy harebrained, super high-volume application at some company that had the developer resources on hand to develop something like a Tornado web app (and all of the corresponding infrastructure, since you won't be using full-blown SQLAlchemy in such an app). Allow me to let you in on a dirty little secret:

The amount of time your application spends executing application code is going to be drastically higher, as in orders of magnitude higher, than the time spent by the server writing bytes to a wire.

Here's a tidbit about every single performance comparison I've seen around paste.httpserver: they all use the defaults for paste.httpserver and a few others, and they all carefully configure the ones that demand it (mod_wsgi for instance). For example this one here. Had paste.httpserver been setup with multiple processes and given enough threads to match mod_wsgi in memory consumption in that article, well, you can go ahead and guess at the results; it would have been impressive. I suppose people don't realize paste's default threadpool size is 10 threads. And paste forces you to offload process supervision to an actual process supervisor (which is probably a good thing). And you're responsible for spawning a process for each CPU. But do that, and set it up comparable to your finely-tuned application server, and you will be blown away at how your real-world application performs on 10 year-old technology.

Here's another fun thing to think about regarding performance comparisons. If you're slamming a real application with 3000 requests per second, what's it doing to a database and other services?

But you know how to setup uWSGI/Gunicorn/mod_wsgi/whatever, and you feel why not? Surely this performance boon is practically free, so you might as well take it.

Well, what do these application servers do? Presumably they run a python interpreter and call your wsgi_app(env, start_response) function. They understand Python enough to execute it and do some voodoo magic to turn that wsgi response into bytes on a wire.

And that's where the similarities to a real, sane Python interpreter end. Abruptly. Full stop. The environment your Python code runs in would be hard pressed to look any more different from a real Python interpreter.

If it works correctly in pure Python, then it should work in production.

The very core belief that led me to write this article is this: if the application works in development on your machine, it should work in production and every spot (staging, QA, etc) in between. Couple that with the fact that you most likely cannot control everything going on in your application (dependencies of third party libraries/systems), and you can end up in a situation where your pure Python web application code works perfectly on your system while failing, freezing or crashing on a production system.

But oh, there are workarounds. Workarounds abound. Are you opening some resource as a side effect of importing a module (hey there 90% of "settings" modules I've seen in the wild)? Then go ahead and make sure your application server is forking before loading. But bear in mind that's going to hurt performance and consume more memory which is why you went down this path. Are you using threading? Be careful, you might need to make sure your application server isn't using sub-interpreters. Are you using C extensions? Again with the sub-interpreters (which by the way are the default for mod_wsgi and uWSGI, at least). Most deployments I see these days are strictly forking (and probably load-after-fork or heavily decorated to do as such) with threading disabled. Are you not using sub-interpreters? Be careful about global namespace pollution.

As you can see, you can quickly find yourself in a situation where the environment in production is the wild wild west of Python interpreter environments, and is really nothing like a Python interpreter launched from the command-line. Furthermore, you can find yourself in a catch-22. E.g. your application won't even start in a sub-interpreter environment, but without it you are polluting some global namespace and getting odd crashes (or worse). Or you spent weeks developing, and when you deploy on an application server it manages to segfault uWSGI without so much as a whisper in a log.

These are real world examples. These are things I have seen with my own two eyes.

So what am I advocating? To be honest, I'm not even completely sure. Years ago we used paste.httpserver processes managed by supervisord and reverse proxied by lighttpd (nginx didn't exist yet or only had documentation in Russian). Without a sub-interpreter, without disabling or enabling strange harebrained options, without peppering the application with strange decorators tightly coupled to the application server, without fighting to get an application that already works to...work.

After that application (this was circa 2005), I preferred multiple supervisord managed, threaded fastcgi processes reverse proxied behind nginx. It was efficient, easy to setup, and robust. I did some performance testing with real applications and found absolutely negligible performance gained by backflipping through mod_wsgi. Later I started mulling over the idea of just serving http over a socket, and I'd bet that's almost as efficient with the added benefit of being able to tinker on the actual processes themselves (which is nice for operations).

Look, maximizing the performance of an application server is not magic. Sure, some string manipulation happens in C so the very marginal part of your app where some bytes in ram get put into HTTP format and shoved in a buffer is faster, but Python is probably good enough at that, after all your application is written in it.

Perhaps I don't have anything concrete to advocate. It just seems to me that a Python application should be run by a Python interpreter, not some strange process-managing server that mangles the interpreter to the point that very basic, core functionality becomes impossible. And if this means pure Python putting bytes on a wire, it seems worth the tradeoff for a consistent environment, a distinct lack of show-stopping bugs, and frankly simpler operations.