
Tuesday, May 27, 2008

I first started drafting this post a few months ago, out of excitement for the work that JP and Glyph have been doing in the Divmod open source stack codebase. I was planning on entering the acronym fray with a title like "*MAP: An Alternative to LAMP" where *MAP (pronounced "starmap") would be "Any OS, Mantissa, Axiom, and Python." A good friend of mine whose opinion I value said that *MAP was a terrible name, and when I chatted about it with Glyph, he commented "Why not keep it really simple? Just say 'Mantissa.'"

And so it is :-)

For those that don't know, Mantissa is the Twisted application server and Axiom is a Twisted-based object database. By virtue of what are called "deferreds," Twisted allows you to write highly concurrent applications. Mantissa -- the Divmod stack (Mantissa entails Python, Twisted, and Axiom because it requires them) -- provides developers a means of scaling their Twisted-based, asynchronous applications. This means that you can go from experiments or prototypes to multi-node production deployments with the same set of tools and code.
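For anyone who hasn't met deferreds before, here is a toy sketch of the idea in plain Python. It is not Twisted's actual Deferred class; the names ToyDeferred, add_callback, fire, and fetch_page are made up for illustration. The point is only that a function can return a placeholder immediately, and the callbacks attached to it run later, when the result arrives.

```python
class ToyDeferred:
    """A stripped-down stand-in for the deferred idea (not Twisted's class)."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def add_callback(self, fn):
        # If the result is already available, run the callback right away;
        # otherwise queue it until fire() is called.
        if self._fired:
            self._result = fn(self._result)
        else:
            self._callbacks.append(fn)
        return self

    def fire(self, result):
        # Deliver the result and pass it through the chain in order; each
        # callback's return value feeds the next one.
        if self._fired:
            raise RuntimeError("already fired")
        self._fired = True
        self._result = result
        for fn in self._callbacks:
            self._result = fn(self._result)
        self._callbacks.clear()


def fetch_page(pending):
    # Hypothetical non-blocking fetch: it returns at once with a placeholder
    # and remembers it so that "the network" can deliver the result later.
    d = ToyDeferred()
    pending.append(d)
    return d


pending = []
seen = []

d = fetch_page(pending)
d.add_callback(lambda body: body.upper())
d.add_callback(seen.append)

before = list(seen)           # nothing has run yet
pending.pop().fire("hello")   # the result arrives; the chain runs now
```

The caller never blocks waiting for the page, which is what lets a single process juggle many outstanding operations at once.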

As such, this is a direct competitor for LAMP. Here are some questions about that: What is the value of a full stack? Why is an alternative to LAMP good or needed? What is a good alternative?

Stacked Development Value

What does a full stack give us, as developers? From a practical perspective, it:

eliminates the overhead involved in setting up a system in preparation for development

provides a development toolset

provides a context within which design patterns have been established and utilized

In other words, we can do things like pop in a CD, install an OS, have it meet all the software dependencies for our development tasks (since we're talking about LAMP, we mean development for the web), and either know how to build what we need or who to ask that can point us in the right direction. LAMP gives us this and, thanks to OS distributions like Ubuntu, gives it to us cheaply through simple button-pushing.

Do notice, however, that I said nothing about "going live" or "pushing to production"...

"The problem with the LAMP stack is that it's not a solution for the worst case scenarios. It's great for development: you throw it all together and start writing code. It's fairly okay for low-volume production use. But you need to plan for DoS attacks, search engine bot crawls, and spammer email address harvesting. Default LAMP installs fall over under such conditions."

This is a point that bears repeated belaboring: the network is violent and unpredictable. Connectivity can go away at any moment due to causes at pretty much all layers of the OSI model. The best practices for deploying applications in a production environment that keep this in mind are vast and varied. This is the domain of systems experts.

Sean made further comments concerning Google, that App Engine is so great because you write your code and then just throw the whole thing in their grid, and bam! instant scalability, protected by the (hopefully) same mechanisms that protect all of Google's publicly-facing web assets.

LAMP distributions productized and made freely available the otherwise painstaking process of compiling and installing a Linux kernel, Apache, a database, and your preferred programming language. The painstaking process was one that developers engaged in for software development. But what about the ones that systems engineers engage in for production deployments?

Google has addressed this in a "small way": massive in infrastructure support, but small in features. Knowing Google's penchant for incremental and steady service improvements, they've got plans for additional features. But I think everyone can agree that they're not going to try to meet everyone's needs all the time. Regardless, they are moving in the right direction: innovating a new platform.

"What then is needed? A platform that is created from the ground up ... What would such a platform look like? It would be hosted and (nearly) infinitely scaleable. It would provide object storage that’s as simple as saying 'here’s an object, store it' ... user authentication, authorization and access control. Flexible processing of pretty URLs. Easy creation and maintenance of page templates. Ability to send emails and process bounces. Handling of RSS feeds (inbound and outbound). Support for mobile access and possibly even voice capabilities."

Anyone that knows the Divmod software will know why this tickled us so: we have an object database (Axiom) with built-in user authentication, we've got object publishing (even with pretty URLs) and templating with Nevow, we've got mail services, feed support, mobile access and SIP. However! This isn't an advertisement; it's an illustration. The platform is part of the network, and in a sense, it is the network. Considerations for rapid application development need to be regarded very highly; I think it's fairly uncontested common knowledge that LAMP has proved this. Just as highly, though, we need to consider the needs of systems and of the engineers that are integrating them.

Google is making parts of its infrastructure available to developers now. With the dual ease of development and deployment, they are innovating engineering for us. They are only one of many, however. We need to be asking ourselves what our applications are, what the network is, what services are, and what our dev teams and engineers need.

Epilogue

This brings me to what I want for my birthday :-) Hey IBM! Sun! I want access to a Blue Gene (a la Project Kittyhawk) or a Sun Grid. I want to prove the efficacy of LAMP alternatives in the changing internet, of Python's continued pertinence, Twisted's developmental power and Mantissa's deployment capabilities.

Paul, SQLite is *very* scalable: it's small and fast. To make something small and fast scale well is a matter of creative architectures; to make something large scale (like RDBMS) requires that you also throw a lot of big iron at it.

But, as JP recently pointed out, SQLite used to have information on its "limits" page that said the maximum size for a database was a petabyte. Now there is none. So I guess it's also *large* and fast ;-)

That being said, our databases tend to be small :-) Furthermore, we architect our solutions differently than the traditional approach. We don't use one massive SQLite database; we use hundreds of them, all interconnected.
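To give a rough feel for the "many small databases" approach, here is a sketch using nothing but Python's standard sqlite3 module. It is not Axiom's API; the one-file-per-user layout, the note table, and the open_user_store helper are assumptions made up for the example. The only real mechanism on show is SQLite's ATTACH DATABASE statement, which lets one connection query across several database files.

```python
import os
import sqlite3
import tempfile


def open_user_store(root, username):
    # Hypothetical layout: one small database file per user.
    path = os.path.join(root, username + ".sqlite")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS note (body TEXT)")
    return conn, path


root = tempfile.mkdtemp()
paths = {}
for user, body in [("alice", "buy milk"), ("bob", "ship release")]:
    conn, paths[user] = open_user_store(root, user)
    conn.execute("INSERT INTO note (body) VALUES (?)", (body,))
    conn.commit()
    conn.close()

# Connect two of the stores for a single query with ATTACH DATABASE.
conn = sqlite3.connect(paths["alice"])
conn.execute("ATTACH DATABASE ? AS other", (paths["bob"],))
rows = conn.execute(
    "SELECT body FROM main.note UNION ALL SELECT body FROM other.note"
).fetchall()
conn.close()
```

Each file stays small and fast on its own, and most requests only ever need to open one of them.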

Sorry to sound ignorant about Twisted, but aren't Erlang and OTP the right tools for asynchronous and highly parallel programming? I mean, why use a Python-based framework when it seems like Erlang is parallelism done right?

Is Twisted really different enough from LAMP to be worth the switch? If you're gonna switch, you might as well go to something really different like Erlang.

These are my assumptions based on a very cursory knowledge of Twisted. Please disabuse me of any false notions I might have.

The chief advantage over Erlang, of course, is that Python has a lot more libraries available to it, both written in Python and wrappers around C libraries. Mantissa (and by implication Twisted) _does_ pursue a significantly different strategy from the LAMP stack, but choice of language isn't the major difference -- it's the perspective on concurrency, deployment, and scaling.

Let's not be religious about this or sensationalist; let's just look at it practically: there are *tons* of Python programmers and deployed Python applications. The "right tool" is determined by many different needs, and often organizations don't only use one tool. Know what I mean?

Twisted is one of the premier networking frameworks on the planet. It's for Python, an interpreted, object-oriented language. It's for the network. If you don't use any of those, then Twisted's probably not the right tool for you. If you're having problems with your network programming projects and could benefit from a language with Python's qualities, then there's a good chance Twisted could be a good fit.

"Is Twisted really different enough from LAMP to be worth the switch?"

That doesn't really make any sense to me. Python is already a part of LAMP, and Twisted is written in Python. I'm contrasting Mantissa with LAMP, and Allen's comment about that was well said.

I thought Bryan's question was about *Twisted* vs Erlang/OTP, not Python vs Erlang/OTP. That question doesn't exclude using Python and its libraries. Because of the constraints that allow Erlang to be reliable, it makes sense that direct access to foreign code must be used with caution. For example, you must take great care with linked-in drivers, and in many cases it's better to talk to other languages in their own process as an Erlang port. In fact, an example in the Python cookbook shows how you can use Python as an Erlang port.
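For the curious, the Python side of such a port can be sketched in a few lines. This is not the cookbook recipe itself, just a minimal illustration under one assumption: that the Erlang side opened the port with the {packet, 2} option, so every message is preceded by a two-byte big-endian length. The function names read_packet, write_packet, and serve are made up for the example.

```python
import io
import struct


def read_packet(stream):
    # Each message starts with a 2-byte big-endian length header.
    header = stream.read(2)
    if len(header) < 2:
        return None  # end of stream: the other side closed the port
    (length,) = struct.unpack(">H", header)
    return stream.read(length)


def write_packet(stream, payload):
    # Prefix the reply with its length, in the same framing.
    stream.write(struct.pack(">H", len(payload)) + payload)
    stream.flush()


def serve(inp, out, handler):
    # Answer every request until the input stream is closed.
    while True:
        request = read_packet(inp)
        if request is None:
            break
        write_packet(out, handler(request))


# Demo with in-memory streams standing in for the port's pipes.
inp = io.BytesIO(struct.pack(">H", 5) + b"hello")
out = io.BytesIO()
serve(inp, out, bytes.upper)
```

In a real port program the two streams would be sys.stdin.buffer and sys.stdout.buffer, and the handler would call into whatever Python library you needed.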

So the question is, if Twisted's reactor/deferred style is so foreign to most Python programmers anyway, and Erlang has demonstrated real-world massive and fault tolerant concurrency for years, what would make it a better choice than Erlang/OTP if Erlang can still talk to Python?

(This is not a rhetorical question. I've been interested in Twisted for quite a while, but never made the full jump beyond tutorials because of the lack of documentation and uncertainty about the stability of some of the APIs.)

Bryan's question was loaded because he used the phrase "right tool" and that indicates an absolutist perspective. There are many situations where Erlang could be the right tool. There are many where Twisted could be. There are many where neither would be the right tool. Due to his phrasing, it would be difficult for me to imagine that he had meant to include Python. But you are correct, there is nothing keeping folks from using both nor even from using Twisted and Erlang together.

What's more, he specifically said "why use a Python-based framework" when the whole point of this post was to show that there is more flexibility in the stack that we call LAMP than people tend to think. People swap out languages all the time. I'm encouraging people to also consider swapping out the databases and the web servers.

To your points, though: I object to this statement: "...if Twisted's reactor/deferred style is so foreign to most Python programmers anyway..."

I don't believe this to be true. For all the objections I hear, I hear just as many along the lines of "holy crap, I've never had so much fun as programming with deferreds" or "man, using Twisted's event driven model has made me a better programmer, problem solver, and thinker." There are many that love Twisted. There are many that don't. I don't believe we have the stats to indicate how many have actually tried it and hated it vs. those that have and love it.

What's more, Twisted has also "demonstrated real-world massive and fault tolerant concurrency for years." While it might not be as fault-tolerant as Erlang or fault-tolerant in the same ways, it's got a pretty sweet state mechanism thanks to Python. One can use objects, inheritance, composition, etc.

To answer your question, "what would make it a better choice than Erlang/OTP if Erlang can still talk to Python?" It seems obvious, doesn't it? Anywhere you want to have *both* concurrency *and* the benefits that come from Python as outlined above.

Which isn't to say that Python+Twisted is sooo great and it totally kicks Erlang's ass, etc., etc. I'm sure there are ways around the limitations that Erlang presents. Just as well, there are limitations around Python's reliability that Twisted inherited from it (we're working on one of these at Divmod right now). There's always something clever you can do that will allow you to keep using the tools you love, whatever those tools may be.

See, when approached with a balanced view, things don't have to be sensationalist or religious. When you need one thing, you've got it. When you need another, that's there too. It doesn't have to be all or nothing.

As for your forays into Twistedland, I'm rather amazed that you weren't able to find the extensive documentation that's available... because there is a lot of it now. One question: how long ago did you last look through the docs on the site? And perhaps a better question: what *type* of documentation were you looking for?