It's weird to see people talk about Google App Engine online because I think many people focus on minor details. Like, to make apps scale horizontally you do need a "shared-nothing" infrastructure, so that's not really novel. The BigTable aspects are sorta interesting except there's nothing there (in terms of application design) that you couldn't have gotten out of the paper, and the App Engine API is so high-level it's not that close to BigTable. It's more like any other high-level flat-address object database, like maybe CouchDB. The Python thing is also pretty irrelevant; they just picked a language they have experience with (having Guido around helped), and it's easier to launch supporting one API than n languages' worth. A good engineer just solves the problem with the tools available, and Python is a pretty good tool to start with.

As for evil plans to steal ideas or code, that's between you and your skepticism. Big companies are surprisingly good at doing shitty things, and Google is definitely big, but it's also true that within Google people really try to do the right thing. I was touched to see a privacy-concerned friend of mine start using Gmail after he was hired, saying that only after he saw how seriously they take privacy inside the company could he feel confident about using it. But I can't tell you anything that will change your mind about this subject.

I developed an internal application with Google App Engine on and off over a period of months (during its development I kept trying it out), then finally rewrote it a few weeks before launch (after the APIs had all settled).

Here are some real problems I've encountered:

1) All code runs only in response to HTTP fetches. So that means no cron jobs, and no persistent server-side processes. I know I just wrote above that you can't really have persistent jobs if you want to scale, but ultimately real apps do occasionally need these. For example, imagine a timed test app that needs a consistent view of time no matter which server (or datacenter!) the user hits. A time server becomes a single point of failure but when it's critical for your app it can be engineered around.

2) No long connections means no "comet" (server-push messaging). My first thought on hearing about App Engine was to port lmnopuz, but I can't.

3) Playing around with your data is hard. Since there's no way to perform operations on your data except by uploading code to the server, you often end up creating a new URL per operation you want to perform. Hacks like the shell help with this, but a lot of the time I want to be able to just run a local script and see the output. (For my project I found a decent workaround: make a URL that accepts Python code as a POST and runs it. Then your scripts just need to know to serialize themselves into strings and send them over the wire.) But see the next point.
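That eval-URL workaround fits in a few lines. Here's a minimal sketch; `run_posted_code` is a made-up name, and a real App Engine handler would wrap this in a webapp RequestHandler and, crucially, an auth check:

```python
import io
import contextlib

def run_posted_code(source):
    """Exec a string of Python (the POST body) and return whatever it printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(source, {})  # obviously: gate this URL behind admin auth!
    return buf.getvalue()

# A local script would POST something like this over the wire:
print(run_posted_code("print(sum(range(10)))"))  # prints 45
```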

4) Slow table scans. My app had ~1200 rows that it runs various analyses over to produce graphs. I can appreciate that such a query is labor-intensive, so I had written it to cache the results of the graph generation (the rows only change once a day). But I can't even seed the cache once, because fetching 1200 rows is too slow to finish within a single request.

5) Bulk operations are hard. Say you want to delete all objects in a table (or class; I forget the App Engine term). The "delete" operation requires that you fetch the object first, and then you're back in slow-table-scan land. The best you can do is batch your processing into multiple smaller stages, each of which writes its intermediate output into the data store: either make a page that auto-refreshes itself with Javascript and leave a browser pointed at it, or make a command-line script that repeatedly hits a URL on your app.
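The batched-delete pattern looks roughly like this. The in-memory list is a stand-in for the datastore, and `delete_batch` plays the role of one URL hit; on real App Engine the handler would fetch a page of keys and db.delete() them before the request deadline:

```python
# Simulation of batched deletes: an in-memory "datastore" of 1200 entities.
store = list(range(1200))
BATCH = 100  # small enough to finish inside one request

def delete_batch():
    """One request's worth of work: delete up to BATCH entities."""
    del store[:BATCH]
    return len(store)  # tell the driver how much is left

# The command-line driver: keep hitting the URL until the table is empty.
requests = 1
while delete_batch() > 0:
    requests += 1
print(requests)  # 12 requests to clear 1200 rows
```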

6) No arbitrary queries. (If you haven't read the docs in detail you wouldn't know this, but any query that involves multiple attributes [columns, if you're still thinking SQL] of an object must have an index exactly matching the query. They make index creation and maintenance trivial, and even automatic in most cases.)

Though everyone's repeatedly shoehorned SQL underneath object-relational mappers, App Engine (and others) demonstrate that you can provide an object storage API and gain performance by not using SQL underneath. I'd argue the real utility of SQL is that it lets you quickly (in terms of programmer time, not machine time) perform queries that you haven't done before and won't do again. Say I learn about a bug where I built all of March's data with the word "none" in place of a column that should really be null (None in Python terms) -- that's one line of SQL to fix, but it's a world of pain with App Engine because of the bulk-operations problem.
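To make the contrast concrete, here's a sketch of what that one-line fix (UPDATE t SET col = NULL WHERE col = 'none' in SQL) becomes on an object store. Plain dicts and a `put_batch` stub stand in for real entities and db.put():

```python
# 60 fake entities; the first 31 (March's rows) carry the buggy "none" string.
rows = [{"day": d, "col": "none" if d <= 31 else "ok"}
        for d in range(1, 61)]

def put_batch(entities):
    """Stand-in for db.put(); a real app would batch these writes."""
    pass

fixed = 0
for row in rows:  # the full-table scan that SQL would have avoided
    if row["col"] == "none":
        row["col"] = None
        fixed += 1
put_batch(rows)
print(fixed)  # 31 rows patched
```

And on App Engine even this loop has to be chopped into batches, per the bulk-operations point above.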

With all that said, it's still pretty good. When I was looking to switch projects about a year ago, it came down to basically three projects and App Engine was one of them, because the guys who work on it are some of the best hackers I know at the company. All of the above bullet points (and minor stuff like the languages thing) aren't fundamental limitations of the design, they're temporary flaws that can be solved by good engineering and are surely being prioritized by the team. I'm pretty confident it'll improve rapidly.

The rows were tiny! It's because each fetch is an index lookup. If you get five seconds for your page to render, that's 5000ms, so it only leaves a bit over 4ms per entry fetch.
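The budget arithmetic, spelled out:

```python
# A ~5-second page deadline spread across ~1200 individual index lookups.
deadline_ms = 5000
row_count = 1200
print(round(deadline_ms / row_count, 2))  # about 4.17 ms per entity fetch
```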

And regarding the security hole: yep. :\ But it's an internal app, so if someone destroys it there's something more seriously wrong than my security model. You could imagine it requiring an "admin" cookie of some sort.

So that means no cron jobs, and no persistent server-side processes. I know I just wrote above that you can't really have persistent jobs if you want to scale, but ultimately real apps do occasionally need these.

I just meant that they become single points of failure and also single points of bottleneck. Certainly you can build a distributed redundant system out of multiple server-side jobs, but that's outside the skill set of most developers.

Certainly you can build a distributed redundant system out of multiple server-side jobs, but that's outside the skill set of most developers.

*furrows brow*

You know, i'm not an expert in parallelization, but i like to consider myself at least competent. But i really cannot think of a way of approaching this except for modulo-n sorts of solutions. At some point you have to accept a single centralized point, perhaps one that's totally implicit, or one that's master-master replicated, but it's inevitable. And this single point trickles outwards, into questions like the one you pose. Or am i missing something fundamental about partitioning in the large?

First thing that comes to mind: have timed job requests sent to a set of redundant alarm servers. Have the requested events come through the front-door fault-tolerant mux (or any old round-robin-with-failover scheme). Have each instance do a test-and-set on a unique per-task identifier in the datastore, so only the first instance runs; the alarm servers then know not to send the 2nd (or 3rd, etc.) request, and if one arrives too late, no harm done.
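A sketch of that test-and-set dedup. A lock-guarded set stands in for a datastore transaction here, and `claim` is a made-up name; on App Engine you'd do the same check-then-insert inside db.run_in_transaction:

```python
import threading

_claimed = set()             # stand-in for a table of claimed task IDs
_lock = threading.Lock()     # stand-in for transactional isolation

def claim(task_id):
    """Atomically claim task_id; returns True for the first caller only."""
    with _lock:
        if task_id in _claimed:
            return False
        _claimed.add(task_id)
        return True

# Three redundant alarm servers all fire the same task:
results = [claim("task-42") for _ in range(3)]
print(results)  # [True, False, False] -- only the first instance runs
```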

Most of these sorts of things at Google start with a highly-available and reliable distributed store for tiny bits of data: Chubby, which is effectively a centralized point. You can then (effectively) stick a list of which servers to use into Chubby. (Mike Burrows, who wrote that paper and Chubby, is famously known as the b in bzip2!)

Re: Ning

Like I was trying to say in the post: people who code for a living and know a bit about what they're talking about (like say, you or me) are fine using Perl or Python or whatever for "real" applications. This whole "paradigm shift" the anon comment suggests happened a decade ago (certainly LJ dates back that far).

Re: Ning

I think problems with scripting languages generally manifest themselves as issues with sloppy programming/design more often than perf problems.

I've learned to write excessive amounts of paranoia into my scripting language code so that I don't get stuck supporting backward compatibility with things that shouldn't have worked in the first place. This paranoia probably hurts performance at least some of the time. I'm starting to prefer languages that let me write down the API contract more completely[1] to avoid misunderstandings later. It's not a perf thing at all.

You're right about SQL. If you just want to persist class instances then an object store is far better than SQL. But I've never worked on a project where that was the case: you always need to be able to do the unexpected. Applications are views of the data, and their internal state is a transient representation of what's beneath, whether that's SQL or not.

In theory, you can do your normal SQL-like operations from Python code (which is more powerful than SQL anyway). In practice, the bulk operations thing makes it impossible for stuff that doesn't hit indexes.

Re: Practical data Access

They mention in the doc why these sorts of queries aren't supported.

Here's some undereducated guessing on my part. If you consider what an SQL database must do to answer such a query, it also won't be able to use indexes on both columns simultaneously. (I'm guessing here -- it seems the database must use one index and then merge its results against the other.) In theory you can do the same sort of merging operation in your own code:

```python
yesterday_accounts = set(Account.all().filter("date_updated <", yesterday).fetch(1000))
nonzero_accounts = set(Account.all().filter("account_balance >", 0).fetch(1000))
return yesterday_accounts.intersection(nonzero_accounts)
```

What you lose there versus SQL is that you're fetching all these unused accounts from the datastore (which is a bandwidth thing) and that you're not doing the merge inline (though perhaps the iterator version of fetch returns them in some well-specified order and would do that). It fails if you have more than a thousand accounts in either category, but in theory the SQL query also slows down as you add more rows.

I guess their suggestion would be to denormalize your data more -- if that query matters, make a table of accounts with nonzero balance and put a date index on it. I haven't yet decided how painful that is for real applications (modulo what I've written in the post).
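A sketch of that denormalization, with dicts standing in for the two tables (`set_balance` and the table names are made up). The cost is that every balance write must keep the side table in sync:

```python
accounts = {}  # id -> {"balance": ..., "updated": ...}, the main table
nonzero = {}   # id -> updated-day, the denormalized side table

def set_balance(acct_id, balance, day):
    """Every write path maintains both tables."""
    accounts[acct_id] = {"balance": balance, "updated": day}
    if balance > 0:
        nonzero[acct_id] = day
    else:
        nonzero.pop(acct_id, None)

set_balance("a", 10, 1)
set_balance("b", 0, 2)
set_balance("c", 5, 3)
# The original two-attribute query is now a single-index scan:
print(sorted(k for k, d in nonzero.items() if d < 3))  # ['a']
```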

Thanks for the post, a question about table scan

I don't know about others, but returning 1200 rows of data seems too trivial to be a problem of any sort if, say, we were using a relational database. The truth is that even if I had to read this off a file, it's trivial and should be relatively fast. I just wonder whether this is a bigger problem than it appears to be.

This post is indeed different

Thanks for this post. I've been using appengine for quite a while now. Too bad I didn't have this list when I was starting; by now I can feel with you, and feel the pain of every single point.

As with most tools, appengine is no silver bullet, and only good for some things. I think it's great to have when you need an app running fast. I once duly impressed a friend by coding a remote UI for his IP robot in 2 hours, and had it running for everyone on the web. But I would never actually consider it for building a feature-rich app, especially one with complex persistent data structures.

Thanks for this honest sharing of appengine's real problems. People should write posts like this far more often.
