We've moved! Come visit our new blog:

Blog Archive

Thursday, June 25, 2009

On April 7th, Google launched a new version of Gmail for mobile for iPhone and Android-powered devices. We shared the behind-the-scenes story through this blog and decided to share more of our learnings in a brief series of follow-up blog posts. This week, I'll talk about the cache pattern for building offline-capable web applications.

I recently gave a talk (preserved YouTube here) about the cache pattern and the Web Storage Portability Layer (WSPL) at Google I/O. It was exciting getting to give a talk at the Moscone Center as previously I had only ever been one of the audience members. The conference seemed to go by in a blur for me as I was sleep-deprived from getting the WSPL to "just good enough" to actually be released. (And some ofyou have already pointed out that I missed several bugs.) In my talk, I provided a general overview of the cache pattern and this post expands on the handling of hit determination and merging server and local changes.

The cache pattern is a design pattern for building an offline-capable web application. We implemented the cache pattern to make Gmail for Mobile tolerant of flaky wireless connections but the approach is generally applicable. Here's how it works. Consider a typical AJAX application. As shown in the diagram, we have a web application with a local model, view and controllers. The user interacts with theapplication and the controller dispatches XmlHttpRequests (XHRs for short) to the server. The server sends asynchronous requests to the application which it inserts into the model.

As shown in this next diagram, in the cache pattern, we insert a cache between the application and the server. Having done so, many requests that would otherwise require a round-trip to the network.

A software cache like this one shares a great deal conceptually with hardware caches. When designing the cache used in Gmail for mobile, we used this similarity to guide our design. For example, to keep our cache as simple as possible, we implemented a software equivalent to a write-through cache with early forwarding and LRU eviction. The cache pattern in general (and consequently our implementation) has four important data flows as shown in the diagram.

Cached content bound for the UI.

Changes made to the cache by the user in the UI. These need to be both reliably sent to the server and updated locally in the cache so that reads from the cache for UI updates show the state including user changes.

The changes recorded in the cache need to be sent upstream to the server as the network connection is available.

Changes made to the server (like email delivery in the case of Gmail) need to be merged into the contents of the cache.

As shown in the diagram we also need a place to actually write the data. We use the WSPL library to write a cache implementation portable across both Gears and HTML5 databases.

To actually implement these four data flows, we need to decide on a hit determination mechanism, a coherency strategy and a refresh approach.

Hit Determination

At its heart, a cache is a mapping from keys to values: the UI invokes the cache with a key and the cache returns the corresponding element. While this sounds pretty simple, there is an additional source of complexity if the application wants to provide the user with summary listings of some subset of all values available from the server. To provide this feature, the cache needs to contain not only "real" data values but additional "index" values that list the keys (and possibly user-visible summaries) for "data" values. For example, in Gmail for mobile, the cache stores conversations as its "real" data values and lists of conversations (such as the Inbox in Gmail for Mobile) as its "index" values. Keys for index values are computed specially to record what subset of the complete index is cached locally. For example, in Gmail for Mobile, while a user's Inbox may contain thousands of conversations, the cache might contain an index entry whose data values lists metadata for only conversations 1000 through 1100. Consequently, Gmail for Mobile's cache extends keys with the cached range so that a request for metadata for conversations 1101 through1110 would be considered a cache miss.

Coherency and Refresh

Perhaps the most complex aspect of the cache implementation is deciding how to get updated content from the server and how to merge server updates with changes made locally. A traditional hardware cache resolves this problem by only letting one processor modify its a cache at a time and have the memory broadcast any changes to all the other caches in the system. This approach cannot work here because the Gmail server can't connect to all of its clients and update their state. Instead, the approach we took for Gmail for Mobile was for the client device regularly poll the server for alterations.

Polling the server for changes such as new email or the archiving of email by the same user from a different device implies a mechanism for merging local changes with server side changes. As mentioned above, Gmail for Mobile is a write-through cache. By keeping all of the modifications to the cache in a separate queue until they have been acknowledged, they can be played back against updates delivered from the server so that the cache contains the merge of changes from the server and the local user. The following diagram shows the basic idea:

The green box in the diagram shows the contents of the cache's write buffer changing over time and the cloud corresponds to the requests in-flight to the server with time advancing from left to right in the diagram. The function names shown in the diagram are from the simplenotes.js

example file in the Web Storage Portability Layer distribution. Here, the user has applied some change [1] and the cache has written it to the write buffer and has then requested new content resulting in query [Q]. The cache prefixes the outstanding actions from the write buffer to the query. Action [1] is marked as needing a resend on some sort of network failure.

Later, the user makes change [2] to the UI which causes the cache to append it to the write buffer in the applyUIChange call. Later still, another query is made and so, the cache sends [1][2][Q] to the server. In the mean time, the user makes yet another change [3]. This is written to the write buffer. Once changes [1] and [2] are acknowledged by the server along with the new cache contents for query [Q], changes [1] and [2] are removed from the write buffer. However, to keep the cache's state reflecting the user's changes, change [3] is applied (again) over top of the result for [Q].

Simplifying the implementation of this reapplication stage is the most important benefit of implementing a write-through cache. By separating the changes from the state, it becomes much easier to reapply the changes to the cache once the server has delivered new content to the cache. As discussed in a previous post, the use of SQL triggers can greatly improve database performance. Whether updating or re-updating, triggers are a great way to make the application of changes to the cache much more efficient.

Cached Content To the UI

The first of the four data flows is delivering content to the UI is reasonably easy: query the cache for the desired content and when the query completes, forward the request to the UI. If you look at the getNoteList_ function from the simplenotes.js example code included in the WSPL distribution, you'll see that the delivering cached content to the UI has the following basic steps:

perform hit determination: deciding if the requested cache contents are actually in the cache.

create a database transaction, and while in the transaction

query the database for the desired key

accumulate the results

then outside of the transaction, return the result to the UI.

Changes From The UI

The second flow (applyUiChange) is recording changes made by the user to the write buffer. It has a very similar structure

create a database transaction, and while in the transacation

write the change to the write buffer

wait for a trigger to update the state of the cache.

Updates Bound For The Server

As discussed above, once the changes have been written to the write buffer, they still have to be sent to the server. This happens by prepending them to queries bound for the server. The fetchFromServer from the example is responsible for this. As might be familiar by now, the flow is

create a database transaction and while in the transaction

query the write buffer for all the entries that need to be sent to the server

accumulate the entries

then outside the transaction, send the combination of changes and query to the server

Changes From The Server

Finally, we need to merge the changes from the server into the cache as is done in the insertUpdate method from the example. Here the flow is as follows:

create a database transaction and while in the transaction

update the directory

write the new content into the cache

touch the changes in the write buffer that need to be re-applied to the cache

wait for the trigger to complete its update

then, outside of the transaction, send the response to the UI if it was satisfying a cache miss.

That's a brief intro to the cache architecture as found in Gmail for mobile. We're continuing to improve our implementation of this basic architecture to improve both the performance and robustness of Gmail for mobile. Please stay tuned for follow on blog posts.

6 comments:

Nice, but I find it all pretty frustrating. After all, with HTML 5 the browser is simply becoming the JVM, then why not simply put Java on the iphone and on android (standard MIDP, not the non-standard android) and save us all this trouble? All of these abstraction layers are pretty annoying because they are designed to provide functionality that is both more mature and more performant with plain old Java. So why do we need to constantly use "new" frameworks that provide us with just less amd less?

Really? Well, guess what, I have a small and cheap Sony Ericsson G705. It handles has e-mail, RSS, podcasts and what-have-you. It can even run 3rd party programs in the background. Now, what can an iphone do that this little feature-phone can't? Well, it can run beautiful 3d games. But other than that? Oh, it can run javascript in the browser. Well, what do I need javascript in the browser for? for web-apps of-course. Which web-apps? Ummm let's say Gmail. Well, this blog has just shown that the Gmail webapp has to be re-written for the iphone anyways, and my Gmail midlet has the same capabilities. Then why write software again in yet another technology for a new phone that indeed requires a lot of power to run this technology, when you can provide the same functionality (and more) with an older, less wasteful and faster technology?

Is the cache layer or application on the mobile, fully business logic aware? There could be data inconsistencies if operations in (1) and (2) are invalid, and cannot be executed by the server and are marked as failures. What would happen to (3)? Would that be reverted as well?