I have file data (specifically language resource files). These files are automatically generated using machine translation APIs (Google Translate). They change relatively infrequently, but when the master file changes (a new string is added or changed), all the other language files are regenerated automatically.

I'm trying to decide between serving these files directly from the blobstore or serving them from memcache and storing them in the datastore.

3 Answers

Nick Johnson described the speed tradeoffs in this article. The blobstore is best at handling uploads from users. For your problem, you will probably get the fastest and cheapest performance using memcache backed by the datastore. In Python, NDB will automate this for you; in Java, use Objectify.

@mjibson thanks for the link. It verifies that memcache is faster than serving local resource files, but it doesn't compare against the blobstore. I still wonder whether local files, and even blobstore files for that matter, are also cached in a Google CDN of some sort.
– aloo, Feb 11 '12 at 6:27

It really depends on what you're serving. When people talk about the blobstore, they generally mean large data (media files) that won't fit in memcache. Our app serves up a lot of audio files, and I've found the blobstore particularly good for this because it supports progressive HTTP download.

In both cases the lookup time is virtually instantaneous (both are just maps, and you look up data by a key). The time it takes to serve an item depends on the item being returned. I can't think of any reason to take something from the blobstore and put it in memcache; it really wouldn't save any time.

I just realized that you did specify that this is file data. I'd recommend putting the files in the datastore and serving them through memcache. Objectify would really be your friend here.
– Rick Mangi, Feb 11 '12 at 19:20

Yup, these are plain old text files, ~10 KB in size... and yes, we use Objectify heavily :)
– aloo, Feb 11 '12 at 20:21

If you're serving them externally via HTTP to end users, you can also have them cached upstream by adding the appropriate headers to the response. This cut our costs significantly.
– Rick Mangi, Feb 11 '12 at 22:50
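For concreteness, "the appropriate headers" for upstream caching usually means `Cache-Control`. A small illustrative helper, where `headers` is just a dict standing in for whatever response object your handler exposes (the function name and the one-day `max_age` default are assumptions, not a real App Engine API):

```python
def add_cache_headers(headers, max_age=86400):
    """Mark a response publicly cacheable for `max_age` seconds.

    "public" lets browsers and upstream caches (including Google's
    edge cache) reuse the response; max_age=86400 (one day) is an
    arbitrary example value for rarely-changing resource files.
    """
    headers["Cache-Control"] = "public, max-age=%d" % max_age
    return headers
```

Since the language files change only when the master file changes, a fairly long max-age is reasonable, provided regeneration busts or waits out the cached copies.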

The answer to every "which is faster" question is "benchmark it". The particularities of your setup (disk speed, memory access latency, bandwidth, demonic infestations) make any general answer about performance chancy at best. The fact that you're running in Google App Engine just makes this even harder - you don't know what hardware you're going to get! So test it.
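In the spirit of "benchmark it", a minimal timing harness is all you need to compare the two read paths under your real workload. The names here are illustrative; you'd pass in a closure that does a blobstore read or a memcache/datastore read:

```python
import timeit

def seconds_per_call(fn, number=1000):
    """Run fn `number` times and return the mean wall-clock seconds per call."""
    total = timeit.timeit(fn, number=number)
    return total / number
```

Run it from an actual App Engine instance rather than locally, since (as noted above) you don't know what hardware you're going to get.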

That said, a local(ish) memcache like the one Google provides is likely to be faster than anything that might involve hitting the disk. Memory access latency is orders of magnitude lower than disk access latency, and memory bandwidth is a hundred times or more that of even the fastest SSDs on the market today.

So, if you can afford the RAM and you want to maximize your responsiveness, storing your data in memory is generally more efficient.

This is misleading on several levels. Google's memcache implementation stores your data not in your app instance's RAM but on a separate server that is part of Google's infrastructure. It costs next to nothing to access memcache, so you almost certainly can afford to use it; the catch is that there is no guaranteed persistence. Caching data in the instance's own memory is possible, but with limited effectiveness, since we have minimal control over the lifecycle and addressability of an instance. Parts of your advice might suit a normal cluster environment, but GAE is a different beast.
– Ibrahim Arief, Feb 11 '12 at 0:19

@IbrahimArief That's why I specified a "local" memcache - Google makes no guarantees about locality, but even TCP-over-InfiniBand to a remote machine's memory is faster than hitting a hard drive. Regarding persistence, no memcache I know of survives a reboot; the implicit assumption is that there's a copy of the data in some slower persistent storage.
– Borealid, Feb 11 '12 at 2:02