Memcached: When You Absolutely Positively Have to Get It To Scale the Next Day

Why should “Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0” care about a piece of technology like memcached? Because it just might save your bacon, that’s why.

Suppose your team has just finished working feverishly to implement “Virus-a-Go-Go”, your new Facebook widget that is guaranteed to soar to the top of the charts. You launch it, and sure enough, you were right. Zillions of hits are suddenly raining down on your new widget.

But you were also wrong. You woefully underestimated what it would take to handle the traffic that is now beating your pitiful little servers into oblivion. Angry email is flowing so fast it threatens to overwhelm your mail servers. Worse, someone has published your phone number in a blog post, and now it rings continuously. Meanwhile, your software developers are telling you that the problem is your use of the database (it seems to be the problem so much of the time, doesn’t it?) and that the architecture is inherently not scalable. Translation: they want a lot of time to make things faster, time you don’t have. What to do, what to do?

With apologies to Federal Express, the point of my title is that memcached may be one of the fastest things you can retrofit to your software to make it scale. Memcached, when properly used, has the potential to increase performance by hundreds or sometimes even thousands of times. I don’t know if you can quite manage it overnight, but desperate times call for desperate measures. Hopefully next time you are headed for trouble, you’ll start out with a memcached architecture in advance and buy yourself more time before you hit a scaling crunch. Meanwhile, let me tell you more about this wonder drug, memcached.

What is memcached?

Simply put, memcached sits between your database and whatever is generating too many queries on it, and attempts to avoid repetitious queries by caching the answers in memory. If you ask it for something it already knows, it retrieves it very quickly from memory. If you ask it for something it doesn’t know, it gets it from the database, copies it into the cache for future reference, and hands it over. Someone once said that it doesn’t matter how smart you are: if the other guy already knows the answer to a hard question, he can blurt it out before you can figure it out. That is exactly what memcached does for you.
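To make the flow concrete, here is a minimal sketch of that read-through pattern in Python. A dict-backed `FakeClient` stands in for a real memcached client (real clients such as pymemcache expose similar `get`/`set` calls), and `fetch_user_from_db` is a hypothetical stand-in for your actual database query:

```python
class FakeClient:
    """Dict-backed stand-in for a memcached client (get/set only)."""
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store.get(key)          # None signals a cache miss
    def set(self, key, value, expire=0):
        self._store[key] = value

def fetch_user_from_db(user_id):
    # Hypothetical expensive database query.
    return {"id": user_id, "name": "user-%d" % user_id}

def get_user(cache, user_id):
    key = "user:%d" % user_id
    value = cache.get(key)                   # 1. ask the cache first
    if value is None:                        # 2. miss: go to the database...
        value = fetch_user_from_db(user_id)
        cache.set(key, value, expire=300)    # 3. ...and remember the answer
    return value

cache = FakeClient()
first = get_user(cache, 42)    # miss: pays for the database query
second = get_user(cache, 42)   # hit: served straight from memory
```

The first call pays full price; every call after that is answered from memory until the entry expires or is invalidated.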

The beautiful thing about memcached is that it can usually be added to your software without huge structural changes being necessary. It sits as a relatively transparent layer that does the same thing your software has always done, but just a whole lot faster. Most of the big sites use memcached to good effect. Facebook, for example, uses 200 quad core machines that each have 16GB of RAM to create a 3 Terabyte memcached that apparently has a 99% hit rate.

Here’s another beautiful thought: memcached gives you a way to leverage lots of machines easily instead of rewriting your software to eliminate the scalability bottleneck. You can run it on every spare machine you can lay hands on, and it soaks up available memory on those machines to get smarter and smarter and faster and faster. Cool! Think of it as a short term bandaid to help you overcome your own personal Multicore Crisis.

What’s Needed

Getting the memcached software is easy. The next step I would take is to go read some case studies from folks using tools similar to yours and see how they’ve integrated memcache into their architectural fabric. Next, you need to look for the right place to apply leverage with memcache within your application. Memcached takes a string to use as a key, and returns a result associated with the key. Some possibilities for keys include:

– A sql query string

– A name that makes sense: <SocialNetID>.<List of Friends>

The point is that you are giving memcached a way to identify the result you are looking for. Some keys are better than others–think about your choice of key carefully!
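As an illustrative sketch (the helper names here are my own, not from any particular library), this is how the two kinds of key might be built. Note that real memcached keys are limited to 250 bytes and may not contain spaces, which is one reason to hash a raw SQL string rather than use it directly:

```python
import hashlib

def sql_key(query, params=()):
    # Hash the query plus its parameters: memcached keys are capped at
    # 250 bytes and may not contain spaces, which raw SQL would violate.
    raw = query + "|" + repr(params)
    return "sql:" + hashlib.sha1(raw.encode("utf-8")).hexdigest()

def friends_key(social_net_id):
    # A key that "makes sense": <SocialNetID>.<List of Friends>
    return "%s.friends" % social_net_id
```

The readable key is easier to invalidate deliberately, because you know exactly which key to delete when a friend list changes; the hashed SQL key is easier to drop in mechanically across an existing codebase.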

Now you can insert calls to memcached into your code in strategic places and it will begin to search the cache. You’ll also need to handle the case where the cache has no entry by detecting it and telling memcached to add the missing entry. Be careful not to create a race condition! This happens when multiple hits on the cache (you’re using a cache because you expect multiple hits, right?) cause several processes to compete over who gets to put the answer into memcached. There are straightforward solutions available, so don’t panic.
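One of those straightforward solutions uses memcached’s atomic `add` command, which stores a value only if the key is absent, so exactly one process wins the right to recompute while the losers fall back to a stale value. A sketch, again with a dict-backed stand-in for a real client and a hypothetical `recompute` function:

```python
class FakeClient:
    """Dict-backed stand-in for a memcached client, including atomic add."""
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store.get(key)
    def set(self, key, value, expire=0):
        self._store[key] = value
    def add(self, key, value, expire=0):
        if key in self._store:
            return False          # someone else got there first
        self._store[key] = value
        return True

def get_or_recompute(cache, key, recompute, stale=None):
    value = cache.get(key)
    if value is not None:
        return value
    if cache.add("lock:" + key, 1, expire=30):   # won the race
        value = recompute()
        cache.set(key, value)
        return value
    return stale                 # lost the race: serve a stale fallback
```

Only the winner of `add` hits the database; the lock key’s expiry (30 seconds in this sketch) ensures a crashed winner can’t wedge the cache forever.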

Last step? You need to know when the answer is no longer valid and be prepared to tell memcached about it so it doesn’t give you back an answer that’s out of date. This can sometimes be the hardest part, and giving careful thought to what sort of key you’re using is important to making this step easier. Think carefully about how often it’s worth updating the cache, too. Sometimes availability is more important than consistency. The more you update, the fewer hits on the cache will be made between updates, which will slow down your system. Sometimes things don’t have to be right all the time. For example, do you really need to be able to look up which of your friends are online at the moment and be right every millisecond? Perhaps it is good enough to be right every two minutes. It’s certainly much easier to cache something changing on a two-minute interval.
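In code, the two-minute rule is just an expiry time passed when the value is stored (real clients take it as a seconds argument), and an explicit `delete` covers the case where you know the answer changed. A sketch with a time-aware stand-in client:

```python
import time

class FakeClient:
    """Dict-backed stand-in for memcached that honors expiry times."""
    def __init__(self):
        self._store = {}                     # key -> (value, deadline)
    def set(self, key, value, expire=0):
        deadline = time.time() + expire if expire else float("inf")
        self._store[key] = (value, deadline)
    def get(self, key):
        entry = self._store.get(key)
        if entry is None or time.time() >= entry[1]:
            return None                      # missing or expired
        return entry[0]
    def delete(self, key):
        self._store.pop(key, None)           # explicit invalidation

cache = FakeClient()
# Good enough to be right every two minutes:
cache.set("1234.friends_online", ["alice", "bob"], expire=120)
```

Expiry handles the “eventually right” cases for you; `delete` on write handles the cases where stale answers really aren’t acceptable.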

Memcached is Not Just for DB Scaling

The DB is often the heart of your scalability problem, but memcached can be used to cache all sorts of things. An alternate example would be to cache some computation that is expensive, must be performed fairly often, and whose answer doesn’t change nearly as often as it is asked for. Sessions can also be an excellent candidate for storage in memcached, although strictly speaking, that is usually just another form of DB caching.

Downsides to memcached

Memcached is cache hit frequency dependent. So long as your application’s usage patterns are such that a given entry in the cache gets hit multiple times, you’re good. But a cache won’t help if every access is completely different. In fact, it will hurt, because you pay the cost to look in the cache and fail before you can go get the value from the DB. Because of this, you will need to verify that what you’re caching actually has this behaviour. If it doesn’t, you’ll need to think of another solution.
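A back-of-envelope check makes the point, using illustrative numbers (say a memcached lookup costs about 1 ms and a database query about 50 ms; your real figures will differ):

```python
def effective_latency(hit_rate, t_cache=1.0, t_db=50.0):
    # A hit costs only the cache lookup; a miss pays for BOTH the
    # failed cache lookup and the database query (all times in ms).
    return hit_rate * t_cache + (1 - hit_rate) * (t_cache + t_db)

effective_latency(0.99)  # ~1.5 ms: a huge win
effective_latency(0.0)   # 51.0 ms: worse than skipping the cache entirely
```

At a 99% hit rate the cache makes reads over thirty times faster; at a 0% hit rate it makes every read slower than having no cache at all.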

Memcached needs memory. The more the merrier. Remember that Facebook is using 16GB machines. This is not such a happy story for something like Amazon EC2 at the moment, where individual nodes get very little memory. I have heard Amazon will be making 3 announcements by the end of the year to help DB users of EC2. Perhaps one of these will involve more memory for more $$$ on an EC2 instance. That would help both the DB and your chances of running memcached on EC2. There are other problems with memcached on EC2 as well, such as a need to use it with consistent hashing to deal with machines coming and going, and the question of latency if all the servers are not in the EC2 cloud.

Memcached is slower than local caches such as APC cache since it is distributed. However, it has the potential to store a lot more objects since it can harness many machines. Consider whether your application benefits from a really big cache, or whether some of the objects aren’t better off with smaller, higher performance, local caches.

Static Web Pages. Sometimes you can pre-render dynamic behaviour for a web page and cache the static web pages instead. If it works out, it should be faster than memcached which only caches a portion of work required to render the page, usually just the DB portion. However, rendering the whole page is a lot of work, so you probably only want to consider it if the page will be hit a lot and changed very little. I’ve seen tag landing pages (e.g. show me all the links associated with a particular tag) done this way and only updated once a day or a couple of times a day. You may also not have a convenient way to do static pages depending on how your application works. The good news is static pages work great with Amazon S3, so you have a wonderful infrastructure on which to build a static page cache.

Conclusion

Memcached would seem to be an absolutely essential tool for scaling web software, but I noticed something funny while researching this article. Scaling gets 765,013 hits on Google Blog Search. Multicore (related to scaling) gets 88,960 hits. Memcached only gets 15,062. I can only hope that means most people already know about it and find the concepts easy enough to grasp that they don’t consider it worth writing about. If only 1 in 50 of those writing about scaling know about memcached, it’s time more learned.

Go forward and scale out using memcached, but remember that it’s a tactical weapon. You should also be thinking about how to take the next step, which is breaking down your DB architecture so it scales more easily using techniques like sharding, partitioning, and federated architectures. This will let you apply multiple cores to the DB problem. I’ll write more about that in a later post.

11 Responses to “Memcached: When You Absolutely Positively Have to Get It To Scale the Next Day”

Seems to be a solution created to coverup MySQL’s lack of caching and concurrency handling capabilities. May be the better solution is to move to a “professional” DB, which gets caching right like Oracle DB.

But I think you need to think again. MySQL has both caching and concurrency handling. And memcached is about more than simple caching. It’s about using a distributed network of machines to create a larger cache than may be feasible on the machines you run your DB server on. Even Oracle may benefit from it depending on your application.

[…] for the replication. Instead, we might consider a vast in-memory cache. Software such as memcached does this for us quite nicely, with another order of magnitude performance boost since reading […]

Imagine, customers forgoing million-dollar RAC installations to run MySQL/PostgreSQL with memcached and not even flinching at their propaganda about “professional” databases?

My only comment is that many of the memcached articles, including the main description on Danga’s front page, gloss over the very important detail of handling cache synchronization/expiry. I fear that too many people assume memcached is some kind of distributed, enhanced MySQL query cache that will seamlessly alleviate their scaling issues. What you gain in read performance you lose in the additional complexity of dealing with potentially different values in your cache than you have in your database. Depending on your app, this might not matter much at all or might be a huge problem.

[…] they are exploiting the willingness to undertake a much smaller change in the form of adding Memcached to your conventional SQL database (BTW, that Memcached article remains a long-term popular post on […]

I have a small handful of favorite techs and memcache is one of them. Brilliant stuff. Few downsides (as you mentioned), but those downsides are extremely visible, so you know exactly what you’re getting into and can work with it.

Isn’t APC an op-code cache? While I don’t at all disagree with your list of pros and cons I would point out that both of these caches work to increase performance but in different ways. As a result they could both actually be used quite powerfully in conjunction with one another without “stepping on each other’s toes” (as opposed to something like varnish and memcache which will only create a lot of headaches).