Consistent Hashing With memcached

Consistent hashing is a method that’s widely used to reduce cache invalidation. Let’s take a closer look at how it can be used.

OpenSource For You
- 2013-05-10
- By: Harish Babu. The author, employed by a leading telecom VAS provider, is an open source fan who likes exploring new technologies. He can be reached at harish (dot) babu (at) gmail (dot) com.

Memcached is a popular in-memory, distributed key-value store that is frequently used as a caching layer (especially for websites). It was developed in 2003 by Brad Fitzpatrick for his website LiveJournal. Since then it has become extremely popular and is used by Facebook, Zynga and Wikipedia.

Distributing keys and values

Memcached is a distributed key-value store, which means that it distributes the key-value pairs across multiple cache instances.

Consistent hashing is a method of distributing data across multiple cache instances such that adding or removing a node remaps only a small share of the keys, so far fewer cache hits are lost.

If there are multiple Memcached instances, the way the client distributes the key-value pairs is pretty simple:

1. For a given key, the client computes a hash, hash(key), and then maps it to a particular cache instance using the modulo operation: hash(key) % number of instances.
2. The client stores the value in the instance that matches the result of the above operation.

Simple enough, right? But let us say that we have reached a stage where the existing cache instances can no longer hold all the data we need to cache; for instance, your subscribers have grown 10-fold and the number of hits has gone up 20-fold. The logical thing to do would be to increase the number of cache instances. And therein lies the problem: every time a new instance is introduced, the second variable in the above operation (the number of instances) changes. When that happens, most keys that were previously mapped to one instance will now be mapped to another.
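The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not any real client’s code: it assumes MD5 (truncated to 32 bits) as the hash function, since clients differ in which hash they use, and the server names are hypothetical.

```python
import hashlib

def key_hash(key: str) -> int:
    """Stable 32-bit hash of a key (MD5 chosen only for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")

def pick_instance(key: str, num_instances: int) -> int:
    """Step 1: hash(key) % number of instances gives the instance index."""
    return key_hash(key) % num_instances

# Hypothetical cluster of 10 instances.
servers = [f"cache{i}" for i in range(10)]

# Step 2: the client sends the value for "Hello" to the chosen instance.
target = servers[pick_instance("Hello", len(servers))]
print("'Hello' is stored on", target)
```

A stable hash matters here: every client must compute the same instance for the same key, and Python’s built-in hash() is randomised per process for strings, so it would not be a safe choice.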

Let me illustrate that. Let’s assume there are 10 instances of Memcached and that I want to store a key/value pair in this cluster. Let me also assume that the key ‘Hello’ produces a hash of 12356 (real hashes are much longer, long enough to make collisions rare). To map it to an instance, I would use the following calculation:

12356 % 10 = 6

This means that the data for the key ‘Hello’ would be stored in instance number 6.

Now let us add a couple of instances, taking the count of instances to 12. Where would the key ‘Hello’ map to now?

12356 % 12 = 8

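Because the client will now look for the key ‘Hello’ in instance 8, while the value was stored in instance 6, the lookup is a cache miss. The same happens to most other keys, so adding instances triggers a flood of cache misses. This is why we use consistent hashing.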
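We can check the arithmetic of the ‘Hello’ example and, more importantly, see how many keys move when the cluster grows from 10 to 12 instances. This sketch assumes the same MD5-based hash as a stand-in for whatever hash a real client uses, and the key names are made up. For evenly spread hashes, a key keeps its instance only when hash % 10 equals hash % 12, which holds for 10 of every 60 hash values, so roughly five out of six keys should be remapped.

```python
import hashlib

def key_hash(key: str) -> int:
    """Stable 32-bit hash (MD5 chosen only for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")

# The worked example: a hash of 12356 moves from instance 6 to instance 8.
print(12356 % 10, 12356 % 12)  # 6 8

def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose modulo-based instance changes when the
    cluster grows from old_n to new_n instances."""
    moved = sum(1 for k in keys if key_hash(k) % old_n != key_hash(k) % new_n)
    return moved / len(keys)

keys = [f"user:{i}" for i in range(10_000)]  # hypothetical key names
print(f"{moved_fraction(keys, 10, 12):.0%} of keys change instance")
```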

So what is consistent hashing? Simply put, it is a way of ensuring that keys keep mapping to the same cache instance even when cache instances are added or removed. It cannot achieve this for every key: the keys held by a removed instance, or those taken over by a newly added one, will still cause some cache misses.

How does consistent hashing achieve that? Simple! It hashes the identifiers of the cache instances (typically IP address and port combinations) with the same hashing function used to hash the keys, and then applies a clever trick to map the keys to the instances.

Assume that the hashing function can only create hashes in the range -100 to +100 (it would be a pretty useless hashing function if it had only 201 possible values, but for the sake of demonstration, let us work with it). Now imagine the hash values laid out on the dial of a clock (arranged in a circle, just like the numbers on a clock). The values start at -100 at the top and increase clockwise until they reach +100 after a full circle (see Figure 1).
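To make the dial concrete, here is a sketch of a toy hash function that squeezes any string onto the 201 positions from -100 to +100. The point it illustrates is that keys and instance identifiers go through the same function, so they land on the same dial. The MD5 base, the function name ring_position and the addresses are assumptions for illustration only.

```python
import hashlib

RING_MIN, RING_MAX = -100, 100  # the article's toy hash range: 201 positions

def ring_position(identifier: str) -> int:
    """Map a key or an instance identifier such as "ip:port" onto the dial.

    Hypothetical toy function: MD5 reduced to the range -100..+100.
    """
    digest = hashlib.md5(identifier.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "big")
    return RING_MIN + h % (RING_MAX - RING_MIN + 1)

# Instances (hypothetical addresses) and a key are hashed the same way.
for name in ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211", "Hello"]:
    print(f"{name:>16} -> {ring_position(name):+d}")
```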

Adding and removing instances

Now, let’s hash the instances and plot the resulting hash (which will be in the range -100 to +100) on the dial. Let us assume the instances land at points A, B and C as shown in Figure 2.

Now, to map a key to an instance, move clockwise from the key and assign it to the nearest instance that comes after it. So, in this case, -70 and -30 will go to B, +10 will go to C and +50 will go to A. What happens if an instance is removed? Let us assume that instance B is removed. Then -70 and -30 move on to the next instance clockwise, C, which already holds +10, while +50 stays with A. Even after removing an instance, only two keys are remapped. The others will continue to be served from the same cache instance.
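The clockwise rule can be sketched with a sorted list of instance positions and a binary search. Since the figures are not reproduced here, the positions of A, B and C below (+80, 0 and +30) are hypothetical, picked only so that they agree with the mappings described above. A key that lands exactly on an instance’s position is assigned to that instance, which is one reasonable tie-breaking choice.

```python
import bisect

def lookup(ring: dict, key_pos: int) -> str:
    """Return the instance at or clockwise after key_pos on the dial,
    wrapping from +100 back to -100 if no instance follows the key."""
    positions = sorted(ring)
    i = bisect.bisect_left(positions, key_pos) % len(positions)
    return ring[positions[i]]

# Hypothetical positions consistent with the mappings described for Figure 2.
ring = {80: "A", 0: "B", 30: "C"}
keys = [-70, -30, 10, 50]

before = {k: lookup(ring, k) for k in keys}
print(before)    # {-70: 'B', -30: 'B', 10: 'C', 50: 'A'}

del ring[0]      # remove instance B
after = {k: lookup(ring, k) for k in keys}
print(after)     # {-70: 'C', -30: 'C', 10: 'C', 50: 'A'}

remapped = [k for k in keys if before[k] != after[k]]
print("remapped:", remapped)  # remapped: [-70, -30]
```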

Now let us add another instance (see Figure 4). Say we add D at the location shown in the diagram. Then -70, -30 and +10 will still map to C, +50 will map to D, and A will have nothing mapped to it. Again, the cache is barely disrupted: only one key (+50) is remapped.
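Adding D can be replayed the same way. Again, the positions are hypothetical (C at +30, A at +80 and D at +60, between +50 and A), chosen only to match the mappings described above; the lookup is the same clockwise search with wrap-around.

```python
import bisect

def lookup(ring: dict, key_pos: int) -> str:
    """Return the instance at or clockwise after key_pos, wrapping around."""
    positions = sorted(ring)
    i = bisect.bisect_left(positions, key_pos) % len(positions)
    return ring[positions[i]]

# State after B was removed (hypothetical positions for C and A).
ring = {30: "C", 80: "A"}
keys = [-70, -30, 10, 50]
before = {k: lookup(ring, k) for k in keys}

ring[60] = "D"   # add D between +50 and A (hypothetical position)
after = {k: lookup(ring, k) for k in keys}
print(after)     # {-70: 'C', -30: 'C', 10: 'C', 50: 'D'}

remapped = [k for k in keys if before[k] != after[k]]
print("remapped:", remapped)                                    # remapped: [50]
print("keys on A:", [k for k, v in after.items() if v == "A"])  # keys on A: []
```

Only the keys between D’s predecessor (C) and D itself change owner; every other key keeps its instance, which is exactly the property the modulo scheme lacks.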

Consistent hashing is now supported by most of the popular Memcached clients. For example, Memcached Java Client, a popular Java client for Memcached, supports it.