Simple algorithm

I want to share one of my first experiences with Redis sharding. I was not directly invovled in the project but worked closely with the team. The system was tracking phone calls so we sharded on the last integer of the number dialed. This gave us very uniform 10 shards.

In this case we are separating keys by hosts but we could just as easily specify different namespaces, ports or DB values. Since our goal is scalability, we do not want to increase the size of our keys (and consume more RAM) by adding namespace to it.

Here we have an API that recevies HTTP requests with 2 params phone_from and phone_to:

# app/controllers/phone_controller.rbclassPhoneController<ApplicationControllerdefcreatePhoneTracker.new(phone_from: params[:phone_from],phone_to: params[:phone_to]).performendend# app/services/phone_tracker.rbclassPhoneTrackerdefinitialize(phone_from:,phone_to:)@phone_from=phone_from@phone_to=phone_toenddefperform# grab the last digit of the phone numbershard_number=@phone_to.last# get the Redis connection to the right shardredis="REDIS#{shard_number}".constantize# each value is a sorted set with the score being the number of calls fromredis.zincrby(@phone_to,1,@phone_from)endend

More complex example

Alas, in many situations we do not have a number that we can split into 10 buckets. What if we are tracking unique visitors to our website using combination of IP and user agent? Here is a preivous post where I used murmurhash. To split data into 4 shards we simply % 4 to get the right Redis connection.

Rebalancing data after adding / removing nodes

What if later we need to add more Redis servers (going from 4 to 6)? We can change our code to % 6 but we also need to move some of the records from the current 4 Redis servers to the new ones? Full confession - I have not implemented this in real production system, these are just general thoughts.

classRedisResharddefperform[1..4].eachdo|shard|redis="REDIS#{shard}".constantize# this will grab ALL keys which is NOT what we want to do in real lifekeys=redis.keys("*")keys.eachdo|key|# check if key needs to be moved to new shardnew_shard=key%6ifshard!=new_shardnew_redis="REDIS#{new_shard}".constantizevalue=redis.get(key)# check if data is present in the new shard alreadynew_value=new_redis.get(key)new_redis.set(key,value+new_value)# remove data from the old shardredis.del(key)endendendendend

The challenge is to run this on a real production system while data is actively used. Can’t say I am looking forward to trying this for real if I ever have to ;-). Read the links below for better ideas on paritioning and clustering.