Tuesday, September 14, 2010

Strategies to address scalability

There are some fundamental scalability strategies which I will try to list here. Each may have detailed and varied implementations.

Horizontal scalability - distribute the load : A very basic strategy for scalability is the ability to distribute the load, or the requests, across multiple processing units. Your code should be modular enough to allow horizontal scaling. Also, the less information shared among requests, the easier it is to scale horizontally. You should aim for a shared-nothing architecture.
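As a minimal sketch of this idea, the following shows round-robin dispatch across stateless workers. The worker names and the dispatcher are hypothetical; in a shared-nothing setup any worker can serve any request because no per-request state lives on the worker itself.

```python
from itertools import cycle

def make_dispatcher(workers):
    # Round-robin over a fixed pool of interchangeable, stateless workers.
    ring = cycle(workers)
    def dispatch(request):
        # Any worker can handle any request, so just take the next one.
        return next(ring)
    return dispatch

dispatch = make_dispatcher(["worker-1", "worker-2", "worker-3"])
assignments = [dispatch(r) for r in range(6)]
```

Because the workers share nothing, adding capacity is as simple as adding another name to the pool.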

Sharding or partitioning is another way of scaling horizontally. The main idea here is to spread the load across many components by routing an individual request to the component that owns the data specific to that request. So you divide the complete data across multiple components and spread the requests among them. There are two ways of creating shards:

Vertical partitioning : Spread the load across multiple processing units by distributing the functionalities across the components, so that each processing unit handles a separate function. Column-based databases are a very good example of vertical partitioning.

Horizontal Partitioning : Spread a single type of data element across multiple processing units. It is commonly implemented via hashing. Hashing on user id to spread users across multiple data stores is an example of horizontal partitioning.
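A hash-based shard lookup for the user-id example above might be sketched like this. The shard names are hypothetical stand-ins for real database nodes; the key point is that the hash is stable, so the same user always routes to the same shard.

```python
import hashlib

# Hypothetical shard names; in practice these would be database nodes.
SHARDS = ["db-0", "db-1", "db-2", "db-3"]

def shard_for(user_id: str) -> str:
    # Use a stable hash (hashlib, not the per-process-salted built-in
    # hash()) so a given user id maps to the same shard on every request.
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

Note that this simple modulo scheme reshuffles most keys when the shard count changes; schemes like consistent hashing exist to soften that.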

Parallelization is another way of achieving scalability. Parallelization can be achieved by working on the same task in parallel on multiple processing units. For example, a web page composed of multiple components can fetch and process those components in parallel from multiple processing units.
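The web-page example can be sketched with a thread pool. The three fetchers are hypothetical placeholders for network calls to separate backends; fetching them concurrently means total latency approaches that of the slowest component rather than the sum of all of them.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical component fetchers; in practice each would be a
# network call to a separate backend service.
def fetch_header(): return "<header/>"
def fetch_body():   return "<body/>"
def fetch_footer(): return "<footer/>"

def render_page():
    parts = [fetch_header, fetch_body, fetch_footer]
    # Run all fetches concurrently instead of one after another.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        results = list(pool.map(lambda fetch: fetch(), parts))
    return "".join(results)
```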

Queueing or batch processing : Queueing and processing data in batches also provide a solution to scalability, because the overhead of an operation is amortized across multiple requests.
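A toy sketch of the batching idea: writes are buffered in a queue and flushed together, so the fixed per-operation overhead (connection setup, disk sync, network round trip) is paid once per batch instead of once per item. The batch size and the flush target here are illustrative assumptions.

```python
BATCH_SIZE = 100

queue = []
flushed = []  # stand-in for a bulk-write destination

def flush():
    # One bulk write for the whole buffered batch.
    if queue:
        flushed.append(list(queue))
        queue.clear()

def enqueue(item):
    queue.append(item)
    if len(queue) >= BATCH_SIZE:
        flush()

for i in range(250):
    enqueue(i)
flush()  # drain the final partial batch
```

With 250 items this produces three bulk writes (100 + 100 + 50) instead of 250 individual ones.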

Relaxation of data constraints : Many different techniques and trade-offs with regard to how fast data can be accessed, processed or stored fall under this strategy. Caching, giving up foreign keys, and replication or denormalization of data are some examples of this strategy.
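The caching example can be sketched as a simple read-through cache. The lookup function is a hypothetical stand-in for a slow database read; the relaxed constraint is that a cached copy may be briefly stale relative to the source of truth.

```python
cache = {}

def expensive_lookup(key):
    # Stand-in for a slow database read or computation.
    return key.upper()

def get(key):
    # Read-through cache: repeated reads are served from memory,
    # trading strict freshness for speed.
    if key not in cache:
        cache[key] = expensive_lookup(key)
    return cache[key]
```

A production cache would also need an eviction policy and an invalidation strategy, which is where the real trade-offs live.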

Of course, scalability cannot be addressed using only one of these strategies. The solution for scalability would depend on how the data is held and what compromises we are able to make with it. The solution might be a combination of a few of these strategies, and more.