Scalability

From Epowiki

Scalability is the ability to keep solving a problem as the size
of the problem increases.

Scale is measured relative to your requirements. As long as you can
scale enough to solve your problem then you have scale. If you can handle
the number of objects and events required for your application then
you can scale. It doesn't really matter what the numbers are.

Scaling often creates a difference in kind for potential solutions.
The solution you need to handle a small problem is not the same as
the one you need to handle a large problem. If you try to incrementally
evolve one into the other you can be in for a rude surprise, because
it will stop working as you pass through points of discontinuity.
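A small sketch of that difference in kind, using a made-up "find a record" problem (the data and sizes here are illustrative, not from the article): a linear scan is perfectly reasonable at small sizes, but its cost grows with the data, while a hash index does roughly constant work per lookup. Evolving one into the other is not an incremental tweak; it is a different design.

```python
def linear_lookup(records, key):
    """Small-scale solution: scan every record. Cost grows with len(records)."""
    comparisons = 0
    for k, v in records:
        comparisons += 1
        if k == key:
            return v, comparisons
    return None, comparisons

def indexed_lookup(index, key):
    """Large-scale solution: a hash index. Cost stays flat as data grows."""
    return index.get(key)

records = [(i, f"value-{i}") for i in range(100_000)]
index = dict(records)

value, work = linear_lookup(records, 99_999)
print(work)                           # 100000 comparisons for one lookup
print(indexed_lookup(index, 99_999))  # same answer, ~constant work
```

At a hundred records the scan is fine; at a hundred million it is a different problem requiring a different solution.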

Scale is not language or framework specific. It is a matter of approach
and design.

The Two Classes of How to Handle Scalability

The two different classes lead to solutions using completely different
approaches.

Scalability Under Fixed Resources

In this class of scalability problem you have a fixed set of resources,
yet you have to deal with ever increasing loads.

For example, an embedded system like a router or switch is not likely
ever to get more CPU, more RAM, more disk, or a faster network.
Yet it will be asked to handle:

more and more functionality in new upgrade images

more and more load from clients

The techniques for dealing with load in this scenario are far different
from those used when you can expand your resources.
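One classic fixed-resource technique is to cap memory with a bounded cache: load can grow without bound, but memory use cannot. This is a sketch, not from the article, using the standard library's OrderedDict; the capacity and keys are illustrative.

```python
from collections import OrderedDict

class BoundedLRUCache:
    """A cache with a hard memory budget: least recently used entries
    are evicted when capacity is exceeded, so load growth never grows RAM."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)   # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

cache = BoundedLRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # touch "a" so "b" becomes the eviction candidate
cache.put("c", 3)       # over capacity: evicts "b"
print(cache.get("b"))   # None: dropped under the fixed memory budget
```

The trade is explicit: you accept cache misses (recomputation, slower responses) in exchange for never exceeding the resources you actually have.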

Scalability Under Expandable Resources

In this class of scalability problems you have the ability to add
more resources to handle more work. In general this is called horizontal scaling.

The new era of cheap yet powerful computers has made horizontal scaling possible for
virtually anyone. Many companies can afford
to keep a grid of hundreds of machines to solve problems.

This is the approach Google has taken to handle their search systems,
for example, and it's a very different approach from a fixed-resource
approach. With fixed resources we would be squeezing every
cycle of performance out of the resources; we would be spending a lot of
time developing new approaches and tuning existing code to fit
the exact problem.

When resources are available, and your approach is right, you can
just add more machines. You start to figure out ways to solve your problem
assuming horizontal scaling.

For example, Terrascale (http://terrascale.net/) has an amazing
storage grid called Terragrid that allows you to scale up
by incrementally adding commodity machines. With the availability
of 10Gb Ethernet interfaces these approaches become quite powerful.

Of course, your approach has to be right. If you select an architecture with
single points of serialization then you won't be able to scale by
adding more machines.
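A sketch of what avoiding a single point of serialization can look like (the node names and shard count are hypothetical): route each key to a shard by hashing, so any client can compute the destination itself and no central coordinator handles every request.

```python
import hashlib

SHARDS = ["node-0", "node-1", "node-2", "node-3"]

def shard_for(key: str) -> str:
    """Deterministic, coordinator-free routing: hash the key and take it
    modulo the shard count. Every client computes the same answer."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# No single node sees every request; adding machines means growing SHARDS.
print(shard_for("user:42"))
print(shard_for("user:42") == shard_for("user:42"))  # routing is stable
```

Note that naive modulo routing remaps most keys when the shard list changes; schemes like consistent hashing exist to limit that churn, but the point here is simply the absence of a serializing bottleneck.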

Examples of Dealing with Scale

Here are a few examples of how different people have dealt with scale.

Databases

C-Store: A Column-oriented DBMS - is column-oriented (values for a column are stored contiguously) instead of row-oriented like most databases. It is optimized for reads. It is designed for sparse table structures and compresses data.
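The row- versus column-oriented distinction can be sketched in a few lines (the table and query here are made up, purely to illustrate the layout difference): an aggregate over one column touches only that column's contiguous array in a column store, and runs of repeated values in a contiguous column compress well.

```python
rows = [  # row-oriented: each record stored together
    {"id": 1, "city": "NYC", "sales": 100},
    {"id": 2, "city": "NYC", "sales": 150},
    {"id": 3, "city": "LA",  "sales": 90},
]

columns = {  # column-oriented: each column stored contiguously
    "id":    [1, 2, 3],
    "city":  ["NYC", "NYC", "LA"],
    "sales": [100, 150, 90],
}

# SUM(sales): the row store touches every field of every record;
# the column store touches only the "sales" array.
print(sum(r["sales"] for r in rows))   # 340
print(sum(columns["sales"]))           # 340

def run_length_encode(col):
    """Adjacent repeated values collapse to (value, count) pairs."""
    out = []
    for v in col:
        if out and out[-1][0] == v:
            out[-1] = (v, out[-1][1] + 1)
        else:
            out.append((v, 1))
    return out

print(run_length_encode(columns["city"]))  # [('NYC', 2), ('LA', 1)]
```

This is why a read-optimized column layout suits analytic scans: the query reads a fraction of the bytes, and those bytes compress better.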

Bigtable: A Distributed Storage System for Structured Data - Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance.
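Bigtable keeps rows sorted by key and splits the key space into ranges ("tablets") served by different machines, which is one way a structured store spreads across thousands of servers. A toy sketch of that routing idea follows; the split points and server names are hypothetical.

```python
import bisect

# Tablets cover half-open key ranges; split points must be kept sorted.
SPLIT_POINTS = ["g", "n", "t"]           # 4 tablets: [..g) [g..n) [n..t) [t..]
TABLET_SERVERS = ["ts-0", "ts-1", "ts-2", "ts-3"]

def tablet_for(row_key: str) -> str:
    """Binary-search the sorted split points to find the serving tablet."""
    return TABLET_SERVERS[bisect.bisect_right(SPLIT_POINTS, row_key)]

print(tablet_for("apple"))    # ts-0: "apple" < "g"
print(tablet_for("google"))   # ts-1: "g" <= "google" < "n"
print(tablet_for("zebra"))    # ts-3: "zebra" >= "t"
```

Because rows are sorted, range scans hit a small number of contiguous tablets, and growth is handled by splitting hot ranges onto more servers.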

One thing to think about when building such a system from a large number of hard disks is that disks will fail, all the time. The argument is fairly convincing:

Suppose each disk has an MTBF (mean time before failure) of 500,000 hours. That means the average disk is expected to have a failure about every 57 years. Sounds good, right? Now suppose you have 1000 disks. How long before the first one fails? Chances are, not 57 years. If you assume that the failures are spread out evenly across time, a 1000-disk system will have a failure every 500 hours, or about every 3 weeks!
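The arithmetic in that argument, spelled out:

```python
HOURS_PER_YEAR = 24 * 365          # 8760
mtbf_hours = 500_000               # per-disk MTBF from the example
disks = 1000

years_per_disk = mtbf_hours / HOURS_PER_YEAR
hours_between_failures = mtbf_hours / disks   # failures assumed independent
weeks_between_failures = hours_between_failures / (24 * 7)

print(round(years_per_disk, 1))          # 57.1 years for one disk
print(hours_between_failures)            # 500.0 hours across the fleet
print(round(weeks_between_failures, 1))  # 3.0 weeks
```

The per-disk number is comforting; the fleet-wide number means failure handling has to be a routine, automated part of the design, not an exception.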

Other questions to consider when planning a system at this scale:

WHAT is going to be stored? (Database, file storage?)

HOW will it be accessed? (One large file, many smaller files)

WHEN will it be accessed? (During business hours, distributed over the day?)

AVERAGE TRANSFERS - will the whole schmear come over, or only selected parts?

SECURITY a concern? (Sensitive data, protected network)

BACKUP - a petabyte of tape storage is expensive, and takes quite a while to do.

POWER - do you have enough?

COOLING - ditto

SPACE - ditto

Storage

Petabox claims 40 watts per terabyte. That is pretty low; if you try to come up with your own solution from off-the-shelf parts it will be hard to match. And if you can't pay for 40 watts per terabyte across a petabyte, maybe you should reconsider whether you need the petabyte for now.
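What 40 watts per terabyte means at petabyte scale, worked out (the $0.10/kWh electricity price is an assumption for illustration, not from the article):

```python
watts_per_tb = 40
terabytes = 1000                 # 1 petabyte
price_per_kwh = 0.10             # assumed electricity rate

total_watts = watts_per_tb * terabytes            # 40 kW, continuously
kwh_per_year = total_watts / 1000 * 24 * 365
annual_cost = kwh_per_year * price_per_kwh

print(total_watts)               # 40000 W to keep the petabyte spinning
print(round(kwh_per_year))       # 350400 kWh per year
print(round(annual_cost))        # 35040 dollars/year at the assumed rate
```

That is power alone, before cooling and space, which is why the power, cooling, and space questions above belong in the plan from day one.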