From an InfoQ interview with project lead Doug Cutting, it appears Hadoop, an open source distributed computing platform, is making good progress towards their 1.0 release. They've successfully reached a 1000 node cluster size, improved file system integrity, and jacked performance by 20x in the last year.

How they are making progress could be a good model for anyone:

The speedup has been an aggregation of our work in the past few years, and has been accomplished mostly by trial-and-error. We get things running smoothly on a cluster of a given size, then double the size of the cluster and see what breaks. We aim for performance to scale linearly as you increase the cluster size. We learn from this process and then increase the cluster size again. Each time you increase the cluster size reliability becomes a bigger challenge since the number and kind of failures increase.

It 's tempting to say just jump to the end game, don't bother with all those errors and trials, but there's a lot of learning and experience that must be earned on the way to scaling anything.