Friday, October 29, 2010

Anything which needs agreement will eventually fail at scale

I’ve been looking at what on earth is going on at Amazon – actually a massively large organization these days, with so much software that has to scale so radically that they simply could not adopt ANY conventional wisdom. They had the massive front end code base, the massive back end code base, and every holiday season things nearly fell off the cliff. The simply could not scale large enough, fast enough, and every attempt to scale brought decreasing increments of value relative to the additional cost. So Amazon changed its architecture. There is no such thing, really, as a database anymore; there are only services. You want data, you want something done, you ask a service to do it.

In order for this to work (=scale) services had to be small and independent. In fact CTO Werner Vogels says that anything which needs agreement will eventually fail at scale. Thus services run locally. There is no central control to fail (much like the Internet). And guess what. He found that each service could have its own team. This team does it all – customer interaction, deciding what to develop, development, deployment, operations, support. I mean everything. No handoffs. And the size of each team is no more people than can be fed with 2 pizzas.

Now there are many, many 2-pizzas teams, each completely owning a service, cradle to grave. But note – there are no standards for configuration management, IDE’s, languages, nothing. Teams can do whatever they want, although some supported tools are easier to use, so they are more common. The idea that anything which needs agreement will eventually fail at scale applies to people as well as software.

The not-so-amazing thing is that this works. Amazingly well. So Pete, while I agree that there is a size beyond which you have to have to have standards outside the team, I also see that there is an even bigger size beyond which you simply no longer have that luxury. Anything which needs agreement will eventually fail at scale.