In a series of articles, I’ll describe the motivation and background, as well as the engineering tools and practices we developed over the past couple of years, to tackle one of those once-in-a-lifetime projects that get engineers truly excited: building a query optimizer from scratch. At some point, every database vendor has to redesign one large component of its system or another. When it comes to the query optimizer, all of them have refurbished, rewritten, or remodeled theirs over the past 15 years. The first really big splash in this category was made by Microsoft, which rewrote the entire query processor for the 7.0 release of SQL Server between 1994 and 1998. This initiative was instrumental in taking the product from negligible revenue to a billion-dollar-a-year business in only two major releases. Others followed suit, but as far as I can tell none was similarly radical; most were more a matter of refurbishing existing structures. If you’ve been part of any such initiative at, say, Oracle, I’d really like to buy you a coffee and get some insights into the software engineering aspects of your project: pitfalls, ambitions, team dynamics, etc.

Anyway, for a startup like Greenplum, rebuilding an entire component is a much dicier decision. Suffice it to say, a lot of convincing was needed before upper management gave the green light to hire a team of engineers, design a new optimizer, and start coding. Now that the product is shaping up and we’re on the home stretch, it’s time to review some of the lessons learned! What’s with the whale, you ask? You’ll see.