Wikipedia Enjoys Phenomenal GrowthThanks to MySQL

"The notion persists within many traditional enterprises that once you reach a certain level of application importance, it is necessary to transition to big, expensive boxes running big, expensive databases. However, free-thinking members of their IT staffs are beginning to ask the question: 'What can we learn from Google, Yahoo, and Wikipedia on how to scale for high growth?'"

Stephen O'Grady, Principal AnalystRed Monk

Introduction

Wikipedia is the multilingual, Web-based, free encyclopedia that is produced collaboratively by volunteers. According to Alexa Traffic Rankings, Wikipedia consistently ranks in the Top Ten most-visited Web sites in the world. It hosts over 15 million articles in more than 260 languages. Every day, tens of millions of visitors learn more about their world — making nearly a half-million edits and creating thousands of new entries.

Their 'Growing' Challenge

Wikipedia's growth statistics are simply amazing. The organization has faced exponential growth on many fronts since its introduction in 2001. These include:

Monthly visitor growth from less than 50,000 to over 350 million

Content growth from less than 100 articles to over 15 million

Contributor growth from less than 100 to over 300,000

Wikipedia expects the growth in content, contributor and user-base to continue in all directions - and needs a computing infrastructure that will keep the pace.

Their Scale-Out Solution

This phenomenal growth has put constant technical pressure on the performance and scalability of the system. Wikipedia is based on the LAMP stack (Linux, Apache, MySQL & PHP) and has grown from initially employing a single shared server to now being a Top Ten site, with more than 20 replicated database servers delivering up-to-date content to visitors. Additionally, lightweight MySQL instances are spread out on application servers as a distributed archive solution.

Wikipedia relies upon MySQL replication to scale-out their database infrastructure and accommodate more visitors, more articles and more contributors. This architecture also allows them to save significantly on hardware costs. Since they add new servers only on an incremental, as-needed basis, they can delay their new hardware purchases until more powerful machines drop to lower, commodity prices.