Splunk is sponsoring the prizes and has donated access to a Splunk server containing the entire WordPress competition dataset. When you register for the competition, the team leader will receive an email with a Splunk login. Using Splunk is not a requirement for the competition, but I encourage you and your team to check it out as you begin exploring the data.

In addition to sponsoring the prizes for the predictive modeling competition, Splunk is also awarding the 5K Splunk Innovation Prize for the most innovative use of Splunk in the competition. Submissions for the companion prize will open at the end of the competition in September.

Jack

Disclaimer: once you sign up for anything Splunk-related, their salespeople will hassle you like there is no tomorrow.

Just a little bit of nitpicking: The "over 16% of all domains on the web" part is just plain wrong 🙂
The linked article mentions 14% of the top 1 million domains and 22 out of every 100 new domains in the US. But that is nowhere even close to 16% of "all domains on the web".

Sorry, that wasn't quite the best link, since that data is a little stale. Our (Automattic/WordPress.com) internal tracking does show WordPress powering >16% of the web. We are now tracking more than just the top million domains.