Startup Profile: SwiftKey’s Head Data Scientist on the Value of Common Crawl’s Open Data

Sebastian Spiegler is the head of the data team at SwiftKey and a volunteer at Common Crawl. Yesterday we posted Sebastian’s statistical analysis of the 2012 Common Crawl corpus. Today we are following it up with a great video featuring Sebastian talking about why crawl data is valuable, his research, and why open data is important.

The video is an excellent illustration of how startups can benefit from Common Crawl data, and we hope it inspires other startups to use our data!

Video Tutorial: MapReduce for the Masses

Learn how you can harness the power of MapReduce data analysis against the Common Crawl dataset with nothing more than five minutes of your time, a bit of local configuration, and 25 cents. Check out the full blog post where this video originally appeared.

Video: This Week in Startups – Gil Elbaz and Nova Spivack

Founder Gil Elbaz and Board Member Nova Spivack appeared on This Week in Startups on January 10, 2012. Nova and Gil, in discussion with host Jason Calacanis, explore in depth what Common Crawl is all about and how it fits into the larger picture of online search and indexing. Underlying their conversation is an exploration of how Common Crawl’s open crawl of the web is a powerful asset for educators, researchers, and entrepreneurs.