Search results

Common Crawl - Blog - Video Tutorial: MapReduce for the Masses

Video Tutorial: MapReduce for the Masses. Learn how you can harness the power of MapReduce data analysis against the Common Crawl dataset with nothing more than five minutes of your time, a bit of local configuration, and 25 cents.

Common Crawl - Blog - Video: Gil Elbaz at Web 2.0 Summit 2011

Video: Gil Elbaz at Web 2.0 Summit 2011. Hear Common Crawl founder discuss how data accessibility is crucial to increasing rates of innovation as well as give ideas on how to facilitate increased access to data. Common Crawl Foundation.

Common Crawl - Blog - Video: This Week in Startups - Gil Elbaz and Nova Spivack

Video: This Week in Startups - Gil Elbaz and Nova Spivack. Nova and Gil, in discussion with host Jason Calacanis, explore in depth what Common Crawl is all about and how it fits into the larger picture of online search and indexing.

Common Crawl - Blog - Common Crawl's Brand Spanking New Video and First Ever Code Contest!

Common Crawl's Brand Spanking New Video and First Ever Code Contest! At Common Crawl we've been busy recently!

Common Crawl - Blog - Startup Profile: SwiftKey’s Head Data Scientist on the Value of Common Crawl’s Open Data

Today we are following it up with a great video featuring Sebastian talking about why crawl data is valuable, his research, and why open data is important.

Common Crawl - Blog - Still time to participate in the Common Crawl code contest

If you are looking for inspiration, you can check out our video or the Inspiration and Ideas page of our wiki. There is lots of helpful information on our wiki to help you get started, including an Amazon Machine Image and a quick start guide.

Common Crawl - Blog - Twelve steps to running your Ruby code across five billion web pages

Big thanks again to Ben Nagy for putting the code together, and if you're interested in understanding Hadoop and Elastic MapReduce in more detail, I created a video training session that might be helpful.

Common Crawl - Team - Pete Warden

You can find him on Twitter as @petewarden, he blogs at petewarden.com, and has a series of videos available on YouTube.

Common Crawl - Use Cases

Videos. Do you like what you see here? If you need further answers, don't hesitate to get in touch. CC Catalog: Leveraging Open Data and Open APIs. sclachar. 87 Million Domains PageRank. Aysun Akarsu.

Common Crawl - Blog - Web Image Size Prediction for Efficient Focused Image Crawling

The data of interest include all images and videos from all web pages and metadata extracted from the surrounding HTML elements. To complete the task, we used 50 Amazon EMR medium instances, resulting in 951GB of data in gzip format.