February 27, 2015

5 Good Reads in Big Open Data: February 27 2015

Common Crawl Foundation
  1. Hadoop is the Glue for Big Data - via StreetWise Journal: Startups trying to build a successful big data infrastructure should "welcome...and be protective" of open source software like Hadoop. The future and innovation of Big Data depend on it.
  2. Topic Models: Past, Present, and Future - via O'Reilly Data Show Podcast:
     "You might analyze a bunch of New York Times articles for example, and there’ll be an article about sports and business, and you get a representation of that article that says this is an article and it’s about sports and business. Of course, the ideas of sports and business were also discovered by the algorithm, but that representation, it turns out, is also useful for prediction. My understanding when I speak to people at different startup companies and other more established companies is that a lot of technology companies are using topic modeling to generate this representation of documents in terms of the discovered topics, and then using that representation in other algorithms for things like classification or other things."
  3. Border disputes on Europe's Right To Be Forgotten - via Slate: Is the angle of debate (disruptors vs. regulators) wrong? Should we be thinking of more custom solutions to this global issue?
  4. FlashGraph can analyze massive graphs to the proven tune of 129 billion edges - via the Common Crawl Blog (FlashGraph on GitHub):
     "You may ask why we need another graph processing framework while we already have quite a few… FlashGraph seeks performance, capacity, flexibility and ease of programming at the moment when it was created. We hope FlashGraph can have performance comparable to the state-of-art in-memory graph engines while scaling to graphs with hundreds of billions of edges or even trillions of edges. We also hope that FlashGraph can express varieties of algorithms in FlashGraph and hide the complexity of accessing data on SSDs and parallelizing graph algorithms."
  5. The future of the internet is NOT all decided by net neutrality - via The Atlantic: A wonderfully curated net neutrality reading list, including one article where Justice Antonin Scalia tells us the Internet is a pizzeria (he's right).

Follow us @CommonCrawl on Twitter for the latest in Big Open Data.