March 20, 2015

5 Good Reads in Big Open Data: March 20 2015

Common Crawl Foundation
  1. Startup Orbital Insight uses deep learning and finds financially useful information in aerial imagery - via MIT Technology Review:
     “To predict retail sales based on retailers’ parking lots, humans at Orbital Insight use Google Street View images to pinpoint the exact location of the stores’ entrances. Satellite imagery is acquired from a number of commercial suppliers, some of it refreshed daily. Software then monitors the density of cars and the frequency with which they enter the lots.”
     “Crawford’s company can also use shadows in a city to gather information on rates of construction, especially in secretive places like China. Satellite images could also predict oil yields before they’re officially reported because it’s possible to see how much crude oil is in a container from the height of its lid. Scanning the extent and effects of deforestation would be useful to both investors and environmental groups.”
  2. Goodbye to Google Code - via eweek.com: Google is closing its open source project hosting service. With hosts like GitHub and Bitbucket available, users have migrated elsewhere and Google Code is no longer needed.
  3. Trends in Big Data Vs Hadoop Vs Business Intelligence - via Hadoop360: a visualization of how interest in each has changed over the years (image via Hadoop360).
  4. Analysis of Common Crawl PDF metadata - via PDFinfo.net
  5. Open Data should be the new Open Source - via Computerworld:
     “But the lack of open data still seriously holds innovation back, and as data becomes more critical, the problem becomes worse.”
     “For example, think about how hard it is for innovative predictive analytics companies to get off the ground. It’s not that they don’t have the software; it’s that they don’t have the data. There are plenty of excellent open source projects to build on top of (Sci-Py, R, etc.). But the lack of usable data is a huge issue when it comes to testing and training the algorithms in any domain.”
     “The same exact thing would be true when an entrepreneur starts an e-commerce company. A high quality search engine is crucial in e-commerce and there [are] plenty of great tools to build the search infrastructure such as Lucene, but no good datasets to test and train the ranking and relevance algorithms.”
     “Which is to say this: There are smart, creative data scientists out there who don’t have the tools to do valuable work.”

Follow us @CommonCrawl on Twitter for the latest in Big Open Data. If you value Open Data, please make a donation to the Common Crawl Foundation.
