Another Octopress blog about programming and infrastructure.

NY Taxi Data Visualized

Jun 26th, 2014

Recently a massive dataset of NYC taxi data was made public. Torrents are available, but at 19 GB the data can be unwieldy to manage on a home machine. The /r/BigQuery community has uploaded the dataset to Google’s BigQuery service.

BigQuery (BQ) provides a simple way to get insights out of this dataset without burning through your internet quota or waiting for your home machine to query 173 million records. For example, Reddit users have already discovered some anonymization issues in the data.
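
To give a feel for how little is needed to start asking questions of the data, here is a minimal sketch that builds a row-count query. The table name is a placeholder I made up, not the real location of the uploaded dataset, and the query uses standard SQL table quoting; actually running it would also need a BigQuery client or the web console, which is not shown here.

```python
# Hypothetical table name: an assumption for illustration only.
# Substitute the real project.dataset.table of the uploaded taxi data.
TABLE = "my_project.nyc_taxi.trips"


def build_count_query(table: str) -> str:
    """Return a SQL string that counts every row in the given table."""
    return f"SELECT COUNT(*) AS trip_count FROM `{table}`"


if __name__ == "__main__":
    # Print the query so it can be pasted into the BigQuery console.
    print(build_count_query(TABLE))
```

Pasting the printed query into the BigQuery console should report the total number of trips, which is a quick sanity check against the 173 million records mentioned above.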