How we do this

Explore this sample project

In the first step, we upload a file-system dataset containing the original logs collected by WT1. The dataset is partitioned by day. Each line is one page view of the website by one visitor, and the "location" column contains the URL of the page visited.

In the second step, we sync the data to a PostgreSQL database, so that we can write SQL and so that our calculations and graphs run faster. Note that we also un-partition the dataset, which is easier for this project.
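To make the sync step more concrete, here is a minimal sketch of what un-partitioning and loading into a SQL database amounts to. It is not the project's actual recipe: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere, and the table name `page_views`, the `visitor_id` column, and the sample rows are all assumptions made for illustration. Only the "location" column comes from the dataset described above.

```python
import sqlite3

# Hypothetical day partitions: partition key (day) -> rows of (visitor_id, location).
# The visitor_id column and all values are illustrative, not the real WT1 schema.
partitions = {
    "2014-01-01": [("v1", "http://example.com/"), ("v2", "http://example.com/blog")],
    "2014-01-02": [("v1", "http://example.com/blog")],
}


def unpartition(parts):
    """Flatten day partitions into one list of rows, keeping the day as a column."""
    return [
        (day, visitor_id, location)
        for day, rows in sorted(parts.items())
        for visitor_id, location in rows
    ]


def load(rows):
    """Load the flattened rows into an in-memory SQLite table (stand-in for PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (day TEXT, visitor_id TEXT, location TEXT)")
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


conn = load(unpartition(partitions))
total = conn.execute("SELECT COUNT(*) FROM page_views").fetchone()[0]
print(total)
```

Once the rows sit in a single table, the day is just another column, so later steps can filter or group on it with plain SQL instead of dealing with partitions.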

After this, we join those two datasets and clean the result so that we can build a map from it, creating one observation per geographic point. In the other branch, we simply group the data by day to get the number of visitors per day.
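The two aggregations above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's own recipes: the `day`, `visitor_id`, `lat`, and `lon` field names and the sample rows are hypothetical, the rows are assumed to already carry coordinates after the join, and counting page views per point is only one possible way to build "one observation per geographic point".

```python
from collections import defaultdict

# Hypothetical joined and cleaned page views; field names and values are illustrative.
page_views = [
    {"day": "2014-01-01", "visitor_id": "v1", "lat": 48.85, "lon": 2.35},
    {"day": "2014-01-01", "visitor_id": "v1", "lat": 48.85, "lon": 2.35},
    {"day": "2014-01-01", "visitor_id": "v2", "lat": 40.71, "lon": -74.01},
    {"day": "2014-01-02", "visitor_id": "v1", "lat": 48.85, "lon": 2.35},
]


def visitors_per_day(rows):
    """Count distinct visitors for each day."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["day"]].add(row["visitor_id"])
    return {day: len(ids) for day, ids in sorted(seen.items())}


def views_per_point(rows):
    """Collapse rows to one observation per (lat, lon) pair, counting page views."""
    counts = defaultdict(int)
    for row in rows:
        counts[(row["lat"], row["lon"])] += 1
    return dict(counts)


daily = visitors_per_day(page_views)
points = views_per_point(page_views)
print(daily)
print(points)
```

Using a set per day means a visitor who views several pages on the same day is counted once. If the goal is page views rather than visitors, a plain counter per day would be the equivalent change.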