Data Visualization

I am fascinated by the MTA, so for our data visualization assignment I started with the MTA’s developer resources. In addition to real-time train data, they publish a weekly accounting of turnstile traffic at every station, sampled in 4-hour intervals, with one file per week. I downloaded the most recent week to check it out and got to work writing a program to clean up the data. It was a mess.

My first step was writing a callback function to clean the data. I figured it’d be easier to do this in Python, but this is a JavaScript class, so I went for it… The original data set was sampled down to the individual turnstile, and I wanted to simplify it so that it only had data for each station overall. Here was my original code:

// Callback for loadTable(): `lastWeek` is the raw turnstile table, and `table` is an
// empty p5.Table with the same columns, created elsewhere in the sketch.
// (The function name is a placeholder; the original header isn't shown here.)
function cleanUpTurnstiles(lastWeek) {
  for (let r = 0; r < lastWeek.getRowCount(); r++) { // with the 'header' option, row 0 is the first data row
    let thisRow = lastWeek.getRow(r);
    let thisStation = thisRow.getString("STATION");
    let thisDate = thisRow.getString("DATE");
    let thisTime = thisRow.getString("TIME");
    let thisEntries = thisRow.getNum("ENTRIES");
    let thisExits = thisRow.getNum("EXITS");
    // if the combo of station, date, and time exists in the new spreadsheet,
    // add this row's info to the corresponding row;
    // if that combo of all three does not exist, add a new row
    let found = false;
    for (let n = 0; n < table.getRowCount(); n++) {
      let myRow = table.getRow(n);
      // THIS WAS A BUG I LATER FIXED: these three used to read from thisRow, not myRow
      let myStation = myRow.getString("STATION");
      let myDate = myRow.getString("DATE");
      let myTime = myRow.getString("TIME");
      if (myStation === thisStation && myDate === thisDate && myTime === thisTime) {
        // TableRow.getNum/setNum take just a column name (no row index)
        myRow.setNum("ENTRIES", myRow.getNum("ENTRIES") + thisEntries);
        myRow.setNum("EXITS", myRow.getNum("EXITS") + thisExits);
        found = true;
        break;
      }
    }
    if (!found) {
      table.addRow(thisRow); // new station/date/time combo: add a new row
    }
  }
  saveTable(table, "cleanedUpTurnstile.csv");
}
This only partially accounted for my data (and, I now know, had a lot of bugs), but whatever: I couldn't even get the initial table to load. The original file had about 65k rows, and I immediately ran into issues with p5 parsing it into a table. I tried deleting rows, and still no luck. I did all kinds of silly things to try to get the initial table to just load: external files, local files, test files, preload, setup, callbacks. Nothing seemed to matter.

I managed to get it working, but it was finicky, so I decided to avoid p5 in favor of a library called Papa Parse. The demo worked well, but it took some time to get my code sorted. I had to use jQuery to select my file (as shown in this example), and it worked! I had some trouble in my code getting the data cleaned up... and I was still hung up on the p5 table not working. I tried this example locally and it worked... I had previously tried dropping the callback and doing everything in setup, but it didn't work. But then I tried swapping my file into Allison's example and it worked!!! dakslfjadkslfjlads :table-flip:

So now it was time to figure out my code for cleaning the data. I once again ran into hurdles with p5 tables, so I switched back to Papa Parse, and YAY NOW I GOT MOST SH*T WORKING! Basically, with its header option turned on, the library turns your CSV into an array of JavaScript objects, one per row, with the column headers as keys.

// `stuff` is the array of row objects Papa Parse produced from the raw file.
// (The function name is a placeholder; the original header isn't shown here.)
function cleanUp(stuff) {
  let table = []; // the new, station-level spreadsheet
  for (let r = 0; r < stuff.length; r++) { // iterate through the rows of our incoming data
    let thisRow = stuff[r];
    let thisStation = thisRow.STATION;
    let thisDate = thisRow.DATE;
    let thisTime = thisRow.TIME;
    let thisEntries = parseInt(thisRow.ENTRIES, 10);
    //let thisExits = parseInt(thisRow.EXITS, 10);
    // if the combo of station, date, and time exists in the new spreadsheet,
    // add this row's info to the corresponding row;
    // if that combo of all three does not exist, add a new row
    let addARow = true; // flag for whether or not we push the row
    for (let n = 0; n < table.length; n++) { // iterate through our new spreadsheet, including its last row
      let myRow = table[n];
      if (myRow.STATION === thisStation && myRow.DATE === thisDate && myRow.TIME === thisTime) {
        // same station, date, and time: sum the entries (exits are still a todo)
        myRow.ENTRIES = parseInt(myRow.ENTRIES, 10) + thisEntries;
        //myRow.EXITS = parseInt(myRow.EXITS, 10) + thisExits;
        addARow = false;
        break;
      }
    }
    if (addARow) { // if we didn't update any existing rows
      console.log("adding row");
      table.push({ ...thisRow }); // push a copy so later sums don't modify the input rows
    }
  }
  console.log(table);
  let csv = Papa.unparse(table);
  console.log(csv);
  return csv;
}
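The nested loop above rescans the growing output for every incoming row, which gets slow with tens of thousands of rows. A minimal sketch of the same grouping in a single pass keys a `Map` on station, date, and time, so each row costs one lookup. It assumes Papa Parse-style row objects with STATION, DATE, TIME, ENTRIES, and EXITS keys; `trimKeys` and `sumByStation` are hypothetical names, not from the original code. Header names are trimmed first on the guess that the exits trouble could come from stray whitespace in a header (a padded `EXITS` header would make `row.EXITS` undefined); that cause is an assumption, not something confirmed here.

```javascript
// Hypothetical helpers (names are not from the original code).
// Assumes `rows` are Papa Parse-style objects keyed by column header.

// Copy a row with whitespace trimmed from every header name, so a padded
// header such as "EXITS   " can still be read as row.EXITS.
function trimKeys(row) {
  const clean = {};
  for (const [key, value] of Object.entries(row)) {
    clean[key.trim()] = value;
  }
  return clean;
}

// Sum ENTRIES and EXITS for every station/date/time combination in one pass.
function sumByStation(rows) {
  const totals = new Map(); // "STATION|DATE|TIME" -> summed row
  for (const raw of rows) {
    const row = trimKeys(raw);
    if (!row.STATION) continue; // skip blank rows, e.g. from a trailing newline
    const key = [row.STATION, row.DATE, row.TIME].join("|");
    const entries = Number(row.ENTRIES);
    const exits = Number(row.EXITS);
    const existing = totals.get(key);
    if (existing) {
      existing.ENTRIES += entries;
      existing.EXITS += exits;
    } else {
      totals.set(key, {
        STATION: row.STATION,
        DATE: row.DATE,
        TIME: row.TIME,
        ENTRIES: entries,
        EXITS: exits,
      });
    }
  }
  return [...totals.values()];
}
```

With Papa Parse, the result could be turned back into CSV text with `Papa.unparse(sumByStation(results.data))`. Like the original loop, this sums cumulative counters, so the totals are still running counts rather than per-window numbers.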
I had problems with the exits column, so I omitted actually updating the exits count in my Papa Parse loop... From here, I just copied the output from the console, and I had a somewhat cleaned-up CSV. I loaded it into a spreadsheet and started combing through the data to look for irregularities and do additional transformations. I wanted to change my entries from a cumulative count to how many people entered in a given 4-hour window. The MTA documentation wasn't the best, and I still had some peculiarities in my data... so I decided to move on.
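Converting cumulative counters to per-window counts means subtracting each reading from the previous reading of the same turnstile, which is easiest before aggregating by station. A sketch under a few assumptions: rows are Papa Parse-style objects; the columns `C/A`, `UNIT`, and `SCP` together identify one turnstile (check this against your file's header); DATE looks like `MM/DD/YYYY` and TIME like `HH:MM:SS`; and `MAX_PLAUSIBLE` is an arbitrary cutoff, not an MTA figure, for discarding jumps that probably come from counter resets. `entriesPerWindow` and `WINDOW_ENTRIES` are hypothetical names.

```javascript
// Hypothetical helper; column names and formats are assumptions (see above).
const MAX_PLAUSIBLE = 10000; // arbitrary cap on entries per window, not an MTA value

// Sortable timestamp from DATE "MM/DD/YYYY" and TIME "HH:MM:SS" (assumed formats).
function toSortKey(row) {
  const [month, day, year] = row.DATE.split("/").map(Number);
  const [hours, minutes, seconds] = row.TIME.split(":").map(Number);
  return Date.UTC(year, month - 1, day, hours, minutes, seconds);
}

// Turn cumulative ENTRIES counters into entries per sampling window.
// The first reading of each turnstile has no previous value, so it yields no row.
function entriesPerWindow(rows) {
  // Group readings by turnstile so each counter is only compared with itself.
  const byTurnstile = new Map();
  for (const row of rows) {
    const key = [row["C/A"], row.UNIT, row.SCP].join("|");
    if (!byTurnstile.has(key)) byTurnstile.set(key, []);
    byTurnstile.get(key).push(row);
  }

  const out = [];
  for (const readings of byTurnstile.values()) {
    readings.sort((a, b) => toSortKey(a) - toSortKey(b)); // oldest first
    for (let i = 1; i < readings.length; i++) {
      const delta = Number(readings[i].ENTRIES) - Number(readings[i - 1].ENTRIES);
      // A negative or huge jump most likely means the counter reset; mark it unknown.
      const windowEntries = delta >= 0 && delta <= MAX_PLAUSIBLE ? delta : null;
      out.push({ ...readings[i], WINDOW_ENTRIES: windowEntries });
    }
  }
  return out;
}
```

Summing `WINDOW_ENTRIES` by station, date, and time afterwards would give station-level counts per window; differencing first keeps one reset counter from skewing a whole station.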

I grabbed the Citi Bike trip data and started messing around with it. I wrote a function to remove rows with lat/longs of 0, and then began playing around with the visualization. In my first test, I drew each trip as a line from its start lat/lng to its end lat/lng. Then I messed around with using the length of the trip as a sort of "decay" to remove lines from the screen, and ended up using the decay for the stroke as well. Here's where my first test ended up.
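The filtering and decay steps described above could look roughly like this. The sketch assumes Papa Parse-style rows using the older Citi Bike column names (`start station latitude`, `tripduration` in seconds, and so on); newer exports name these differently, so check your header. `hasValidCoords`, `strokeAlpha`, and `SECONDS_PER_FRAME` are hypothetical, and the linear fade is one possible decay, not a reconstruction of the original sketch.

```javascript
// Hypothetical helpers; column names assume the older Citi Bike trip-data header.
const COORD_COLUMNS = [
  "start station latitude",
  "start station longitude",
  "end station latitude",
  "end station longitude",
];

// Keep a trip only if all four coordinates are real, non-zero numbers.
function hasValidCoords(trip) {
  return COORD_COLUMNS.every((col) => {
    const value = Number(trip[col]);
    return Number.isFinite(value) && value !== 0;
  });
}

const SECONDS_PER_FRAME = 10; // arbitrary: one frame of fade per 10 s of trip time

// Linear "decay": alpha falls from 255 to 0 over a lifetime proportional to trip length.
function strokeAlpha(trip, framesSinceStart) {
  const lifetime = Number(trip.tripduration) / SECONDS_PER_FRAME;
  if (!(lifetime > 0)) return 0; // missing or zero duration: don't draw
  const remaining = 1 - framesSinceStart / lifetime;
  return Math.max(0, Math.min(1, remaining)) * 255;
}
```

Filtering is then `rows.filter(hasValidCoords)`. In p5, the value from `strokeAlpha` could feed the alpha argument of `stroke()`, and trips whose alpha reaches 0 could be dropped from the array so they stop being drawn.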

Then I decided to create a fresh sketch and try to "zoom" in on the networks. I wrestled with this for a bit, but landed on a method of defining min/max lats and longs and then regenerating the drawing. This worked fine, though it was slow. Then I went to make a reset button and ran into so many problems. For some reason, I couldn't get my mouseReleased functions to stop triggering when I hit reset, even when I tried to add conditions to avoid it. Anyway, here's where I landed.
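A rough sketch of the bounding-box zoom, assuming the zoom region is picked by dragging a rectangle on the canvas (the post doesn't say how the region was chosen). `toScreen` maps lat/lng into the current box, flipping latitude because screen y grows downward; `zoomBounds` turns a drag into a new box; and `isOnCanvas` is a guard for the reset-button problem. All names here are hypothetical.

```javascript
// Hypothetical helpers for a bounding-box zoom; bounds look like
// { minLat, maxLat, minLng, maxLng }.

// Same arithmetic as p5's map(): rescale value from [inMin, inMax] to [outMin, outMax].
function remap(value, inMin, inMax, outMin, outMax) {
  return outMin + ((value - inMin) / (inMax - inMin)) * (outMax - outMin);
}

// Lat/lng -> pixel position inside the current bounds (latitude flipped for screen y).
function toScreen(lat, lng, b, w, h) {
  return {
    x: remap(lng, b.minLng, b.maxLng, 0, w),
    y: remap(lat, b.minLat, b.maxLat, h, 0),
  };
}

// Pixel position -> lat/lng (the inverse of toScreen).
function toLatLng(x, y, b, w, h) {
  return {
    lat: remap(y, h, 0, b.minLat, b.maxLat),
    lng: remap(x, 0, w, b.minLng, b.maxLng),
  };
}

const MIN_DRAG_PX = 2; // ignore clicks that aren't really drags

// New bounds from a drag between two pixel positions (in either direction).
function zoomBounds(x1, y1, x2, y2, b, w, h) {
  if (Math.abs(x2 - x1) < MIN_DRAG_PX || Math.abs(y2 - y1) < MIN_DRAG_PX) {
    return b; // too small to be a zoom box: keep the current view
  }
  const p = toLatLng(x1, y1, b, w, h);
  const q = toLatLng(x2, y2, b, w, h);
  return {
    minLat: Math.min(p.lat, q.lat),
    maxLat: Math.max(p.lat, q.lat),
    minLng: Math.min(p.lng, q.lng),
    maxLng: Math.max(p.lng, q.lng),
  };
}

// Only treat a release as a zoom gesture if it happened over the canvas.
function isOnCanvas(x, y, w, h) {
  return x >= 0 && x <= w && y >= 0 && y <= h;
}
```

One likely reason a reset button trips `mouseReleased()`: p5's global mouse handlers fire for clicks anywhere on the page, including on DOM buttons. Because `mouseX` and `mouseY` are measured relative to the canvas, a release over a button placed outside the canvas gives out-of-range coordinates that `isOnCanvas` rejects. Alternatively, registering the handler on the canvas element (`const cnv = createCanvas(w, h); cnv.mouseReleased(handleZoom);`, with `handleZoom` a hypothetical name) should limit it to releases over the canvas. Resetting is then just restoring the starting bounds and redrawing.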