Wake County Voter Analysis Using FSharp, AzureML, and R

One of the real strengths of FSharp is its ability to plow through and transform data in a very intuitive way. I was recently looking at Wake County voter data found here to do some basic voter analysis. My first thought was to download the data into R Studio. Easy? Not really. The data is available as a ginormous Excel spreadsheet about 154 MB in size. I wanted to slim the dataset down and make it a .csv for easy import into R, but using Excel to export the data as a .csv kept screwing up the formatting, and importing it directly into R Studio from Excel resulted in out-of-memory crashes. Also, the results for the different election dates were not consistent: sometimes null, sometimes not. I managed to get the data into R Studio without a crash and wrote a function to encode each election as either voted (“1”) or not (“0”).

However, every time I tried to run it, the IDE would crash with an out-of-memory error.
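
The R function itself is not reproduced in this post. As a rough illustration of the encoding idea only, here is a minimal sketch in Python (not the original R code). The names `encode_voted` and `encode_row`, and the guess that a blank, `NA`, or `NULL` cell means "did not vote", are assumptions that would need checking against the real file.

```python
# Hypothetical sketch: the actual cell values in the Wake County file may differ.
# Markers assumed to mean "no vote recorded" (compared case-insensitively).
MISSING_MARKERS = {"", "na", "null", "none"}


def encode_voted(cell):
    """Return 1 if the election cell records a vote, otherwise 0."""
    if cell is None:
        return 0
    return 0 if str(cell).strip().lower() in MISSING_MARKERS else 1


def encode_row(cells):
    """Encode every election column of one voter row as 0 or 1."""
    return [encode_voted(c) for c in cells]
```

Treating every non-missing value as a vote keeps the function indifferent to how each election column happens to record participation, which matters when the columns are inconsistent.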

Stepping back, I decided to transform the data in Visual Studio using FSharp. I created a sample from the ginormous Excel spreadsheet and then imported the data using a type provider. No memory crashes!

The really great thing is that I could write and then dispose of each line, so the whole file never had to sit in memory and nothing crashed. Once the data was in a .csv (10% the size of the Excel file), I could then import it into R Studio without a problem. It is a common lesson, but it really shows that using the right tool for the job saves tons of headaches.
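
The FSharp code is not shown here, so the following is only a sketch of the write-one-row-then-let-it-go pattern, written in Python rather than FSharp. The function name `stream_to_csv` and its parameters are hypothetical. `rows` stands in for whatever lazy reader produces one record at a time.

```python
import csv


def stream_to_csv(rows, out_path, keep_columns):
    """Write only `keep_columns` of each row to `out_path`, one row at a time.

    `rows` can be any iterable of dicts, including a generator, so only the
    current row needs to be held in memory. Returns the number of data rows
    written.
    """
    written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=keep_columns)
        writer.writeheader()
        for row in rows:
            # Keep the slimmed-down columns; missing keys become empty cells.
            writer.writerow({col: row.get(col, "") for col in keep_columns})
            written += 1
    return written
```

Because the function consumes an iterator and never builds a list of rows, memory use stays roughly flat regardless of how large the source file is.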

I knew from a previous analysis of voter data that the #1 determinant of whether a person from Wake County voted in an off-cycle election was their age:

So then in R, I created a decision tree for just age to see what the split was:
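
The R code and its output are not reproduced here. Conceptually, the first split of a tree built on age alone comes down to finding the age threshold that best separates voters from non-voters. Below is a minimal Python sketch of that search (not the original R code). It uses Gini impurity, a common splitting criterion; which criterion the R tree actually used is not stated, and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_age_split(ages, voted):
    """Return (threshold, weighted_impurity) for the best single split on age.

    Candidate thresholds are midpoints between consecutive distinct ages.
    Returns (None, inf) if all ages are identical.
    """
    pairs = sorted(zip(ages, voted))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal ages
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_score = score
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_threshold, best_score
```

A real tree library repeats this search recursively and across all features; restricting it to one feature and one level is what makes the resulting age cut-off easy to read off.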