Fabian Buentello

NBA Machine Learning Chapter 3


Intro

The following command will sync your repo with mine if you’re having issues:

$ git checkout startChapter3 -f

I’ll be honest, I restructured my data at least four times throughout this project. I’m going to show you the final version of my data structure. The structure of the data in the database will be the following:

Let’s move into our buildData() function. All the following work will be done in .on("data", fun... Since MongoDB deals with objects, we are going to make one giant object in our program called DB_OBJ. Let’s put DB_OBJ with our modules. So add the following line to your Modules Section:

var DB_OBJ = {};

Next, jump inside of our .on("data") method inside of buildData(). Around this time of development, I was using a debugger like node-inspector. It would be too much of a hassle to bring the debugging aspect into this tutorial, so I’m just going to show you what the data looks like at this point of the program.

This data is missing some very important information. We don’t know what season this data belongs to, or what stat folder it came from (totals, advanced). So we need to parse the stat type and season out of the file path. Time for some good old-fashioned regex! I took it upon myself to work out the regex we are going to use.
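As a rough sketch of that parsing step, the regex that shows up later in buildData() can be tried on its own. The file path below is made up for illustration (the real paths come from the earlier chapters), and parsePath is a hypothetical helper name, not something from the project.

```javascript
// Hypothetical helper: pull the stat type and season out of a file path.
// The regex is the one used in buildData(); the path layout is an assumption.
function parsePath(path) {
    // With the g flag, match() returns every match in the order it appears,
    // so this assumes the stat folder comes before the year in the path.
    var parsed = path.match(/(advanced|totals)|(\d{4})/g);
    return {
        statType: parsed[0],
        Season: parsed[1]
    };
}

var info = parsePath('./data/advanced/1980.csv'); // made-up path
console.log(info.statType, info.Season);
```

One thing worth noticing: the code relies on the order of the matches, so a path where the year comes before the folder name would swap the two values.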

If you’re thinking to yourself, “I’m sure this section irritates Fabian a lot, we all know he hates long lines and those are some long ass lines!” Yea, you’re right and we haven’t even added the else portion! Let’s get rid of all these data.<something> to help shorten the lines. Inside of .on("data"), erase everything and replace it with this:

// 1
var name = data.Player,
    yr = data.Season,
    stat = data.statType;

// Does Player exist?
if (!DB_OBJ.hasOwnProperty(name)) { // Player doesn't exist
    // 2
    var tmpPlayer = {
        Player: name,
        Pos: data.Pos,
        Seasons: {}
    };
    DB_OBJ[name] = tmpPlayer;
    DB_OBJ[name].Seasons[yr] = {};
} else if (!DB_OBJ[name].Seasons.hasOwnProperty(yr)) { // Player Exists, Season doesnt exist
    DB_OBJ[name].Seasons[yr] = {};
}

// 3
DB_OBJ[name].Seasons[yr][stat] = {};
DB_OBJ[name].Seasons[yr][stat] = data;

Part 1 - We are reducing the variables to a single word to reduce the length of the line.
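To see what this grouping does to the data, here is a small standalone sketch of the same idea. The rows are hand-written placeholders (not real stats), and addRow is a hypothetical wrapper I am using only so the logic can run outside of the CSV stream.

```javascript
// Sketch of the grouping step with two hand-written rows (not real stats).
var DB_OBJ = {};

// Hypothetical wrapper around the logic that lives inside .on("data").
function addRow(data) {
    var name = data.Player,
        yr = data.Season,
        stat = data.statType;

    // Create the player the first time we see the name.
    if (!DB_OBJ.hasOwnProperty(name)) {
        DB_OBJ[name] = { Player: name, Pos: data.Pos, Seasons: {} };
    }
    // Create the season the first time we see the year for this player.
    if (!DB_OBJ[name].Seasons.hasOwnProperty(yr)) {
        DB_OBJ[name].Seasons[yr] = {};
    }
    // File the row under its stat type (totals or advanced).
    DB_OBJ[name].Seasons[yr][stat] = data;
}

addRow({ Player: 'Player A', Pos: 'C', Season: '1980', statType: 'totals', G: '82' });
addRow({ Player: 'Player A', Pos: 'C', Season: '1980', statType: 'advanced', PER: '25.3' });

console.log(JSON.stringify(DB_OBJ, null, 4));
```

Both rows end up under the same player and the same season, one per stat type, which is the shape the rest of the chapter works with.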

Yeah I know, I know. This data is starting to be really hard to examine. I think it’s time we put it in a file.

In your Modules Section, add the following line:

var jsonfile = require('jsonfile');

and replace the async.each() callback with this (look at the next step if you’re confused):

function(err) {
    console.log('***DONE BUILDING DATA***');
    jsonfile.writeFile('./data/outputFile.json', DB_OBJ, {spaces: 4}, function(err) {
        // Only report when the write actually failed
        if (err) console.error(err);
    });
});

Your entire buildData() function should look like this:

var buildData = function(paths) {
    async.each(paths, function(path, _aCallback) {
        // Create File Stream
        var inputStream = fs.createReadStream(path);

        // Read in CSV file
        fast_csv
            .fromStream(inputStream, {
                headers: true,
                ignoreEmpty: true
            })
            .transform(function(data) {
                // Pull the stat type and season out of the file path
                var parsedPath = path.match(/(advanced|totals)|(\d{4})/g);
                data.statType = parsedPath[0];
                data.Season = parsedPath[1];
                return data;
            })
            .on("data", function(data) {
                var name = data.Player,
                    yr = data.Season,
                    stat = data.statType;

                // Does Player exist?
                if (!DB_OBJ.hasOwnProperty(name)) { // Player doesn't exist
                    // helps us for the final data structure.
                    var tmpPlayer = {
                        Player: name,
                        Pos: data.Pos,
                        Seasons: {}
                    };
                    DB_OBJ[name] = tmpPlayer;
                    DB_OBJ[name].Seasons[yr] = {};
                } else if (!DB_OBJ[name].Seasons.hasOwnProperty(yr)) { // Player Exists, Season doesnt exist
                    DB_OBJ[name].Seasons[yr] = {};
                }

                // add the stats to the player object.
                DB_OBJ[name].Seasons[yr][stat] = {};
                DB_OBJ[name].Seasons[yr][stat] = data;
            })
            .on("end", function() {
                console.log("done");
                console.log(DB_OBJ);
                _aCallback();
            });
    }, function(err) {
        console.log('*****DONE BUILDING DATA*****');
        jsonfile.writeFile('./data/outputFile.json', DB_OBJ, {spaces: 4}, function(err) {
            // Only report when the write actually failed
            if (err) console.error(err);
        });
    });
};

Let’s run it:

$ node buildNBA_Data.js =BUILD TEST

It should create a file called outputFile.json that looks like this. If not, don’t worry, a code checkup is coming soon:

OK, let’s remove our console.log(DB_OBJ) and add a few more years to our =TEST. Inside of your Main Function, change endYr from 1982 to 1990:

// If Test is set, only get a few years
var endYr = (isTest) ? 1990 : 2016;

Let’s run it:

$ node buildNBA_Data.js =BUILD TEST

Hopefully your outputFile.json looks like this:

OK, so we’re not done cleaning our data yet. I know we’re about to get to the Mongo portion of the tutorial, but it’ll be fast, I promise. Here’s our code checkup for the second section:

Mongo

We’re going to pretend that all of us know what issues are going to arise in the future from how our data is currently. For one, it’s not even in the final structure that we wanted. I’m going to go ahead and list the issues that we have with our data:

Not in the final structure.

An asterisk (*) at the end of some names (HoF players).

All values are strings.

The first is self-explanatory. The second causes problems when we want to look up certain players in the database. The third I didn’t pay much attention to at first, until I had to apply math in the Machine Learning portion of this project. I tried converting them to floats in Python. After 10 minutes of doing that, I figured it would be much easier for them to already be floats in the database.

This module will have a few properties that will help keep our code together. The first property, filter, will take care of converting all our stats (strings) into floats and removing unwanted properties like Rk, 0, matches. The next property, clean, will change our structure from:

If you rarely work with modules, I recommend reading that article I mentioned above, even if you just need a quick refresher. The reason why we are making DB_OBJ a module is so that we have all the data and data manipulation in one location. As of right now, we are filtering the data in the .transform() and we would be cleaning the data right before our jsonfile.writeFile() line.

Let’s start off with filter since we have the majority of the code for it already. We will be calling our filtering method in our .transform() method. Inside of buildData(), replace .transform() with:

Those first two lines seem familiar, but what about the rest? The next three are pretty simple to explain. If the player has an * in his name, we remove it. In the next two lines we delete the Rk and 0 properties. The last section may look a bit confusing, but it’s not. We are going through each property and checking if the string could be a number. This Stack Overflow answer does a great job explaining it.
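If it helps to see those steps in isolation, here is a minimal sketch of the same idea. This is not the exact filter code from the project: isNumeric and filterRow are hypothetical names, the numeric check is one common way to write it (I am assuming it is close to what the linked answer describes), and the sample row is made up.

```javascript
// Hypothetical numeric check: true for strings like "82" or ".604",
// false for "C" or an empty string.
function isNumeric(value) {
    return !isNaN(parseFloat(value)) && isFinite(value);
}

// Hypothetical stand-in for the filter step described above.
function filterRow(row) {
    // Hall of Fame players carry a trailing "*" in the source data.
    row.Player = row.Player.replace(/\*$/, '');

    // Drop the columns we do not need.
    delete row.Rk;
    delete row['0'];

    // Convert every value that looks like a number into a float.
    Object.keys(row).forEach(function(key) {
        if (isNumeric(row[key])) {
            row[key] = parseFloat(row[key]);
        }
    });
    return row;
}

// Made-up row, only here to exercise the function.
var sample = filterRow({ Rk: '1', '0': '', Player: 'Player A*', Pos: 'C', G: '82', 'FG%': '.604' });
console.log(sample);
```

The parseFloat() half of the check rejects empty strings and plain text, while the isFinite() half rejects values that only start with digits, so a position like "C" stays a string.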

Let’s run it and see if the stats are no longer strings and we’ve removed the * from Kareem Abdul-Jabbar:

$ node buildNBA_Data.js =BUILD TEST

Did your stats turn into numbers like mine? If not, it’s OK, a code checkup is coming very soon:

Now let’s get to cleaning. Some of you might have already figured out that we are going to print out those functions inside of outputFile.json (depending on your jsonfile version). Does that mean we’re going to be printing those out too? No. We’re going to delete them before it gets to that point, so make your data.clean look like this:

Part 1 - Since we are using the _.map() function, it’s going to return all the values in our data object as an array, so we get rid of the “Player Name” property.

[{"Player":"Kareem Abdul-Jabbar","Pos":"C",....},....]

Part 2 - If you remember correctly, our seasons aren’t in the structure that we want, this is where we fix that issue. We set Seasons equal to the _.values() of itself. _.values() will return the data in an array format, which is what we want.

[
    {
        "Player": "Kareem Abdul-Jabbar",
        "Pos": "C",
        "Seasons": [ // Here's the array that we wanted :)
            {
                "totals": {......},
                "advanced": {......}
            },
            {...}
        ]
    },
    ....
]
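Parts 1 and 2 can be sketched without any dependencies. To be clear, this is not the project's clean code: the helper names are hypothetical, Object.keys(...).map(...) is standing in for lodash's _.map() and _.values(), the stats are placeholders, and I am assuming the helper functions sit on the same object as the players.

```javascript
// Dependency-free stand-in for _.values(): object -> array of its values.
function values(obj) {
    return Object.keys(obj).map(function(key) {
        return obj[key];
    });
}

// Hypothetical version of the clean step.
function clean(db) {
    // Drop function-valued properties so helpers never reach the output file.
    Object.keys(db).forEach(function(key) {
        if (typeof db[key] === 'function') {
            delete db[key];
        }
    });

    // Part 1: players keyed by name -> array of players.
    return values(db).map(function(player) {
        // Part 2: seasons keyed by year -> array of seasons.
        player.Seasons = values(player.Seasons);
        return player;
    });
}

// Placeholder data, only here to exercise the function.
var db = {
    'Player A': {
        Player: 'Player A',
        Pos: 'C',
        Seasons: {
            '1980': { totals: { G: 82 }, advanced: { PER: 25.3 } },
            '1981': { totals: { G: 76 }, advanced: { PER: 25.5 } }
        }
    },
    filter: function(row) { return row; } // stand-in for a helper method
};

var cleaned = clean(db);
console.log(JSON.stringify(cleaned, null, 4));
```

Deleting the functions explicitly means the output does not depend on how a particular serializer treats function values.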

Now we just have to call DB_OBJ.clean(). Go to the line where jsonfile.writeFile() is, and replace it with:

Export to MongoDB!

If you do not have MongoDB installed, follow their Installation Guide. If you’re a Mac user, I recommend following their Homebrew Guide, it’s really simple. Once you have it installed, the following section will change depending on how much experience you have with MongoDB. Choose one of the following: Experienced with MongoDB or MongoDB Noob.

Experienced with MongoDB

Go ahead and fire up mongo with mongod in a separate terminal. Then come back to your initial terminal and input the following: