Getting Started with GraphLab and SFrames

GraphLab is a Python library that offers many features out of the box. It is a great library for learning the foundations of Machine Learning. Many courses teach several algorithms using a bunch of different tools and examples that are far from the real world. If you are new to Machine Learning, however, GraphLab (powered by Dato) is a great library to start with.

In this post I’ll try to give some intuition about SFrames and show some simple data visualization examples using IPython Notebooks.

SFrame Basics

# Type sf and press Shift + Enter in the IPython Notebook
sf

Here is my output:

The sf.head() function also fetches the first few rows of the data, and sf.tail() retrieves the last few rows. However, because we don’t have many records in our dataset, the output of all three (sf, sf.head() and sf.tail()) will be the same.
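To see why the outputs match on a small dataset, here is a minimal plain-Python sketch (no GraphLab required) that mimics head and tail with list slicing. The sample rows and the helper names head and tail are made up for illustration, and the default of 10 rows is my assumption about what sf.head() and sf.tail() return when called without arguments.

```python
# Plain-Python stand-in for an SFrame: a list of row dictionaries.
# The rows below are hypothetical, not the dataset used in this post.
rows = [
    {"First Name": "Ada", "age": 36},
    {"First Name": "Alan", "age": 41},
    {"First Name": "Grace", "age": 85},
]


def head(data, n=10):
    """Return the first n rows (assumed default of 10, like sf.head())."""
    return data[:n]


def tail(data, n=10):
    """Return the last n rows (assumed default of 10, like sf.tail())."""
    return data[max(len(data) - n, 0):]


# With only 3 rows, both helpers return the whole dataset.
print(head(rows) == rows)
print(tail(rows) == rows)

# Asking for fewer rows than we have shows the difference.
print(head(rows, 2))
print(tail(rows, 1))
```

Once the dataset grows beyond the default row count, head and tail start returning different slices.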

GraphLab Canvas

GraphLab Canvas is a built-in visualization tool that comes with GraphLab Create.

# We can view any GraphLab data structure in GraphLab Canvas.
# We will use our sample data for the following examples.
sf.show()

You will get an output that redirects you to the Canvas web application.

Here is my output:

You can click on each column to see its most frequent items. In the Table view you can also browse your data in a clean, nicely structured way. SFrames do not hold all of the data in memory (they are backed by disk), so you can even view 1 billion rows in GraphLab Canvas.

Here are some more simple operations:

# Set the target to the IPython notebook to view visualizations directly in your notebook.
graphlab.canvas.set_target('ipynb')
# View the age column's visualization in the IPython notebook in categorical format.
sf['age'].show(view='Categorical')
# We can also calculate the mean value or the max value of the age column.
sf['age'].mean()
sf['age'].max()

Create new columns in our SFrame

sf['Full Name'] = sf['First Name'] + ' ' + sf['Last Name']

This code creates a new column, Full Name, by joining the First Name and Last Name columns with a space in between.
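The + operator works element by element here: each first name is paired with the last name from the same row. Here is a rough plain-Python sketch of the same idea, using two ordinary lists in place of the SFrame columns. The names are made-up sample values, not rows from my dataset.

```python
# Two plain lists standing in for the 'First Name' and 'Last Name' columns.
# These values are hypothetical sample data.
first_names = ["Ada", "Alan"]
last_names = ["Lovelace", "Turing"]

# Pair up the values row by row and join each pair with a space.
full_names = [first + " " + last for first, last in zip(first_names, last_names)]

print(full_names)  # ['Ada Lovelace', 'Alan Turing']
```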

As you may have noticed, our Country column has United States in some rows and USA in others. We could write a function and use it in a for loop to fix this for each row. However, there is a cleaner and neater way to do this in GraphLab.
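For reference, the function-and-loop approach mentioned above could look something like this in plain Python. It is only a sketch: normalize_country and the sample values are my own illustration, and I am assuming United States is the spelling we want to keep.

```python
def normalize_country(country):
    """Map the 'USA' variant to 'United States'; leave other values as-is."""
    if country == "USA":
        return "United States"
    return country


# Hypothetical sample values standing in for the Country column.
countries = ["United States", "USA", "Canada", "USA"]

# The explicit loop: fix each row one at a time.
fixed = []
for country in countries:
    fixed.append(normalize_country(country))

print(fixed)  # ['United States', 'United States', 'Canada', 'United States']
```

This works, but the loop is boilerplate we would have to repeat for every column we want to clean up.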