Set Up Your Data Science Environment

OSX

First things first. Your terminal program allows you to type commands to
control your computer. On a Mac, you can open the Terminal by going to your
Applications screen and selecting Terminal (it might be in the folder named
“Other”). Or, you can open Spotlight (Cmd + Space) and type “Terminal”.

Next, install Homebrew (the brew command) if you haven’t already. Homebrew
is a program that makes it easy to install other software on OSX. In your
terminal, run:

# This downloads the Homebrew installation script and runs it with bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Run these commands to create a new conda environment. Each conda
environment has its own package versions, which lets us switch between
package versions easily. For example, this class uses Python 3, but you
might have another class or project that uses Python 2. With a conda
environment for each, you can switch between them at will.
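
The commands themselves are not reproduced in this guide, so the following is only a minimal sketch for OSX and Linux shells. It assumes the environment is named ds100 (the name used when activating it later in this guide) and that Python 3 is the only requirement. The helper name create_course_env is made up for illustration, and the course's real commands may well install additional packages.

```shell
ENV_NAME="ds100"   # assumption: the environment name used later in this guide

# Hypothetical helper: create the environment only if it is not there yet
create_course_env() {
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found; install Anaconda first"
        return 1
    fi
    # `conda env list` prints one environment per line, name first
    if conda env list | grep -q "^${ENV_NAME} "; then
        echo "environment ${ENV_NAME} already exists"
    else
        # python=3 is the only requirement stated in this guide
        conda create --yes --name "$ENV_NAME" python=3
    fi
}

create_course_env || echo "environment was not created"
```

Once the environment exists, switch into it with source activate ds100 (recent conda releases also accept conda activate ds100).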

Windows

Getting set up on Windows is especially error-prone if you aren’t careful
about your configuration. If you already have Anaconda or git installed and
can’t get the other to work, try uninstalling everything and starting from
scratch.

Installing Anaconda:

Visit https://www.continuum.io/downloads#windows and download the installer
for Python 3.5. Download the 64-bit installer if your computer is 64-bit
(more likely), or the 32-bit installer if not. On Windows 10 or later, you
can check under Settings > System > About, next to “System type”.

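Leave all the options as default (install for all users, in the default
location). On the Advanced Options screen, make sure both checkboxes are
checked (one adds Anaconda to your PATH, the other registers Anaconda as
your default Python), then click Install.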

Verify that the installation is working by starting the Anaconda Prompt (you
should be able to start it from the Start Menu) and typing python:

Notice how the python prompt shows that it is running from Anaconda. Now
you have conda installed!

From now on, when we talk about the “Terminal” or “Command Prompt”, we are
referring to the Anaconda Prompt that you just installed.

Run these commands to create a new conda environment. Each conda
environment has its own package versions, which lets us switch between
package versions easily. For example, this class uses Python 3, but you
might have another class or project that uses Python 2. With a conda
environment for each, you can switch between them at will.

Linux

These instructions assume you have apt-get (Ubuntu and Debian).
For other distributions of Linux, substitute the available package manager.

You likely already know this if you’re running Linux, but just in case: your
terminal program allows you to type commands to control your computer. On
Linux, you can open the Terminal by going to the Applications menu and
clicking “Terminal”.

Install wget, a command-line tool for downloading files and web pages.
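
The install command is not shown here. On Ubuntu or Debian it would typically look like the sketch below, which assumes you have sudo rights. The helper name install_wget is made up for illustration.

```shell
# Hypothetical helper: install wget with apt-get unless it is already present
install_wget() {
    if command -v wget >/dev/null 2>&1; then
        echo "wget is already installed"
    else
        # Refresh the package index, then install without prompting
        sudo apt-get update && sudo apt-get install -y wget
    fi
}

install_wget || echo "Could not install wget; check that apt-get and sudo are available"
```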

Run these commands to create a new conda environment. Each conda
environment has its own package versions, which lets us switch between
package versions easily. For example, this class uses Python 3, but you
might have another class or project that uses Python 2. With a conda
environment for each, you can switch between them at will.

This should download a copy of the course materials (including this homework)
onto your personal computer and set up git remotes so that you can pull
released assignments from the staff and push your personal work to your private
repo.
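
The clone commands and repository URLs are not reproduced here, so the sketch below only illustrates the resulting remote layout using throwaway local repositories in place of the real ones. It assumes your private repo ends up as the origin remote and the staff repo as a remote named ds100 (the name used by the pull command in this guide). The paths staff-repo.git and private-repo.git are placeholders.

```shell
# Throwaway stand-ins for the real repositories (their URLs are not given here)
demo_dir=$(mktemp -d)
git init --quiet --bare "$demo_dir/staff-repo.git"     # stands in for the course repo
git init --quiet --bare "$demo_dir/private-repo.git"   # stands in for your private repo

# Clone the private repo into sp17-materials; it becomes the origin remote
git clone --quiet "$demo_dir/private-repo.git" "$demo_dir/sp17-materials"
cd "$demo_dir/sp17-materials"

# Add the course repo as a second remote named ds100
git remote add ds100 "$demo_dir/staff-repo.git"

# List both remotes: origin (push your work) and ds100 (pull assignments)
git remote -v
```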

Now, when you want to pull new/updated assignments, you can run:

# Make a work-in-progress commit since git doesn't allow pulling when you
# have uncommitted modifications
git commit -am "WIP"
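# Get updates from the course repo. The -X ours option tells git to resolve
# any conflicts in favor of what you currently have so that your work is
# never erased. --no-rebase makes the pull merge; newer versions of git
# otherwise stop and ask how to reconcile diverged branches.
git pull --no-rebase -s recursive -X ours --no-edit ds100 master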
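
To see what the conflict options do, here is a small self-contained experiment, offered as a sketch rather than part of the course workflow. It builds two throwaway repositories in a temporary directory (the names staff, student, and hw1.txt are made up), lets both sides change the same line, and then pulls. The pull below passes --no-rebase so that recent Git versions merge instead of stopping to ask how to reconcile diverged branches.

```shell
# Throwaway sandbox; nothing here touches your real repositories
demo_dir=$(mktemp -d)
cd "$demo_dir"

# 1. A stand-in "staff" repo releases an assignment on master
git init --quiet staff
cd staff
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Staff" && git config user.email "staff@example.com"
echo "starter text" > hw1.txt
git add hw1.txt && git commit --quiet -m "Release hw1"
cd ..

# 2. A student clones it, names the remote ds100, and edits the file
git clone --quiet staff student
cd student
git remote rename origin ds100
git config user.name "Student" && git config user.email "student@example.com"
echo "my answer" > hw1.txt

# 3. Meanwhile the staff change the same line
(cd ../staff && echo "revised starter text" > hw1.txt && git commit --quiet -am "Update hw1")

# 4. The student commits and pulls, resolving conflicts in their own favor
git commit --quiet -am "WIP"
git pull --quiet --no-rebase -s recursive -X ours --no-edit ds100 master

cat hw1.txt   # the student's line is kept
```

Note that -X ours only decides conflicting hunks. Staff changes that do not overlap with your edits are still merged in.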

Opening notebooks

To open Jupyter notebooks, navigate to the sp17-materials directory and
run:

jupyter notebook

This will automatically open the notebook interface in your browser. You can
then browse to a notebook and open it.

Verifying your installation

Finally, let’s open a notebook that will check to see whether you’ve installed
everything correctly.

In your sp17-materials directory, make sure you are in the ds100 conda
environment by running source activate ds100 on OSX / Linux or activate
ds100 on Windows. Then run git pull ds100 master, followed by jupyter
notebook.

Now, open the test_setup.ipynb notebook. If you’ve installed everything
correctly, all the cells should run without error.
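
If the notebook will not even start, a quick command-line check can narrow down which tool is missing. This is a rough sketch for OSX and Linux shells, not a substitute for test_setup.ipynb. The helper name check_tool is made up for illustration, and the list of tools is simply the ones this guide mentions.

```shell
# Hypothetical helper: report whether a command is available on PATH
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
    fi
}

# Tools this guide relies on
for tool in conda git jupyter python; do
    check_tool "$tool"
done
```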