Figure out which of your columns are of object type: list(df.select_dtypes(include=['object']).columns)

Convert them to something else: df['col'] = df['col'].astype(str)

This code may help:

# Attempt to auto-convert columns from `object` to a useful type.
# Note that string columns may still have `dtype=object`, but will still be able
# to export to Stata format.
type_pref = [int, float, str]
for colname in list(df.select_dtypes(include=['object']).columns):
    for t in type_pref:
        try:
            df[colname] = df[colname].astype(t)
            break  # stop at the first type that converts cleanly
        except (ValueError, TypeError):
            pass
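To see what this kind of loop does, here is a minimal self-contained sketch on made-up data. The function name `convert_object_columns` and the sample columns are hypothetical. The columns are created with `dtype=object` explicitly so the example does not depend on how a given Pandas version infers string columns.

```python
import pandas as pd


def convert_object_columns(df, type_pref=(int, float, str)):
    """Convert each object column to the first type in type_pref that works."""
    for colname in list(df.select_dtypes(include=['object']).columns):
        for t in type_pref:
            try:
                df[colname] = df[colname].astype(t)
                break  # keep the first successful conversion
            except (ValueError, TypeError):
                pass
    return df


# Hypothetical sample data, stored as object columns on purpose.
sample = pd.DataFrame({
    'count': pd.Series(['1', '2', '3'], dtype=object),
    'price': pd.Series(['1.5', '2.25', '3'], dtype=object),
    'name': pd.Series(['a', 'b', 'c'], dtype=object),
})
converted = convert_object_columns(sample)
print(converted.dtypes)
```

Without the `break`, every column would run through all three conversions and end up as strings, so the order of `type_pref` only matters if the loop stops at the first success.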

Feather

Feather is a “fast on-disk format for data frames.” It is similar to pickle, but designed for data frames.
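A minimal sketch of using Feather as a cache might look like the following. The helper name `load_with_cache` and its arguments are hypothetical, and it assumes the `pyarrow` package is installed, since Pandas relies on it for `read_feather` and `to_feather`.

```python
import os

import pandas as pd


def load_with_cache(cache_path, load_original):
    """Return a DataFrame, reading the Feather cache if it already exists.

    load_original is any zero-argument function that builds the DataFrame
    from the slow original source (hypothetical, e.g. a pd.read_csv call).
    """
    if os.path.exists(cache_path):
        return pd.read_feather(cache_path)
    df = load_original()
    # to_feather expects the DataFrame to have a default integer index.
    df.to_feather(cache_path)
    return df
```

Usage would be something like `df = load_with_cache('data.feather', lambda: pd.read_csv('data.csv'))`, where both file names are placeholders.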

The first time this code runs, it will be slow as Pandas reads from the original data source. Subsequent runs should be very fast.

DataFrame information

Get dimensions: df.shape

Get list of columns: df.columns

Selecting columns and rows

Select columns: df[['col1', 'col2', ..]]

Selecting rows:

df.loc[...] is used to select based on labels in the index

df.iloc[...] is used to select based on integer positions in the index

df.ix[...] tries to use labels, and falls back to positions if the label is not in the index. It was deprecated in Pandas 0.20 and removed in 1.0, so on current versions use loc or iloc instead, even when you need to mix labels and positions (like selecting rows based on label and columns based on position).
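A small sketch of the difference between label-based and position-based selection, using a made-up DataFrame with string labels as its index:

```python
import pandas as pd

# Hypothetical data with string labels as the index.
df = pd.DataFrame(
    {'col1': [10, 20, 30], 'col2': ['a', 'b', 'c']},
    index=['x', 'y', 'z'],
)

by_label = df.loc['y']                  # row whose index label is 'y'
by_position = df.iloc[1]                # second row, whatever its label
label_slice = df.loc['x':'y', 'col1']   # label slices include both endpoints
position_slice = df.iloc[0:2, 0]        # position slices exclude the end

# Mixing a row label with a column position, without .ix:
mixed = df.loc['z', df.columns[1]]
```

Looking up the column name with `df.columns[1]` is one way to combine a row label with a column position while staying on `loc`.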

Selecting columns and filtering rows at the same time:

df[['col1', 'col2', ..]][df['col1'] == "criteria"]

If you want to modify cells, you have to use loc or iloc (or ix on old versions of Pandas) to avoid this warning: “A value is trying to be set on a copy of a slice from a DataFrame”.
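As a rough illustration, filtering, column selection, and assignment can all go through a single `.loc` call. The column names and values below are made up.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['criteria', 'other', 'criteria'],
    'col2': [1, 2, 3],
    'col3': [0.1, 0.2, 0.3],
})

mask = df['col1'] == 'criteria'

# Filter rows and pick columns in one .loc call.
subset = df.loc[mask, ['col1', 'col2']]

# Assign through .loc on the original frame, so the change is not made on a
# temporary copy.
df.loc[mask, 'col2'] = 0
```

Chained indexing such as `df[mask]['col2'] = 0` is the pattern that typically triggers the warning, because the assignment may land on an intermediate copy.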

Other data manipulation

For changing to numeric, this works on recent versions of Pandas: df['col'] = pd.to_numeric(df['col'])
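A short sketch with a made-up column. By default `pd.to_numeric` raises an error on values it cannot parse; passing `errors='coerce'` turns them into NaN instead.

```python
import pandas as pd

# Hypothetical column that mixes numeric strings with an unparseable value.
df = pd.DataFrame({'col': ['1', '2.5', 'n/a']})

# errors='coerce' replaces anything that is not a number with NaN.
df['col'] = pd.to_numeric(df['col'], errors='coerce')
print(df['col'])
```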

Changing the case of a string column: df['col'] = df['col'].str.upper(), or df['col'] = df['col'].apply(lambda x: x.upper()) if you need to deal with unicode strings (apply has no inplace option, so assign the result back). Use lower() to lowercase rather than uppercase.
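A quick sketch on made-up values, including a missing entry and a non-ASCII string. The `.str` accessor leaves missing values as missing, whereas calling `x.upper()` through `apply` would raise on `None`.

```python
import pandas as pd

# Hypothetical column with a missing value and accented characters.
df = pd.DataFrame({'col': ['abc', None, 'Déjà vu']})

# .str.upper() skips the missing entry instead of raising.
df['col'] = df['col'].str.upper()
print(df['col'])
```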

IPython notebooks

Make None the last line in the cell to avoid junk output like:
Out[1]:
<matplotlib.text.Text at 0x115933850>

Want to automatically open Chrome when you run jupyter notebook? Add this to /Users/you/.jupyter/jupyter_notebook_config.py: c.NotebookApp.browser = u'/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome %s'. If you don’t have that config file, jupyter notebook --generate-config will generate it.

Testing

Tests can be skipped with the @unittest.skip("message here") decorator.
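A minimal sketch of the skip decorator in use. The test class and method names are hypothetical, and the suite is run programmatically only so the skip shows up in the result object.

```python
import unittest


class ExampleTests(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)

    @unittest.skip("message here")
    def test_not_ready(self):
        self.fail("never runs while the skip decorator is in place")


# Run the suite programmatically so the skipped test is visible in the result.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.skipped)
```

Each entry in `result.skipped` pairs the skipped test with the message passed to the decorator.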

Tests can be run once with nosetests -v (put your tests in tests/, start each filename with test_, and start each test method name with test_).

To get some color in the test output, pip install pinocchio and then run nosetests -v --spec-color --with-spec.

I’m also trying out nose-progressive for colorizing nosetests tracebacks: pip install nose-progressive and then nosetests -v --with-progressive --logging-clear-handlers. This is not compatible with pinocchio, and it seems to crash sniffer.

To automatically run tests when a file changes, pip install sniffer and then run sniffer -x--spec-color -x--with-spec or sniffer -x--with-progressive.

To get a debugger to run from within a test, pip install nose2 and then run nose2. If you pdb.set_trace() in a test, the debugger will open.

Profiling

Profiling can help to determine which parts of a program are running slower than you want.
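One way to do this with only the standard library is `cProfile` together with `pstats`. The sketch below profiles a hypothetical `slow_sum` function and captures the report as a string.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical function to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
slow_sum(100000)
profiler.disable()

# Print up to ten entries, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats('cumulative')
stats.print_stats(10)
report = stream.getvalue()
print(report)
```

For a whole script, `python -m cProfile -s cumulative script.py` gives a similar report without changing the code (`script.py` is a placeholder name).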

Installing Recent Python on OS X

I used to use these instructions to set up pyenv, but pyenv can cause some problems. Handling Python dependency requirements is difficult enough already that I don’t want any extra complexity on top of pip (the package manager) and virtualenv (separate environments with different installed dependencies).

So here is what I’m doing now:

Install Python 2.x.x from Homebrew to get an up-to-date version: brew install python --enable-framework (the --enable-framework is an attempt to get around this numpy issue, but it doesn’t appear to work)

Install virtualenv with brew install virtualenv.

Create a folder called ~/.virtualenvs to store all my virtualenvs (each one is a subfolder).

It doesn’t really matter what I call this, but putting it in ~/ (my user’s home folder) is convenient and using a .something folder will hide it in Finder.

Add the following to my ~/.zshrc file (or ~/.bashrc if I didn’t use zsh):

This will prevent pip install ... from working if I’m not in a virtualenv, which prevents me from accidentally installing anything globally (I do this all the time).

To get around this restriction, use gpip install ...

To activate a virtualenv named gorbypuff, run pyenv gorbypuff

To create this virtualenv and activate it, run newpyenv gorbypuff

Note that I use pyenv and newpyenv as function names here; they are not the same as the pyenv tool that I mentioned ditching at the beginning of this section. I just find pyenv easy to remember, so that’s what I use.

Set up virtualenv using system Python so numpy will work: virtualenv -p /usr/bin/python2.7 ~/.virtualenvs/data-python2.7sys (this doesn’t seem to work either)