Programming with Data: Foundations of Python and Pandas

Whether they work in R, MATLAB, Stata, or Python, many researchers find that modern data analysis requires some kind of programming. The proliferation of specialized languages and tools for data analysis suggests that general-purpose programming languages like C and Java do not readily address the needs of data scientists; something more is needed.

In this training, you will learn how to accelerate your data analyses using the Python language and Pandas, a library specifically designed for interactive data analysis. Pandas is a massive library, so we will focus on its core functionality: loading, filtering, grouping, and transforming data. After completing this workshop, you will understand the fundamentals of Pandas, be aware of common pitfalls, and be ready to perform your own analyses.
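To give a flavor of those four core operations, here is a minimal sketch using a small made-up dataset (the `team`/`player`/`points` columns and their values are hypothetical and not taken from the course materials). The CSV text is wrapped in `io.StringIO` so that `pd.read_csv` can read it as if it were a file on disk; this illustrates the general idea and is not the course's own notebook code.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a flat file on disk.
CSV_TEXT = """team,player,points
A,ann,10
A,bob,14
A,cat,9
B,dan,7
B,eve,11
B,fay,15
"""

# Load: read_csv accepts any file-like object, so StringIO behaves like a file.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Filter: a boolean mask keeps only the rows where the condition is True.
high_scorers = df[df["points"] >= 10]

# Group (split-apply-combine): split the rows by team, apply mean/median/std
# to each group's points, and combine the results into one DataFrame.
summary = df.groupby("team")["points"].agg(["mean", "median", "std"])

# Transform: groupby().transform returns a result aligned to the original rows,
# here each player's points minus the mean of that player's team.
df["points_vs_team"] = df["points"] - df.groupby("team")["points"].transform("mean")

print(high_scorers)
print(summary)
print(df)
```

Note that `std` in Pandas defaults to the sample standard deviation (`ddof=1`), which is one of the small details that differs from NumPy's default.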

What you'll learn and how you can apply it

Use the Split-Apply-Combine technique to calculate grouped summary statistics like mean, median, and standard deviation on your data

Load data from flat files, NumPy arrays, and native Python data structures, and compute on them using Pandas

Avoid common pitfalls and “gotchas” in Pandas by understanding the conceptual underpinnings common to most data manipulation libraries and environments
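As a rough illustration of the loading and pitfall objectives, the sketch below builds the same small table from a dict of lists, a list of dicts, and a 2-D NumPy array (flat files would go through `pd.read_csv`). It then shows one commonly cited gotcha, index alignment: arithmetic between Series matches values by index label, not by position. The data and variable names are hypothetical, and the course may emphasize different pitfalls.

```python
import numpy as np
import pandas as pd

# Native Python structures: a dict of lists maps column names to values.
from_dict = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

# A list of dicts (one dict per row) produces an equivalent frame.
from_records = pd.DataFrame(
    [{"x": 1, "y": 4.0}, {"x": 2, "y": 5.0}, {"x": 3, "y": 6.0}]
)

# NumPy: a 2-D array plus explicit column labels. Because a NumPy array has a
# single dtype, the mixed int/float input is upcast and "x" becomes float64.
from_numpy = pd.DataFrame(np.array([[1, 4.0], [2, 5.0], [3, 6.0]]), columns=["x", "y"])

# Gotcha: arithmetic aligns on index labels, not on position.
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])
total = s1 + s2  # labels "a" and "d" have no partner, so they become NaN

print(from_dict.equals(from_records))
print(from_numpy.dtypes)
print(total)
```

The alignment rule is what makes Pandas convenient when labels match and surprising when they do not; calling `.to_numpy()` on both operands is one way to opt into purely positional arithmetic when that is what you intend.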

This training course is for you because...

You have a solid understanding of Python programming

You want to learn how to load and transform tabular data in Python using Pandas

You want to accelerate your understanding of Pandas by learning general principles and requirements common to tabular data manipulation frameworks

Prerequisites

Intermediate-level programming ability in Python. Attendees should know the difference between a dict, list, and tuple. Familiarity with control flow (if/else/for/while) and error handling (try/except) is required.

No statistics background is required.

Course Setup

Step-by-step instructions for setting up a working Python environment using Anaconda are available here. You will need a working environment to complete the exercises in Jupyter Notebook. Alternatively, you may view the notebooks here.

About your instructor

Daniel Gerlanc is the Founder and President of EnPlus Advisors, a consultancy specializing in data science and custom software development. He started EnPlus in 2011 after working as a hedge fund quant for 5 years. At EnPlus, he focuses on projects that require expertise in both data analysis and software engineering. He has coauthored several open source R packages, published in peer-reviewed journals, and been an invited speaker at conferences including ODSC and PGConf. He is a graduate of Williams College.

Schedule

The timeframes are only estimates and may vary depending on how the class progresses.