Introduction to Pandas

Python’s popularity has skyrocketed since the creation of Pandas. Pandas has become the de facto Python library for working with heterogeneous tabular data and has since been integrated with various other Python libraries. While many tasks can be performed in spreadsheet programs such as Excel, Pandas lets you script these tasks in Python, so you have a complete audit trail of how your data was manipulated. Additionally, more and more datasets exceed the size that spreadsheet programs can even open, so having an alternative way to work with such data is essential.

This Pandas introduction will guide you from “opening” Python to loading a dataset and beginning the process of cleaning and analyzing data.

What you'll learn and how you can apply it

By the end of this introduction, you will be able to start Python, load a dataset, and begin cleaning and analyzing the data.
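
As a rough illustration of the load, clean, and analyze workflow this course introduces, here is a minimal sketch. The inline data, the column names, and the choice of steps are invented for illustration and are not taken from the course materials.

```python
import io

import pandas as pd

# Hypothetical inline data standing in for a CSV file on disk.
csv_text = """country,year,life_exp
Canada,2002,79.8
Canada,2007,80.7
Kenya,2002,51.0
Kenya,2007,
"""

# Load: read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Clean: drop rows that have a missing life_exp value.
clean = df.dropna(subset=["life_exp"])

# Analyze: average life expectancy per country.
summary = clean.groupby("country")["life_exp"].mean()
print(summary)
```

Because every step is written down in a script, the same cleaning and analysis can be rerun or reviewed later, which is the audit trail a spreadsheet does not give you.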

This training course is for you because...

You are new to data analytics, or new to doing data analytics in Python

You want a more reproducible workflow for cleaning and processing data

You want to learn Python in an applied way by working with data

You have used Python before, but want to see how it can be used to clean and process datasets

Prerequisites

It will help if you know some basic bash/shell commands (on macOS/Linux: ls, cd; on Windows: dir, cd)

Participants enrolled in this course need to have the following installed on their computers:

Python and pandas. Both can be installed using the Anaconda or Canopy distribution
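
One way to confirm the setup before class is a short check like the one below. This is only a sketch using the Python standard library; the helper name is_installed is hypothetical and not part of the course.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


# Report the running Python version and whether pandas is available.
print("Python version:", sys.version.split()[0])
print("pandas installed:", is_installed("pandas"))
```

If the second line reports False, pandas is not visible to the Python interpreter you ran, which usually means the distribution's Python is not the one being used.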

About your instructor

Daniel Chen, trainer and data scientist, is a graduate student in the interdisciplinary Ph.D. program in genetics, bioinformatics & computational biology (GBCB) at Virginia Polytechnic Institute and State University (Virginia Tech). He is involved with Software Carpentry and Data Carpentry as an instructor and lesson maintainer. He completed his master’s degree in public health at Columbia University Mailman School of Public Health in epidemiology with a certificate in advanced epidemiology. He is currently extending his master’s thesis work on attitude diffusion in social networks in the Social and Decision Analytics Laboratory under the Biocomplexity Institute of Virginia Tech.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.