Pandas: The Swiss Army Knife for Your Data, Part 1

Pandas is an amazing data analysis toolkit for Python. It is designed to operate on relational or labeled data and gives you tools to slice and dice as you please.

In this two-part tutorial, you’ll learn about the fundamental data structures of Pandas: the series and the data frame. You’ll also learn how to select data, deal with missing values, manipulate your data, merge your data, group your data, work with time series, and even plot data.

Installation

To install, run pip install pandas. Pip will also install NumPy as a dependency if you don’t already have it.
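A quick way to confirm the installation worked is to import both packages and print their versions. This is only a sanity-check sketch; the versions dict is a name made up for this illustration.

```python
import numpy as np
import pandas as pd

# Collect the installed versions (the dict name is illustrative only)
versions = {'pandas': pd.__version__, 'numpy': np.__version__}
print(versions)
```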

Series

Pandas series are typed and labeled 1-D arrays. This means that each element can be accessed by its label in addition to its integer position.

Here is a series of integers where the labels are Roman numerals. You can index and slice using the labels or the integer positions (in recent pandas versions, positional access is best done through the iloc attribute). Unlike regular Python list slicing, a slice by label includes the last item!
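A minimal sketch of such a series follows. The values and variable names are made up for illustration, and the example uses loc and iloc explicitly so that label access and positional access are easy to tell apart.

```python
import pandas as pd

# A series of integers labeled with Roman numerals (illustrative values)
s = pd.Series([1, 2, 3, 4, 5], index=['I', 'II', 'III', 'IV', 'V'])

by_label = s.loc['III']            # access by label
by_position = s.iloc[2]            # access by integer position

# Label slices include the end label: II, III and IV
label_slice = s.loc['II':'IV']

# Positional slices exclude the end, like Python lists: II and III only
position_slice = s.iloc[1:3]

print(label_slice)
print(position_slice)
```

Note that the label slice returns three items while the positional slice with the "same" span returns two.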

Unlike Python lists or numpy arrays, operations on series align on the index. If the indexes don’t match, the result uses the union of the indexes, with a missing value (NaN) for every label that appears in only one of the series. Here are a few examples using dicts as data so the keys become the series index:
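The alignment behavior can be sketched with two small series built from dicts. The keys and values here are invented for illustration; the second half shows the add method with fill_value, one way to treat a missing label as zero instead of producing NaN.

```python
import pandas as pd

# Dict keys become the index labels
s1 = pd.Series({'a': 1, 'b': 2, 'c': 3})
s2 = pd.Series({'b': 10, 'c': 20, 'd': 30})

# Addition aligns on labels; 'a' and 'd' exist in only one series,
# so their results are NaN and the dtype becomes float
total = s1 + s2
print(total)

# Treat a label missing from one side as 0 instead
filled = s1.add(s2, fill_value=0)
print(filled)
```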

Data Frames

Data frames are the primary pandas data structure. They represent tables of data where each column is a series. Data frames have an index too, which serves as a row label. A data frame also has column labels. Here is how to declare a data frame using a dict.
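As a rough sketch, a dict that maps column names to lists of values can be passed straight to the DataFrame constructor. The column names, values, and row labels below are invented for illustration.

```python
import pandas as pd

# Each dict key becomes a column; the index argument supplies row labels
df = pd.DataFrame(
    {
        'name': ['Ann', 'Bob', 'Cara'],
        'age': [28, 35, 41],
        'city': ['Oslo', 'Lima', 'Pune'],
    },
    index=['r1', 'r2', 'r3'],
)

print(df)
print(type(df['age']))  # each column is a Series
```

If the index argument is omitted, pandas assigns the default integer labels 0, 1, 2 and so on.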

Selecting Data

Data frames let you select data. If you want to select a row by its index label, you need to use the loc attribute (iloc does the same by integer position). To select a column, you simply use the column name. Here is how to select individual rows, individual columns, a slice of rows, a slice of columns, and last but not least, a rectangular section (subset of rows and subset of columns from these rows):
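The selections listed above might look like the following sketch. The data frame, its labels, and the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'a': [1, 2, 3, 4], 'b': [10, 20, 30, 40], 'c': [100, 200, 300, 400]},
    index=['w', 'x', 'y', 'z'],
)

row = df.loc['x']                     # a single row, returned as a Series
col = df['b']                         # a single column, returned as a Series
row_slice = df.loc['x':'z']           # rows x through z, end label included
col_slice = df.loc[:, 'a':'b']        # all rows, columns a through b
block = df.loc['x':'y', ['a', 'c']]   # rows x..y, columns a and c only

print(block)
```

As with series, label-based slices in loc include the end label, for rows and for columns.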

Conclusion

In this part of the tutorial, we covered the basic data types of Pandas: the series and the data frame. We imported and exported data, selected subsets of data, worked with metadata, and sorted the data. In part two, we’ll continue our journey and deal with missing data, data manipulation, data merging, data grouping, time series, and plotting. Stay tuned.
