Introduction to the Python package pandas

The pandas package contains high-level data structures and manipulation tools designed to make data analysis fast and easy in Python. It is built on top of NumPy.

Let’s see how it can help by reading and analysing a data set.

The Series and the DataFrame are the pandas foundation classes.

A Series is a one-dimensional array-like object containing an array of data (of any NumPy data type) and an associated array of data labels, called its index.
The simplest Series is formed from only one array of data:
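As a minimal sketch of this (the numbers and labels are made up for illustration), a Series built from a plain list gets a default integer index, and you can also pass your own labels through the `index` argument:

```python
import pandas as pd

# A Series built from a single list of values.
# No index is given, so pandas assigns the default integer index 0..n-1.
s = pd.Series([4, 7, -5, 3])
print(s)
print(s.index)   # the labels
print(s.values)  # the underlying NumPy array

# The same data with explicit labels (arbitrary, for illustration only).
labelled = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(labelled["a"])  # look a value up by its label
```

The values and the index live side by side, so a lookup by label works much like a dictionary lookup.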

A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.).

The DataFrame has both a row and a column index; it can be thought of as a dictionary of Series (all sharing the same index) and is similar to, but not completely the same as, R’s data.frame structure, which was the inspiration, I guess.

To create a DataFrame, you can pass a dictionary of lists to the DataFrame constructor:

Each key of the dictionary becomes a column name.

The associated list becomes the values within that column.

Let’s see an example to make clear what it looks like, such as a list of stocks with their associated values:
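Here is a small sketch of such a DataFrame. The ticker names and prices are invented placeholders, not real market data, and the variable names are just my choice:

```python
import pandas as pd

# Hypothetical tickers and prices, made up for illustration only.
data = {
    "stock": ["AAA", "BBB", "CCC", "DDD"],  # key -> column name
    "price": [10.5, 20.0, 15.25, 30.0],     # list -> values in that column
}

stocks = pd.DataFrame(data)
print(stocks)

# The dictionary keys became the column names.
print(stocks.columns.tolist())

# Selecting one column gives back a Series that shares the row index.
print(stocks["price"])
```

All the lists must have the same length, since each one fills a column over the same rows.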

The DataFrame class has many methods, which we will see gradually. One of its nice methods is describe, which returns a statistical summary (count, mean, standard deviation, min and max, median and the quartiles) of the values found in the numeric columns:
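A quick sketch of describe on a tiny frame follows; the data is again invented for illustration. By default only the numeric columns are summarised, so the text column is left out:

```python
import pandas as pd

# Hypothetical data, made up for illustration only.
prices = pd.DataFrame({
    "stock": ["AAA", "BBB", "CCC", "DDD"],
    "price": [10.0, 20.0, 30.0, 40.0],
})

# describe() returns a new DataFrame holding the summary statistics.
summary = prices.describe()
print(summary)

# Individual statistics can be read back by row label and column name.
print(summary.loc["mean", "price"])
print(summary.loc["50%", "price"])  # the median
```

Since the result is itself a DataFrame, you can keep working with it rather than just reading it off the screen.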
