But suppose we wish to do time series operations with the variables. A better
representation would be where the columns are the unique variables and an
index of dates identifies individual observations. To reshape the data into
this form, use the pivot function:
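As a minimal sketch of this reshaping, the snippet below builds a small long-format frame and pivots it. The column names `date`, `variable`, and `value` and the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, variable) pair.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2000-01-03", "2000-01-04"] * 2),
        "variable": ["A", "A", "B", "B"],
        "value": [0.47, -0.28, -1.14, 1.21],
    }
)

# One column per unique variable, indexed by date.
wide = df.pivot(index="date", columns="variable", values="value")
print(wide)
```

Each column of `wide` can then be treated as its own time series, for example `wide["A"]`.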

If the values argument is omitted, and the input DataFrame has more than
one column of values which are not used as column or index inputs to pivot,
then the resulting “pivoted” DataFrame will have hierarchical columns whose topmost level indicates the respective value
column:
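The following sketch illustrates this with a hypothetical second value column, `value2`. Omitting `values` should leave both value columns in the result, under a two-level column index.

```python
import pandas as pd

# Hypothetical data with two value columns that are not used as index or columns.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2000-01-03", "2000-01-04"] * 2),
        "variable": ["A", "A", "B", "B"],
        "value": [1.0, 2.0, 3.0, 4.0],
        "value2": [2.0, 4.0, 6.0, 8.0],
    }
)

# No `values` argument: the columns become hierarchical.
pivoted = df.pivot(index="date", columns="variable")

# Selecting on the topmost level returns the sub-table for one value column.
sub = pivoted["value2"]
print(pivoted)
print(sub)
```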

Closely related to the pivot function are the stack and
unstack functions, currently available on Series and DataFrame. These
functions are designed to work together with MultiIndex objects (see the
section on hierarchical indexing). Here is
essentially what these functions do:

stack: “pivot” a level of the (possibly hierarchical) column labels,
returning a DataFrame with an index with a new inner-most level of row
labels.

unstack: the inverse operation of stack: “pivot” a level of the
(possibly hierarchical) row index to the column axis, producing a reshaped
DataFrame with a new inner-most level of column labels.

The clearest way to explain is by example. Let’s take a prior example data set
from the hierarchical indexing section:
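The original data set is not reproduced here, so the sketch below uses a small stand-in frame with a two-level row index. The level names `first` and `second` and the values are assumptions for illustration.

```python
import pandas as pd

# Stand-in data: a two-level row index and two ordinary columns.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [5.0, 6.0, 7.0, 8.0]}, index=index
)

# stack: the column labels become a new inner-most row level.
stacked = df.stack()

# unstack: by default the last row level moves back to the columns.
unstacked = stacked.unstack()

# A level can also be chosen by name (or by position).
by_second = stacked.unstack("second")
print(stacked)
print(by_second)
```

Here `stacked` is a Series with a three-level index, and `unstacked` has the same shape and values as `df`.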

Notice that the stack and unstack methods implicitly sort the index
levels involved. Hence a call to stack and then unstack, or vice versa,
will result in a sorted copy of the original DataFrame or Series:
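A small sketch of the round trip, using a deliberately unsorted index with made-up values. The exact sorting behaviour of stack may differ between pandas versions, so this example relies only on the unstack-then-stack direction.

```python
import pandas as pd

# The outer level is given in the order 2, 1, so the index is not sorted.
index = pd.MultiIndex.from_product([[2, 1], ["a", "b"]])
df = pd.DataFrame({"A": [10.0, 20.0, 30.0, 40.0]}, index=index)

# Unstack the inner level, then stack it back again.
roundtrip = df.unstack().stack()

print(df.index.tolist())         # original, unsorted order
print(roundtrip.index.tolist())  # order after the round trip
print(df.sort_index().index.tolist())
```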

These functions are intelligent about handling missing data and do not expect
each subgroup within the hierarchical index to have the same set of labels.
They also can handle the index being unsorted (but you can make it sorted by
calling sort_index, of course). Here is a more complex example:
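As an illustrative sketch (not the original example), the Series below lacks a `("baz", "two")` entry, so the subgroups do not share the same labels. The `fill_value` argument shown at the end is an optional extra that the text above does not mention.

```python
import pandas as pd

# "baz" has no "two" entry, so the subgroups have different label sets.
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one")], names=["first", "second"]
)
s = pd.Series([1.0, 2.0, 3.0], index=index)

# The missing combination appears as NaN in the reshaped result.
wide = s.unstack()

# Optionally, supply a replacement for the missing entries.
filled = s.unstack(fill_value=0.0)
print(wide)
print(filled)
```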

The melt() function is useful to massage a
DataFrame into a format where one or more columns are identifier variables,
while all other columns, considered measured variables, are “unpivoted” to the
row axis, leaving just two non-identifier columns, “variable” and “value”. The
names of those columns can be customized by supplying the var_name and
value_name parameters.
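A minimal sketch of melt on a made-up frame, where `first` and `last` act as identifier variables and `height` and `weight` are the measured variables:

```python
import pandas as pd

# Hypothetical wide data: two identifier columns and two measured columns.
df = pd.DataFrame(
    {
        "first": ["John", "Mary"],
        "last": ["Doe", "Bo"],
        "height": [5.5, 6.0],
        "weight": [130, 150],
    }
)

# Default names for the two non-identifier columns: "variable" and "value".
long = df.melt(id_vars=["first", "last"])

# The same reshape with custom names for those two columns.
custom = df.melt(
    id_vars=["first", "last"], var_name="quantity", value_name="measurement"
)
print(long)
print(custom)
```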

The function pandas.pivot_table can be used to create spreadsheet-style pivot
tables. See the cookbook for some advanced strategies.

It takes a number of arguments:

data: A DataFrame object

values: a column or a list of columns to aggregate

index: a column, Grouper, array of the same length as data, or a list of them.
Keys to group by on the pivot table index. If an array is passed, it is used in the same manner as column values.

columns: a column, Grouper, array of the same length as data, or a list of them.
Keys to group by on the pivot table column. If an array is passed, it is used in the same manner as column values.

The result object is a DataFrame having potentially hierarchical indexes on the
rows and columns. If the values column name is not given, the pivot table
will include all of the data that can be aggregated in an additional level of
hierarchy in the columns:
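The sketch below uses a made-up frame with two grouping columns (`A`, `B`) and two numeric columns (`D`, `E`). It shows one call with an explicit `values` column and one without, relying on the default aggregation (the mean) in the second call.

```python
import pandas as pd

# Hypothetical data: A and B are grouping keys, D and E are numeric values.
df = pd.DataFrame(
    {
        "A": ["one", "one", "two", "two", "one", "two"],
        "B": ["x", "y", "x", "y", "x", "y"],
        "D": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "E": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    }
)

# Aggregate a single value column: rows keyed by A, columns keyed by B.
table = pd.pivot_table(df, values="D", index="A", columns="B", aggfunc="mean")

# No `values`: every remaining column that can be aggregated is included,
# with the value column names as an extra level in the columns.
all_values = pd.pivot_table(df, index="A", columns="B")
print(table)
print(all_values)
```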

Use the crosstab function to compute a cross-tabulation of two (or more)
factors. By default crosstab computes a frequency table of the factors
unless an array of values and an aggregation function are passed.

It takes a number of arguments:

index: array-like, values to group by in the rows

columns: array-like, values to group by in the columns

values: array-like, optional, array of values to aggregate according to
the factors

aggfunc: function, optional; if no values array is passed, a
frequency table is computed
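A short sketch of both modes, using made-up factors. The inputs are passed as named Series here, which is one array-like form crosstab accepts; the names `a` and `b` are illustrative only.

```python
import pandas as pd

# Two hypothetical factors and an array of values aligned with them.
a = pd.Series(["foo", "foo", "bar", "bar", "foo", "foo"], name="a")
b = pd.Series(["one", "two", "one", "two", "one", "one"], name="b")
v = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Default: a frequency table counting each (a, b) combination.
counts = pd.crosstab(a, b)

# With values and aggfunc: aggregate the values within each combination.
totals = pd.crosstab(a, b, values=v, aggfunc="sum")
print(counts)
print(totals)
```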

To convert a categorical variable into a “dummy” or “indicator” DataFrame, for example
a column in a DataFrame (a Series) which has k distinct values, use get_dummies, which
can derive a DataFrame containing k columns of 1s and 0s:
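A minimal sketch with a made-up `key` column holding three distinct values. Recent pandas versions return boolean indicator columns by default, so `dtype=int` is passed here to obtain the 1s and 0s described above.

```python
import pandas as pd

# Hypothetical data: "key" has k = 3 distinct values.
df = pd.DataFrame({"key": list("bbacab"), "data1": range(6)})

# One indicator column per distinct value of the Series.
dummies = pd.get_dummies(df["key"], dtype=int)
print(dummies)
```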

Notice that the B column is still included in the output; it just hasn’t
been encoded. You can drop B before calling get_dummies if you don’t
want to include it in the output.
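The example this remark refers to is not shown here, so the following is a stand-in sketch. It assumes a frame with columns `A`, `B`, and `C` and encodes only `A` through the `columns` argument; `dtype=int` is used so the indicators are 1s and 0s.

```python
import pandas as pd

# Stand-in data: A and B are string columns, C is numeric.
df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["c", "c", "b"], "C": [1, 2, 3]})

# Encode only column A; B and C are passed through unchanged.
encoded = pd.get_dummies(df, columns=["A"], dtype=int)

# Drop B first if it should not appear in the output.
dropped = pd.get_dummies(df.drop(columns="B"), columns=["A"], dtype=int)
print(encoded)
print(dropped)
```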

As with the Series version, you can pass values for the prefix and
prefix_sep. By default the column name is used as the prefix, and ‘_’ as
the prefix separator. You can specify prefix and prefix_sep in 3 ways:

string: Use the same value for prefix or prefix_sep for each column
to be encoded

list: Must be the same length as the number of columns being encoded

dict: Mapping column name to prefix
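The sketch below shows how prefix can be supplied on a made-up frame: as a single string, and also as a list or a dict, two other forms that get_dummies accepts. The prefix strings themselves are arbitrary.

```python
import pandas as pd

# Hypothetical data: A and B are encoded, C is numeric and left as-is.
df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["b", "a", "c"], "C": [1, 2, 3]})

# A single string: the same prefix for every encoded column.
same = pd.get_dummies(df, prefix="new", dtype=int)

# A list: one prefix per encoded column, in column order.
per_list = pd.get_dummies(df, prefix=["from_A", "from_B"], dtype=int)

# A dict: map each encoded column name to its prefix; a custom separator too.
per_dict = pd.get_dummies(
    df, prefix={"A": "from_A", "B": "from_B"}, prefix_sep=".", dtype=int
)
print(same.columns.tolist())
print(per_list.columns.tolist())
print(per_dict.columns.tolist())
```

Note that a single string prefix can produce duplicate column names when two encoded columns share a value, as with `new_a` and `new_b` here.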

If you just want to handle one column as a categorical variable (like R’s factor),
you can use df["cat_col"] = pd.Categorical(df["col"]) or
df["cat_col"] = df["col"].astype("category"). For full docs on Categorical,
see the Categorical introduction and the
API documentation. This feature was introduced in version 0.15.
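Both spellings can be sketched on a small made-up frame. The column name `cat_col2` is introduced here only so the two results can be compared side by side.

```python
import pandas as pd

# Hypothetical data with a plain string column.
df = pd.DataFrame({"col": ["a", "b", "a", "c"]})

# Two equivalent ways to obtain a categorical column.
df["cat_col"] = pd.Categorical(df["col"])
df["cat_col2"] = df["col"].astype("category")

print(df.dtypes)
print(df["cat_col"].cat.categories.tolist())
print(df["cat_col"].cat.codes.tolist())
```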