Analyze data with Python in 5 minutes using Pandas

Ready to learn how to analyze data with Python in a few minutes, without knowing too much about the Python language? You can easily import 130,000 rows in a few seconds with the Pandas module for Python. And with fewer than 10 commands you can explore the number of records and columns, and start to learn the mean, maximum, minimum and a lot more about your dataset.

pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Attention: change the path, replacing 'C:\DATASET\WINE REVIEWS\winemag-data-130k-v2.csv' with your own. If you are not familiar with this, check out: 3 simple ways to change your path in Anaconda/Python
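The loading step can be sketched as follows. This is a minimal sketch, assuming the file is an ordinary comma-separated CSV that `pd.read_csv` can read with its default options. The helper name `load_reviews` is made up for illustration.

```python
import pandas as pd

# Path quoted in the article; replace it with the location of your own copy.
# The r"..." prefix stops the backslashes of a Windows path being read as escapes.
CSV_PATH = r"C:\DATASET\WINE REVIEWS\winemag-data-130k-v2.csv"


def load_reviews(path=CSV_PATH):
    """Read the wine reviews CSV into a DataFrame (hypothetical helper)."""
    return pd.read_csv(path)
```

Calling `df = load_reviews()` should then give the `df` used in the rest of the article.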

WHAT KIND OF VARIABLES DO WE HAVE IN THE DATASET?

Using Anaconda, analyzing data with Python and Pandas will be simple. We can see that our df (DataFrame) now has around 130k records and 14 variables: df.shape returns (129971, 14)
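To see how the record and variable counts are read, here is a small sketch. The three-row DataFrame is made-up sample data standing in for the real file, so the numbers printed are not the ones from the wine dataset.

```python
import pandas as pd

# Tiny made-up sample standing in for the real wine reviews file.
df = pd.DataFrame({
    "country": ["Italy", "France", "Spain"],
    "points": [87, 90, 85],
    "price": [20.0, 35.5, None],
})

# shape is an attribute (no parentheses): (number of rows, number of columns)
n_rows, n_columns = df.shape
print(n_rows, n_columns)

# len() gives the number of rows only
print(len(df))
```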

Using the command

df.dtypes

you can easily check what kind of data you have. Here almost all variables are of type object (text), while the first variable and points are integers (int64) and price is a floating-point number (with decimals).
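A short sketch of the type check on made-up sample data. The `select_dtypes` call is an extra, not part of the original walkthrough, that keeps only the numeric columns. Depending on your pandas version, text columns may be reported under a name other than object.

```python
import pandas as pd

# Made-up sample: one text column, one integer column, one float column.
df = pd.DataFrame({
    "country": ["Italy", "France", "Spain"],
    "points": [87, 90, 85],
    "price": [20.0, 35.5, None],  # the missing price becomes NaN
})

# One line per column with its data type
print(df.dtypes)

# Names of the numeric columns only
numeric_columns = df.select_dtypes(include="number").columns.tolist()
print(numeric_columns)
```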

WHAT IS THE AVERAGE SCORE AND PRICE IN EACH COUNTRY?

One of the most common things to do when you analyze data with Python is to look at averages, often grouped by one of your variables. If you want to know the average score and price by country, you can use

.groupby().operation()

where you put the variable to group by inside the parentheses, followed by the operation you want to apply, for example .mean() or .sum()

df.groupby('country')[['points', 'price']].mean()


So we discover that in our dataset the average price in Argentina is $24.5 with an average score of 86.7 points, while Austria averages around 90 points but you have to pay about $30.

Of course, you can also group by multiple columns by passing a list, for example ['country', 'region_1'].
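Grouping by one column and by several columns can be sketched like this. The data is made up, and `region_1` is assumed to be the name of the region column. Selecting the numeric columns before `.mean()` avoids errors on text columns in recent pandas versions.

```python
import pandas as pd

# Made-up sample data; column names are assumptions for illustration.
df = pd.DataFrame({
    "country": ["Italy", "Italy", "France", "France"],
    "region_1": ["Chianti", "Chianti", "Margaux", "Chablis"],
    "points": [88, 90, 92, 94],
    "price": [20.0, 30.0, 50.0, 70.0],
})

# Average score and price per country
by_country = df.groupby("country")[["points", "price"]].mean()
print(by_country)

# Average score and price per (country, region) pair: pass a list of columns
by_country_region = df.groupby(["country", "region_1"])[["points", "price"]].mean()
print(by_country_region)
```

The second result has a two-level index, so a single group is read with a tuple, for example `by_country_region.loc[("Italy", "Chianti")]`.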

FILTER ONLY DATA WITH PRICE > $90

Maybe you are interested in knowing how many records in your dataset meet a particular threshold. In this case we would like to know how many records have a price above $90. The result is more than 4,000 records.

df[df.price>90]
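The filter returns the matching rows; counting them takes one more step. A minimal sketch on made-up data, which also shows that rows with a missing price are left out of the result:

```python
import pandas as pd

# Made-up sample data, including one record with no price.
df = pd.DataFrame({
    "title": ["Wine A", "Wine B", "Wine C", "Wine D"],
    "price": [20.0, 95.0, None, 150.0],
})

# Boolean mask: True where the price is above 90 (a missing price gives False)
mask = df["price"] > 90

expensive = df[mask]          # the matching rows
n_expensive = len(expensive)  # how many rows matched
print(expensive)
print(n_expensive)
```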

HOW TO EXPORT TO CSV OR EXCEL WHEN YOU ANALYZE DATA WITH PYTHON?

Easy, just write

df.to_csv('first.csv') #creating a csv file called first
df.to_excel('first.xlsx', sheet_name='Sheet1') #creating a xlsx file called first, in sheet 1
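A quick way to check an export is to read the file back. The sketch below does this for the CSV case only, with made-up data and a temporary folder. The `index=False` argument is an optional extra that leaves the row numbers out of the file. Writing .xlsx files usually also requires an Excel engine such as openpyxl to be installed.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data.
df = pd.DataFrame({"country": ["Italy", "France"], "points": [87, 90]})

# Write into a temporary folder so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "first.csv")
    df.to_csv(csv_path, index=False)  # index=False skips the row-number column
    reloaded = pd.read_csv(csv_path)  # read the file back to check it

print(reloaded)
```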

OTHER USEFUL COMMANDS TO ANALYZE DATA WITH PYTHON

df.head() = show the first rows of your dataset (5 by default)

df.columns = show your column names

df.tail(3) = show the last 3 records of your dataset

df.index = show the index range of your dataset
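These commands can be tried together on a small made-up DataFrame. The `describe()` call is an addition to the list above: it is one way to get the mean, minimum and maximum mentioned in the introduction in a single step.

```python
import pandas as pd

# Made-up sample data.
df = pd.DataFrame({
    "country": ["Italy", "France", "Spain", "Chile", "US"],
    "points": [87, 90, 85, 88, 92],
    "price": [20.0, 35.0, 15.0, 18.0, 60.0],
})

print(df.head(2))        # first 2 rows
print(df.tail(3))        # last 3 rows
print(list(df.columns))  # column names
print(df.index)          # row index range

# Summary statistics for the numeric columns
summary = df.describe()
print(summary.loc[["mean", "min", "max"]])
```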

Stay tuned: next time we will describe how to use a word cloud to describe string variables.
