3 fundamental statistics skills for data science

Today I would like to discuss the statistics skills you need to develop if you want to become a data scientist. Do you need a degree in Statistics to do this job? The short answer is that some statistics is needed, but practice is more important.

Statistics skills: what are we talking about?

Data science without statistics is like owning a Ferrari without brakes. You can enjoy sitting in the Ferrari and show off your new car to others, but you won’t enjoy the drive for long because you will soon crash!

In practice, you need statistics to:

- know the most common statistical models (such as linear regression and time series analysis) and decide which one is best to use

- evaluate whether the model works for the purpose of your analysis.
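To make these two points concrete, here is a minimal sketch that fits a simple linear regression by least squares and then checks how well it describes the data with R squared. The numbers are invented for the example, and the variable names are my own assumptions, not part of any specific library.

```python
from statistics import mean

# Hypothetical toy data, invented for this example
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = mean(x), mean(y)

# Least-squares estimates for the line y = intercept + slope * x
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# Evaluate the fit: R squared is the share of variance explained by the model
predictions = [intercept + slope * xi for xi in x]
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r_squared:.4f}")
```

An R squared close to 1 suggests the line fits these points well. In a real analysis you would also check the model on data that was not used to fit it.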

But theory is not everything, and the best way to learn statistics skills is through a practical approach. Don’t expect to become a good data scientist only by reading books or learning theory.

Use statistics skills to explore data:

Understand & summarize your data:

If you are new to the world of data, datasets and graphs, you can start with this free course: Analyzing categorical data, provided by Khan Academy. There you will learn how to identify individuals and variables, read different types of graphs, and much more. If you are at a basic level, I suggest stopping after the first module.

Take advantage of simple statistical concepts

Let’s briefly go over some simple statistical concepts, which will get a deeper dive in separate posts.

Descriptive statistics: you are probably familiar with the mean, median, mode, range and quartiles. These measures help you understand what your dataset looks like.
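As a quick illustration, here is a minimal sketch of these measures using Python's built-in statistics module. The list of review scores is made up for the example.

```python
import statistics

# Hypothetical review scores, invented for this example
points = [86, 88, 88, 90, 92, 85, 88]

mean_points = statistics.mean(points)      # arithmetic average
median_points = statistics.median(points)  # middle value once sorted
mode_points = statistics.mode(points)      # most frequent value
value_range = max(points) - min(points)    # spread between the extremes

print(
    f"mean={mean_points:.2f} median={median_points} "
    f"mode={mode_points} range={value_range}"
)
```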

Coming back to our Wine dataset, with just one command you can obtain much of this information. In this case you will see that the dataset has around 130,000 records, with an average score (points, coming from reviews) of 88.45 and an average reported price of $35.36.

Simple statistics skills: descriptive statistics with Python and pandas
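The single command is most likely pandas' describe() method. Here is a minimal sketch on a tiny made-up table. The column names points and price and all the values are assumptions for illustration, not the real Wine data.

```python
import pandas as pd

# Hypothetical mini sample; the real dataset has around 130,000 rows
wines = pd.DataFrame(
    {
        "points": [80, 86, 88, 91, 100],
        "price": [4.0, 17.0, 25.0, 60.0, 3300.0],
    }
)

# describe() returns count, mean, std, min, quartiles and max per numeric column
summary = wines.describe()
print(summary)
```

Each row of the result is one statistic and each column is one variable, so a value such as the minimum price can be read with summary.loc["min", "price"].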

The minimum is 80 for points and $4 for price; the maximum is 100 for points and $3,300 for price (wow!).

Percentiles: the 25% value, also called the first quartile, is the value below which 25% of your observations fall once they are sorted in ascending order (roughly observation 32,492 here). In this dataset the first quartile is 86 for points and $17 for price.

Interestingly, to reach the 50% mark (the median), points go up by only 2 (to 88), while price rises by about 47% (from $17 to $25).
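Here is a minimal sketch of how quartiles and the relative change between them can be computed with the standard library. The price list is a toy example that I chose so that its first quartile and median land on 17 and 25 (assuming 25 is the median price); the other values are made up.

```python
import statistics

# Hypothetical prices in dollars, chosen so the quartiles are easy to read
prices = [4, 17, 25, 40, 3300]

# method="inclusive" treats the list as the whole population and
# interpolates linearly between neighbouring values
q1, median, q3 = statistics.quantiles(prices, n=4, method="inclusive")

# Relative change between the first quartile and the median
pct_change = (median - q1) / q1 * 100

print(f"Q1={q1} median={median} Q3={q3} change={pct_change:.1f}%")
```

Note how the maximum of 3300 has no effect on the quartiles: this is why the median is often a better summary than the mean when a few values are extreme.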

Distributions: they describe how your data are likely to be distributed. The best known is the normal distribution, also known as the “bell curve”, which occurs often in nature. Another important distribution is the binomial, which models the number of successes in a series of trials with only two possible outcomes, e.g. success or failure of a new drug. We will discuss distributions in a separate post.
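To give a feel for both distributions, here is a minimal sketch using only the standard library. The mean, standard deviation, number of patients and success probability are assumed values picked for the example.

```python
import math
from statistics import NormalDist

# Normal distribution with an assumed mean of 88 and standard deviation of 3
scores = NormalDist(mu=88, sigma=3)

# Share of values within one standard deviation of the mean (about 0.68)
within_one_sd = scores.cdf(91) - scores.cdf(85)


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical drug trial: 10 patients, each with a 60% chance of success
n_patients, p_success = 10, 0.6
p_seven = binomial_pmf(7, n_patients, p_success)

# The probabilities of all possible outcomes (0 to 10 successes) add up to 1
total = sum(binomial_pmf(k, n_patients, p_success) for k in range(n_patients + 1))

print(f"within one sd: {within_one_sd:.4f}")
print(f"P(exactly 7 successes): {p_seven:.4f}")
```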

In the next posts we will also discuss hypothesis testing, regression models, time series analysis and other intermediate statistical concepts.

Stay connected and subscribe to our newsletter to learn more about how to become a great data scientist.

If you liked this post on fundamental statistics skills for a great data scientist, please share it using the social buttons. Let me know your thoughts by adding a comment!