But many times students know the process of using a statistical formula (more accurately called a “test”) and not necessarily when they should use that formula – or formulas. In this Notebook entry, I’ll explain some basics about when to use a particular test, and then point you to further resources about using them. In a future Notebook entry we’ll put all this together in R and Python to see examples of answering a specific question. Of course, this isn’t an exhaustive list, but it will help you get started.

There are three main test areas for Statistics: Centers of data, Groupings of data, and Relationships between data. How do we decide which test to use? You can often work that out with just a few questions:

What are you trying to find out?

How many data do you have?

What kind of data is it?

What are you trying to find out?

There are two or three main areas within data where you can use statistical tests to answer a question you have. These questions are often stated as “Hypotheses” or “guesses”. If you have a guess (more people will like my blog, fewer people will buy that thing, and so on) you can use the tests for proportion (more on that in a moment, with a link and everything), check the average and standard deviation, and check the difference of two paired means (again, more on that in a moment).
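To make the “guess” idea concrete, here is a minimal sketch of a one-sample test for a proportion, using only the Python standard library. The survey numbers and the function name `one_proportion_z_test` are made up for illustration, and the normal approximation it relies on assumes a reasonably large sample.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Return (z, one-sided p-value) for the guess 'true proportion > p0'."""
    p_hat = successes / n
    # Standard error computed under the null hypothesis (true proportion = p0).
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Upper-tail area of the standard normal distribution.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical data: 120 of 200 surveyed readers say they like the blog.
# Guess: more than half of all readers like it (p0 = 0.5).
z, p_value = one_proportion_z_test(successes=120, n=200, p0=0.5)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
```

A small p-value (commonly below 0.05) suggests the data support the guess; the cutoff itself is a convention, not a rule.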

Perhaps you want to compare two statistics – you want to see how well things correlate. For that answer you can use tests involving the differences of two proportions, or the difference of two independent sample means. If you want to see how well two populations relate to each other, you can use a chi-square goodness of fit test.
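As a rough illustration of the chi-square goodness of fit test, the sketch below compares observed counts against the counts you would expect. The test is usually framed that way: how well does one set of counts match an expected distribution? The counts and the function name are hypothetical, and the critical value 7.815 is the standard table value for 3 degrees of freedom at the 0.05 level.

```python
def chi_square_goodness_of_fit(observed, expected):
    """Return (chi-square statistic, degrees of freedom)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Sum of squared differences, each scaled by the expected count.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df


# Hypothetical data: 100 customers choosing among four product colors.
# If color made no difference, we would expect 25 customers per color.
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]

statistic, df = chi_square_goodness_of_fit(observed, expected)

# Table value for df = 3 at the 0.05 significance level.
CRITICAL_DF3_ALPHA_05 = 7.815
rejects_null = statistic > CRITICAL_DF3_ALPHA_05
print(f"chi-square = {statistic:.2f}, df = {df}, reject null: {rejects_null}")
```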

You might also want to know how the data is related – if at all. Here you have two choices – but the choice depends on the type of data you’re working with – more on that in a moment. For now, know that if the data are Categorical, use the chi-square test for independence, and if they are Quantitative, use regression analysis. I’ll explain what those mean in a moment.
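For the Categorical case, a chi-square test for independence can be sketched by hand from a table of counts. The table and the function name below are invented for illustration; in practice you would likely reach for a statistics library, but the arithmetic is short enough to show directly.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical counts. Rows: new vs. returning visitor.
# Columns: bought vs. did not buy.
table = [
    [30, 10],
    [20, 40],
]

statistic, df = chi_square_independence(table)

# Table value for df = 1 at the 0.05 significance level is 3.841.
related = statistic > 3.841
print(f"chi-square = {statistic:.3f}, df = {df}, looks related: {related}")
```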

How many data do you have?

Wait – how “many”? Don’t I mean how “much”? Nope. In statistics, more data is almost always better, so when you choose a formula it’s really a question of how many groups of data you are working with. If you have one big set of data (don’t think about SQL Tables here, think more about a View), then you have one “sample”. And within that sample, you need to decide if you’re looking at one variable (or Feature), or more than one.

If you follow the links in the tests I mentioned, you’ll notice that the web page there specifies the type of variable you’re dealing with – that’s what I’ll cover next.

What kind of data is it?

You can collect data on anything, but how you collect it and what you collect determines the type of tests you can run. There are two big types: Categorical data (also called “Nominal”) and Quantitative (also called “Interval and Ratio” data). There’s another type called “Ordinal”, but I’ll deal with that in another Notebook entry.

Categorical data is what it sounds like – is it tall, short, working, not working, green, orange, in the box, not in the box, of a certain species, and so on. You’ll see this data as counts, or percentages. You can test for this kind of data using a Hypothesis Test such as a Difference Between Proportions. This is useful when you want to show how different two groups (populations) are, such as, “Who usually buys the opposite of our product?” Other tests here include the differences between the means, independence tests (like chi-squared) and the regression slope.
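Here is a small sketch of a Difference Between Proportions test for two groups, again with only the standard library. The purchase counts and the name `two_proportion_z_test` are assumptions for illustration; the pooled standard error used here is the usual choice when the null hypothesis is “the two proportions are equal”.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for the null hypothesis p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion: the best single estimate if both groups are the same.
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided: a difference in either direction counts as evidence.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 45 of 100 people in group A bought the product,
# versus 30 of 100 people in group B.
z, p_value = two_proportion_z_test(x1=45, n1=100, x2=30, n2=100)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```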

Quantitative data is numeric. You can perform counts, sums and other aggregation calculations, but most often you’ll use the average. You’ll then be able to use various comparison tests for those averages such as a test for a mean, differences of the means of independent or paired samples, and also regression. Regression helps in predicting values that follow.
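Two of these Quantitative tools are easy to sketch by hand: the t statistic for paired means, and a simple least-squares regression line used to predict the next value. All numbers and function names below are hypothetical, and the sketch stops at the t statistic rather than a p-value, which you would normally look up in a t table or get from a statistics library.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t statistic, degrees of freedom) for paired measurements."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical paired data: the same five users measured before and after.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 14]
t, df = paired_t_statistic(before, after)
# Two-sided table value for df = 4 at the 0.05 level is 2.776.
print(f"t = {t:.3f}, df = {df}, significant: {abs(t) > 2.776}")

# Hypothetical trend data: predict the value that follows (x = 6).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
slope, intercept = fit_line(x, y)
prediction = intercept + slope * 6
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, next = {prediction:.2f}")
```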

Resources

Of course this Notebook entry isn’t exhaustive, and glosses over some things that can be really important. It’s only meant to get you started. You can find out more in the links below – and keep studying those statistics courses I pointed out!

If you’re further along in your statistical learning, this is a good video to see another angle dealing with when to use a test based on whether you’re going after a prediction (inferential) or trying to group (descriptive) your data: https://www.youtube.com/watch?v=HpyRybBEDQ0