Of the more than two million college degrees granted in the U.S. every year, including those earned at accredited online colleges nationwide, probably two-thirds require completion of a statistics class. That’s well over a million students taking Statistics 101, and even more when you count those who don’t complete the course.

Everybody who has completed high school has learned some statistics. There are good reasons for that. Your class grades were averages of scores you received for tests and other efforts. Most of your classes were graded on a curve, requiring the concepts of the Normal distribution, standard deviations, and confidence limits. Your scores on standardized tests, like the SAT, were presented in percentiles. You learned about pie and bar charts, scatter plots, and maybe other ways to display data. You might even have learned about equations for lines and some elementary curves. So by the time you got to prom, you were exposed to at least enough statistics to read USA Today.

Faced with taking Statistics 101, you may be filled with excitement, ambivalence, trepidation, or just plain terror. Your instructor may intensify those feelings with his or her teaching style and class requirements. So to make things just a bit easier, here are a few concepts to remember.

Everything is Uncertain

The fundamental difference between statistics and most other types of data analysis is that in statistics, everything is uncertain. Input data have variabilities associated with them. If they don’t, they are of no interest. As a consequence, results are always expressed in terms of probabilities. It helps to think of each data value as having five parts:

Characteristic of Population—This is the part of a data value that you would measure if there were no variability. It’s the portion of a data value that is the same between a sample and the population the sample is from.

Natural Variability—This part of a data value is the uncertainty or variability in population patterns. It’s the inherent differences between a sample and the population. In a completely deterministic world, there would be no natural variability.

Sampling Variability—This is the difference between a sample and the population that is attributable to how uncharacteristic (non-representative) the sample is of the population.

Measurement Variability—This is the difference between a sample and the population that is attributable to how data were measured or otherwise generated.

Environmental Variability—This is the difference between a sample and the population that is attributable to extraneous factors.

The goal of most statistical procedures is to estimate the characteristic of the population, characterize the natural variability, and control and minimize the sampling, measurement, and environmental variability. Minimizing variance can be difficult because there are so many causes and because the causes are often impossible to anticipate or control. So if you’re going to conduct a statistical analysis, you’ll need to understand the three fundamentals of variance control—Reference, Replication, and Randomization.
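One way to picture these parts of a data value is a small simulation, sketched here in Python. Everything specific in it is an assumption made for illustration: the true population value of 50, the size of each noise source, the names `TRUE_VALUE` and `simulate_measurement`, and above all the idea that the parts simply add together and follow a Normal distribution.

```python
import random
import statistics

TRUE_VALUE = 50.0  # hypothetical characteristic of the population


def simulate_measurement(rng, natural_sd=5.0, sampling_sd=2.0,
                         measurement_sd=1.0, environmental_sd=1.0):
    """Return one data value built from the five parts (assumed additive)."""
    natural = rng.gauss(0, natural_sd)
    sampling = rng.gauss(0, sampling_sd)
    measurement = rng.gauss(0, measurement_sd)
    environmental = rng.gauss(0, environmental_sd)
    return TRUE_VALUE + natural + sampling + measurement + environmental


rng = random.Random(42)

# All five parts present.
noisy = [simulate_measurement(rng) for _ in range(10_000)]

# Same population, but with the three controllable sources reduced to zero.
controlled = [
    simulate_measurement(rng, sampling_sd=0, measurement_sd=0, environmental_sd=0)
    for _ in range(10_000)
]

print("noisy:     ", round(statistics.mean(noisy), 1), round(statistics.stdev(noisy), 1))
print("controlled:", round(statistics.mean(controlled), 1), round(statistics.stdev(controlled), 1))
```

Both sets of values center on the same population characteristic, but the controlled set has a smaller spread. What remains is the natural variability, which you can describe but not remove.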

Statistics ♥ Models

Statistics uses distribution models (equations) to describe what a data frequency would look like if it were a perfect representation of the population. If data follow a particular distribution model, like the Normal distribution, the model can be used as a template for the data to represent data frequencies and error rates. This is the basis of parametric statistics; you evaluate your data as if they came from a population described by the model.
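As a rough sketch of using a model as a template, the Python below fits a Normal distribution to some made-up test scores and compares what the model predicts with what the data show. The scores, the seed, and the variable names are all hypothetical; `NormalDist` comes from Python’s standard `statistics` module.

```python
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(1)

# Hypothetical data: 1,000 test scores drawn from a Normal population.
scores = [rng.gauss(70, 10) for _ in range(1000)]

# Fit the model by estimating its two parameters from the data.
model = NormalDist(mu=mean(scores), sigma=stdev(scores))

# What the model predicts: the share of values within one standard deviation.
predicted = model.cdf(model.mean + model.stdev) - model.cdf(model.mean - model.stdev)

# What the data actually show.
observed = sum(abs(x - model.mean) <= model.stdev for x in scores) / len(scores)

print(f"model predicts {predicted:.3f}, data show {observed:.3f}")
```

When the two numbers are close, the model is a reasonable stand-in for the data. When they are far apart, a parametric analysis based on that model deserves a second look.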

Statistical techniques are also used to build models from data. Statistical analyses estimate the mathematical coefficients (parameters) for the terms (variables) in the model, and include an error term to incorporate the effects of variation. The resulting statistical model, then, provides an estimate of the measure being modeled along with the probability that the model might have occurred by chance, based on the distribution model.
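Here is a minimal sketch of that idea in Python: estimating the two coefficients of a straight-line model by ordinary least squares. The data are invented (a true slope of 2 and intercept of 1 plus random error), and least squares is only one of several ways to estimate coefficients. The sketch stops at the estimates and the residuals; it does not compute the probability part.

```python
import random

rng = random.Random(7)

# Hypothetical data: y depends on x with slope 2 and intercept 1, plus random error.
xs = [float(i) for i in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 3) for x in xs]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Ordinary least-squares estimates of the two coefficients (parameters).
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residuals are the part of each y value that the model leaves to the error term.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

The estimated coefficients will not match the true values exactly, because the error term carries the variability the line cannot explain.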

Measurement Scales Shape Analyses

You may not hear very much about measurement scales in Statistics 101, but you should at least be aware of the difference between nominal scales, ordinal scales, and continuous scales. Nominal scales, also called grouping or categorical scales, are like stepping stones; each value of the scale is different from other values, but neither higher nor lower. Ordinal scales are like steps; each value of the scale has a distinct break from the next value, which is either higher or lower. Continuous scales are like ramps; each value of the scale is just a little bit higher or lower than the next value. There are many more types of scales, especially for time scales, but that’s enough for Statistics 101.

The reason measurement scales are important is that they will help guide which graph or statistical procedure is most appropriate for an analysis. In some situations, you can’t even conduct a particular statistical procedure if the data scales are not appropriate.
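To make that concrete, here is a small Python sketch that picks a “typical value” summary based on the scale. The function name `summarize`, the scale labels, and the sample data are hypothetical, and the pairing of mode, median, and mean with the three scales is a common rule of thumb rather than a hard law.

```python
from statistics import mean, median, mode


def summarize(values, scale):
    """Pick a typical-value summary that suits the measurement scale."""
    if scale == "nominal":
        return mode(values)      # categories have no order, so report the most common
    if scale == "ordinal":
        return median(values)    # ordered steps: the middle value is meaningful
    if scale == "continuous":
        return mean(values)      # a ramp: arithmetic on the values makes sense
    raise ValueError(f"unknown scale: {scale!r}")


print(summarize(["cat", "dog", "cat", "bird"], "nominal"))
print(summarize([1, 2, 2, 4, 5], "ordinal"))
print(summarize([2.5, 3.5, 6.0], "continuous"))
```

Asking for the mean of the pet names would fail outright, which is the code’s way of saying the procedure doesn’t fit the scale.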

Everything Starts with a Matrix

You may not realize it in Statistics 101, but all statistical procedures involve a matrix. Matrices are convenient ways to assemble data so that computers can perform mathematical calculations. If you go beyond Statistics 101, you’ll learn a lot about matrix algebra. But for Statistics 101, all you have to know is that a matrix is very much like a spreadsheet. In a spreadsheet you have rows and columns that define rectangular areas, called cells.

In statistics, the rows of the spreadsheet represent individual samples, cases, records, observations, entities that you’re making measurements on, sample collection points, survey respondents, organisms, or any other point or object on which information is collected. The columns represent variables, the measurements or the conditions or the types of information you’re recording. The columns can correspond to instrument readings, survey responses, biological parameters, meteorological data, economic or business measures, or any other types of information. You usually have several sets of variables for a given set of samples.

Together, the rows and the columns of the spreadsheet define the cells, which is where the data are stored. Samples (rows), variables (columns), and data (cells) are the matrix that goes into a statistical analysis. If you understand data matrices, you’ll be able to conduct statistical analyses even without your Statistics 101 instructor to help you.
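A data matrix of this kind can be sketched in a few lines of Python. The variable names and the numbers are invented for illustration; real analyses would normally use a spreadsheet or a statistics package rather than plain lists.

```python
# Hypothetical data matrix: each row is one sample, each column one variable.
variables = ["height_cm", "weight_kg", "age_yr"]
data = [
    [170.0, 65.0, 34],
    [182.0, 80.0, 45],
    [158.0, 52.0, 29],
    [175.0, 71.0, 52],
]

n_samples = len(data)
n_variables = len(variables)

# zip(*data) flips rows into columns, so each variable can be summarized.
column_means = {
    name: sum(column) / n_samples
    for name, column in zip(variables, zip(*data))
}

print(n_samples, "samples x", n_variables, "variables")
print(column_means)
```

Reading across a row tells you everything recorded about one sample; reading down a column tells you how one variable behaves across all the samples.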

Statistics is More than Description and Testing

In Statistics 101, you learn about probability, distribution models, populations, and samples. Eventually, this knowledge will enable you to describe the statistical properties of a population and to test the population for differences from other populations. But these capabilities, formidable though they are, don’t reveal the truly mind-boggling analyses you can do with statistics. You can:

Compare and Test—detecting differences between statistical populations or reference values using simple hypothesis tests, and analysis of variance and covariance.

Identify and Classify—identifying known or hypothesized entities or classifying groups of entities using descriptive statistics, statistical tests, graphics, and multivariate techniques such as cluster analysis and data mining techniques.
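As a taste of the “compare and test” idea, here is a hedged Python sketch that computes a t statistic for the difference between two sample means. It uses Welch’s version of the statistic, which is one common choice and not one the list above names; the function name `welch_t` and the exam scores are hypothetical. Turning the statistic into a probability needs the t distribution, which is left out here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for the difference between two sample means."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical exam scores from two class sections.
section_a = [72, 85, 78, 90, 66, 81, 77, 88]
section_b = [65, 70, 74, 68, 59, 73, 71, 62]

t = welch_t(section_a, section_b)
print(f"t = {t:.2f}")
```

The further the statistic is from zero, the harder it is to explain the gap between the two sections as nothing more than variability.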

