About Hal Kiess

I retired as a Professor of Psychology from Framingham (MA) State University in 2002. The 3rd edition of my text Statistical Concepts for the Behavioral Sciences had just come out in 2001, so I thought I was done with my writing. I was doing and enjoying those things that retirees do, when at 3:59 PM on November 8, 2006, I received an email from Bonnie Green at East Stroudsburg University inquiring about when a new edition of the text would be coming out. My response was that it probably wouldn’t unless I had a coauthor. By 8:40 PM of that same day I had an offer of a coauthor, and by November 9, Bonnie was on her way making plans for a 4th edition.
Now Bonnie has a very different conceptualization of retirement than I do. My view of retirement:
* Leisurely breakfast with coffee and paper on the deck
* Watch mindless TV program
* Leisurely walk
* Relaxing lunch
* Nap
* Putter in the garden
* Relax
* Leisurely dinner
* Enjoy the hot tub
* Read a good book
Bonnie's view of my retirement:
* Early and quick breakfast
* Check email for breaking developments in statistics
* Choose a topic and do research for monthly blog entry
* Review journals for any relevant articles or information
* Write a rebuttal to critics of statistical hypothesis testing
* Lunch at the computer
* Review draft of Chapter XX
* Begin revision of Chapter YY
* Check blog to see if anyone made comments that need a response
* Quick supper
* Check email for more breaking developments in statistics
* Review comments made while reviewing Chapter XX
This blog, however, is at least partially my responsibility, for I think I first suggested it. Let’s see where it takes us. Please feel free to join in and share your ideas and experiences.

I always cringe when I see a statement in a text or website such as “the research hypothesis, symbolized as H1, states a relationship between variables.” No! No! No! How can students not be confused about the difference between research and statistical hypotheses when instructors are? H1 is not the research hypothesis; it is the alternative to the null hypothesis in a statistical test.

Let’s be very clear. In most research settings, there are two very distinct types of hypotheses: the Research or Experimental Hypothesis, and the Statistical Hypotheses. A research hypothesis is a statement of an expected or predicted relationship between two or more variables. It’s what the experimenter believes will happen in her research study. For example, a researcher may hypothesize that prolonged exposure to loud noise will increase systolic blood pressure. In this instance, the researcher predicts that exposure to prolonged noise (the independent variable) will increase systolic blood pressure (the dependent variable). This hypothesis sets the stage to design a study to collect empirical data to test its truth or falsity. From this research hypothesis we can imagine the scientist will, in some fashion, manipulate the amount of noise a person is exposed to and then take a measure of blood pressure. The choice of statistical test will depend upon the research design used: a very simple design may require only a t test, a more complex factorial design may require an analysis of variance, or, if the design is correlational, a correlation coefficient may be used. Each of these statistical tests will possess different null and alternative hypotheses.

Regardless of the statistical test used, however, the test itself will not have a clue (if I am allowed to be anthropomorphic here) of where the measurement of the dependent variable came from or what it means. More years ago than I care to remember, C. Alan Boneau made this point very succinctly in an article in the American Psychologist (1961, 16, p.261): “The statistical test cares not whether a Social Desirability scale measures social desirability, or number of trials to extinction is an indicator of habit strength….Given unending piles of numbers from which to draw small samples, the t test and the F test will methodically decide for us whether the means of the piles are different.”

Rejecting a null hypothesis and accepting an alternative does not necessarily provide support for the research hypothesis that was tested. For example, a psychologist may predict an interaction of her variables and find that she rejects the null hypothesis for the interaction in an analysis of variance. But the alternative hypothesis for interaction in an ANOVA simply indicates that an interaction occurred, and there are many ways for such an interaction to occur. The observed interaction may not be the interaction that was predicted in the research hypothesis.
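To make the point about interactions concrete, here is a minimal Python sketch for a 2 x 2 design. The cell means are hypothetical numbers I made up for illustration, and `interaction_contrast` is simply a name I chose for the usual difference-of-differences. The sketch ignores error variance, so it shows only the pattern of means, not a full analysis of variance.

```python
def interaction_contrast(cell_means):
    """Difference of differences for a 2 x 2 table of cell means.

    cell_means is [[A1B1, A1B2], [A2B1, A2B2]]. A value of zero means
    the effect of B is the same at both levels of A (no interaction).
    """
    (a1b1, a1b2), (a2b1, a2b2) = cell_means
    return (a1b1 - a1b2) - (a2b1 - a2b2)


# Hypothetical cell means: the effect of B reverses direction across A.
crossover = [[10, 20], [20, 10]]

# Hypothetical cell means: B has an effect at A1 but none at A2.
spreading = [[10, 30], [10, 10]]

print(interaction_contrast(crossover))
print(interaction_contrast(spreading))
```

Both tables yield the same nonzero contrast, yet they describe very different patterns of behavior. A statistical alternative hypothesis of "an interaction exists" cannot tell them apart; only inspection of the cell means against the research hypothesis can.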

So please, make life simpler and more understandable for your students. Don’t call a statistical alternative hypothesis a research hypothesis. It is not. Your students will appreciate you making the distinction.

Several posts ago, Bonnie said we would address some difficult concepts for student understanding of statistics. I thought I would take a shot at one of the concepts she listed, degrees of freedom (df).

To help understand this concept, let us first think of df in a non-statistical way and say that df refers to the ability to make independent choices, or take independent actions, in a situation. Consider a situation similar to one suggested by Joseph Eisenhauer. Suppose you have three tasks you wish to accomplish, for example, that you want to go shopping, plan a vacation, and work out at the gym. Assume that each task will take about an hour and that you may do all on one day, or only one each day over the course of several days. I have created a situation with three degrees of freedom; you have three independent decisions to make. Suppose you decide you will go shopping today. Does this decision put any limitations on when you may do the other tasks? No, for you may still do the other tasks either today, or in the course of the next few days. Suppose next you decide to plan a vacation and you will do that tomorrow. Does this decision place any limitation on when you may go to the gym? Again, no, because you still might go to the gym today, tomorrow, or on another day. Notice here that each choice of when to do an activity is independent of each of the other choices. Thus, you have 3 degrees of freedom of choice in the order of doing the tasks.

Now, set a different scenario where I place some limitation on the order in which you may do the tasks. You still have the same three tasks to do, except now you decide you will do only one a day and you want to have them all completed over a span of three days. This scenario has only 2 df, for there are only two independent decisions for you to make. After you have made a choice on two of the activities, the day for doing the third activity is “fixed” or decided by your other two choices. For example, suppose you decide to plan your vacation today. For this choice you have total freedom to make a decision for any of the three days. You next decide to plan when to go to the gym. Notice for this decision, however, you have only two choices left, either tomorrow or the following day. A statistician would say you have two degrees of freedom when making this decision. You decide to go the day after tomorrow. Finally, you have to plan shopping, but now you have essentially no choices open to you; it must be tomorrow. For this decision, you have no degrees of freedom. Thus, in a sense, you have 2 df in this scenario. You are free to make two choices, but making any two choices automatically determines your third choice.

Of course, the obvious question a student may ask is “What does all this have to do with statistics?” Let’s see. Statistically, the df are the number of scores that are free to vary when calculating a statistic, or in other words, the number of pieces of independent information available when calculating a statistic. Suppose you are told that a student took three quizzes, each worth a total of 10 points. You are asked to guess what her scores were. In this scenario, you may guess any three numbers as long as they are in the range from 0 to 10. In this example, you have 3 df, for each score is free to vary. Each score is an independent piece of information. Choosing the score for one quiz has no effect on either of the other two scores that you may choose.

But now I give you some information about the student’s performance by telling you that the total of her scores was 27. I have now created a scenario with 2 df. Suppose you guess 10 for the first score. Does choosing this score determine what you must guess for the second score, given that the total of the scores must be 27? No, for your choice of a second score is still free to vary, although with 10-point quizzes it must be at least 7 so that the third score does not exceed 10. You guess 9 for a second score. What about your choice of a third score? What must it be? If the total of the three scores is 27, and the first score you chose was 10, and the second 9, then your third choice must be 8 for a total of 27 to be obtained. In this instance, the third score is not free to vary if you know the total of the scores and any two of the three scores. For this example then, there are 2 df in the choice of scores. If you know the total of the three scores, then only two provide independent information; the third score becomes dependent on the previous two scores. By giving you knowledge of the total of the scores I have reduced the df in the number of choices you have.
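For readers who like to check such things, the quiz example can be sketched in a few lines of Python. The function and variable names here are my own inventions for illustration; the numbers (three quizzes, 10 points each, a total of 27) come from the example above.

```python
def third_score(total, first, second):
    """Return the only third score consistent with the known total."""
    return total - first - second


TOTAL, MAX_SCORE = 27, 10

# Every way three quiz scores (each from 0 to 10) can total 27.
# Only the first two scores are chosen; the third is computed.
triples = [
    (a, b, TOTAL - a - b)
    for a in range(MAX_SCORE + 1)
    for b in range(MAX_SCORE + 1)
    if 0 <= TOTAL - a - b <= MAX_SCORE
]

print(third_score(TOTAL, 10, 9))  # the guess worked through in the text
print(triples)
```

Notice that the list is built by looping over only two scores. Once the total is known, there is never a third loop to write, which is the 2 df in another guise.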

Can we now relate these two examples to the calculation of statistics? Consider that you have a sample of 10 scores and you want to calculate the mean for these scores. In order to do so, you must know all 10 scores; if you know only 9, you cannot calculate the mean. Thus if there are n scores in a sample, then for calculating the mean from this sample there are n df. Each score is free to vary, and is an independent piece of information. You cannot calculate the mean unless you know all n scores. But suppose you know the mean for the scores and you want to calculate the standard deviation (s) for the scores. In this instance, there are 9 df for these scores, for if you know the mean, you need to know only 9 of the scores; the 10th score is in a sense “determined” for you by the value of the other 9 scores. So, for a set of n scores, there are n – 1 df when calculating the standard deviation.
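Here is a small Python sketch of the same idea, using only the standard library. The ten scores are hypothetical, and `missing_score` is a helper name I made up; `statistics.stdev` is the standard library's sample standard deviation, which divides by n – 1.

```python
from math import sqrt
from statistics import mean, stdev


def missing_score(known_scores, sample_mean, n):
    """Recover the one unknown score from the mean and the other n - 1 scores."""
    return sample_mean * n - sum(known_scores)


# Ten hypothetical quiz scores.
scores = [7, 9, 4, 8, 6, 10, 5, 7, 8, 6]
n = len(scores)
m = mean(scores)

# Knowing the mean and any nine scores fixes the tenth.
recovered = missing_score(scores[:9], m, n)

# Deviations from the mean must sum to zero, so only n - 1 of them
# are free to vary; hence the divisor n - 1 in the standard deviation.
deviations = [x - m for x in scores]
s_by_hand = sqrt(sum(d ** 2 for d in deviations) / (n - 1))

print(recovered, scores[9])
print(sum(deviations))
print(s_by_hand, stdev(scores))
```

The hand calculation and the library function agree because both divide the sum of squared deviations by n – 1 rather than by n.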

A question frequently arises when the idea of a fixed or determined score is discussed. Students may ask how can someone’s score on a test, for example, be “determined” or “fixed in value” by her other scores on tests? Students should be made to realize that during the actual data collection process all scores are free to vary and the concept of degrees of freedom does not apply. Degrees of freedom only come into play after the data have been collected and we are calculating statistics on those data.

These ideas can be expanded to the computation of other statistics. Consider analyzing data with a 2 x 2 chi-square test of independence. When we are collecting data for the contingency table, the concept of degrees of freedom is not applicable. After we have collected the scores, however, and each cell of the contingency table is filled, then we can use the cell totals to find the row and column marginal totals. Notice at this stage that if I were to tell you the row and column marginal totals, then I would need to give you only one cell total, and you would be able to determine the other three cell totals. In this instance, when knowing the row and column marginal totals, there is only 1 df for the cell totals. In a more general sense, if there are r rows and c columns in a contingency table, then once the row and column totals are known, the table possesses (r – 1) (c – 1) df.
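The 2 x 2 case is easy to sketch in Python. The marginal totals and the single known cell below are hypothetical, and the function names are mine; the point is only that one cell plus the margins fills the whole table.

```python
def fill_2x2(row_totals, col_totals, top_left):
    """Fill a 2 x 2 contingency table from its margins and one cell."""
    r1, r2 = row_totals
    c1, c2 = col_totals
    if r1 + r2 != c1 + c2:
        raise ValueError("row and column totals must sum to the same N")
    a = top_left
    b = r1 - a  # rest of row 1
    c = c1 - a  # rest of column 1
    d = r2 - c  # whatever remains in row 2
    return [[a, b], [c, d]]


def contingency_df(rows, cols):
    """Degrees of freedom for an r x c table with known margins."""
    return (rows - 1) * (cols - 1)


# Hypothetical margins: rows of 30 and 20, columns of 25 and 25.
table = fill_2x2(row_totals=(30, 20), col_totals=(25, 25), top_left=18)
print(table)
print(contingency_df(2, 2), contingency_df(3, 4))
```

For a larger table the same logic applies: once the margins are known, the last row and the last column can always be filled in by subtraction, leaving (r – 1) (c – 1) cells free to vary.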

I think giving students this intuitive overview of df helps them to understand where such numbers come from when they are learning about various statistical tests. Perhaps it may help to make statistics a little less mysterious.

Around the turn of the century (seems strange to say that), I recall the excitement I felt as my university completed a wireless network on campus and began requiring all students to have laptop computers with wireless access. I was an early user of the internet in the classroom and I thrilled at the opportunities that would open to my students with this new technology. I marveled at how students dutifully brought their laptops to class and seemed to take copious notes during class, many more notes than they seemed to take using just pencil and paper. Of course, it did seem to reduce class discussion or the number of questions that were raised.

But my technological naivety came crashing down the day I discovered that what looked like note taking was anything but. I recall how shattered I was when I discovered that several students had spent one class booking airline tickets for a vacation. It wasn’t long before I decided that if I were to regain control of my class, I would have to ban laptops from my classroom, which I proceeded to do. Ten years ago, it was relatively easy to limit wireless access in the classroom, for all we had were laptop computers; smart phones were still waiting in the wings.

Of course, I retired several years later and forgot about the problem. But the problem seems not to have gone away, and in fact, has expanded with 3G smart phones and the proliferation of social networking. Now some schools are rethinking the idea of campus-wide wireless access and perhaps limiting student access to campus wireless networks.

Bonnie’s recent post on diversity of skill sets, knowledge base, and other student attributes in statistics classes made me think of diversity as the raison d’être for descriptive and inferential statistics.

Students beginning a statistics class often say something to the effect “I want to be a counselor, why do I need to take statistics?” One of my favorite answers to this question is to ask students to imagine a world where every person is a clone of me. Everyone looks like me, acts like me, in fact, is identical to me in every way, physically, cognitively, and emotionally. Of course this scenario leads to gasps of horror, especially from the women in the class. With a little class discussion, however, the realization suddenly grows that in this world all behaviors would be normal because there would be no variability among people. There would be no standard deviation for any measure we might obtain. Hence, no counselors would be needed. No one would be handsome, beautiful, intelligent, arrogant, energetic, or helpful (I’m not implying that any of these adjectives actually describe me). No behaviors would be abnormal, criminal, empathic, altruistic, selfish, or whatever. Such concepts imply a diversity in physical appearance, intellectual functioning, behavioral actions toward others, and so forth. If we want to know anything about such a population, we need measure only one member of the population. A measure taken on one person would describe all other people. Descriptive statistics wouldn’t be needed in this world.

Students soon realize that diversity is the reason that statistics is a necessary discipline to understand and explain our world. When there is diversity, no single term adequately describes everyone. Thus we have had to develop statistics that describe the “typical” and the spread around the typical.

A similar discussion can lead to an understanding of the need for statistical hypothesis testing. Think how easy it would be to decide if an independent variable has an effect on behavior if every person’s behavior were identical. If we introduce the independent variable with one person, and it changes that person’s behavior from the state prior to its introduction, then we know it is effective. And it will have the same effect for everyone.

A world without variability wouldn’t require statistics, but it wouldn’t be much fun to live in either.

While watching a well-known TV news channel the other day, a report came on about student reactions to the enhanced security procedures at U. S. airports. The correspondent indicated that students are split right down the middle, 50-50 on the use of full-body airport scanners. Fifty percent favor, fifty percent oppose. I was curious what data there were to support this contention. The correspondent stated that half the students he talked to were in favor, half opposed. He then proceeded to present the results of his sample, which appeared to be N = 2, one student in favor, one student opposed. But he went on: according to another poll, most Americans favor the full-body scanners. Now “most” is a most ambiguous word in my mind. By definition, most simply means the majority, but if asked to attach a number to the word “most,” I tend to think about 80%. So I concluded that about 80% of Americans favor the new scanning procedure. But just to be sure, I looked up the poll that was being cited here (http://www.washingtonpost.com/wp-srv/politics/polls/postpoll_11222010.html?hpid=topnews).

The poll indicated it was based on a random sample of 514 adults who were called either on their landlines or cell phones. Now a random sample is one in which each member of a population has an equal chance of being included in the sample. But I, for one, have no chance of being included in this sample; I never respond to telephone polls, sometimes quite ungraciously. So the question arises, how many calls were made to obtain 514 responses? And how would those who declined to participate in this poll have responded?

The poll results indicate that if we consider only those people who “strongly support” the new body scanners, then 37% of the sample responded favorably. Twenty-seven percent were somewhat in favor, for a total of 64% responding that they support the scanning, either strongly or somewhat. But only 48% responded in a favorable way to the enhanced hand searches. And it also seems reasonable to expect that those who fly infrequently or not at all (53% of the respondents in the survey) may differ in their beliefs from those who fly frequently. We might also expect differences related to age of respondents, but we can’t tell from the results of the poll.

In her recent blog post, Bonnie included data collection methods as one of the core concerns for an introductory statistics course. To quote:

“Though I have said to my students more times that I can count, ‘the quality of our statistics is limited by the quality of our sample,’ I must admit to being a bit surprised that this was considered critical by others, especially since when I look at many undergraduate statistics textbooks, data collection methods are barely mentioned.”

The two examples given above provide excellent support for Bonnie’s contention that students should be taught to carefully evaluate the quality of the data on which statistics are based. Who were the subjects and how were they selected? What questions were asked and what responses allowed? And what inferences can be made from the results? It might prove to be an interesting class exercise to have students find media reports of current polls and then actually access the poll to see who the respondents were, what questions were asked, and the results obtained.

In his October 31st post, Marty stated, “The statistical significance test simply assesses the likelihood of the rival hypothesis of ‘chance.’” I would like to elaborate a little on this statement because it makes a very important point about statistical hypothesis testing. As both Bonnie and Marty have indicated, there will always be error in any data that we collect: sampling error, measurement error, and experimenter-procedural error. Unfortunately, humans are not well prepared to assess the extent of this error in data from a mere observational basis. Too often we are wont to see relationships in nature where none exist. Statistical hypothesis testing offers a relatively simple (although students often don’t initially perceive it to be simple) solution for this problem.

A statistical hypothesis test is a dispassionate method of deciding whether “chance” most reasonably explains the relationship observed in the data or whether there is something else we should search for in explanation. It is important to remember that the statistical test simply tests a null hypothesis assuming that certain conditions apply in the data being tested. It is up to the experimenter to ensure that those conditions are met by his or her data. And, if the hypothesis test indicates that the hypothesis of chance is an unlikely explanation of the relationship observed, it does not provide any evidence that the research hypothesis is a plausible explanation for the results. An experiment can be confounded or a third variable may be responsible for an observed correlation. The statistical test cannot assess the likelihood of such occurrences in the data; only a careful analysis of the design of the research can provide that assessment. This important point is sometimes misrepresented by text authors with statements similar to “the alternative hypothesis states that the independent variable does affect the dependent variable.” But the alternative hypothesis of a statistical test states no such relationship. For a parametric test, it simply indicates that the sample means were drawn from different populations, but not the reason why those populations may differ. And this alternative hypothesis remains essentially the same regardless of the experimenter’s research hypothesis. On the other hand, if the hypothesis test indicates that chance is the most plausible explanation for the results obtained, then again it cannot indicate whether the result was from a poor design or inadequate measurement of the variables in the research.

Hypothesis testing thus simply provides an objective way of deciding whether, given the data we have obtained, chance is a plausible explanation. But hypothesis testing is simply the start of the explanatory process, not the end of that process.
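A short simulation may help students see what “the rival hypothesis of chance” means in practice. The sketch below is my own illustration, not anything from Marty’s post. It draws two small samples from the same population over and over and applies an independent-groups t test each time. The population values (mean 50, standard deviation 10), the group size of 10, and the function names are all assumptions made for the example; 2.101 is the two-tailed .05 critical value of t for 18 df.

```python
import random
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Independent-groups t statistic using the pooled variance estimate."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    standard_error = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    return (mean(sample_a) - mean(sample_b)) / standard_error


def false_alarm_rate(n_experiments=2000, n_per_group=10,
                     critical_t=2.101, seed=1):
    """Proportion of 'significant' results when only chance is at work."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # Both groups come from the SAME hypothetical population.
        group_a = [rng.gauss(50, 10) for _ in range(n_per_group)]
        group_b = [rng.gauss(50, 10) for _ in range(n_per_group)]
        if abs(pooled_t(group_a, group_b)) > critical_t:
            rejections += 1
    return rejections / n_experiments


rate = false_alarm_rate()
print(rate)
```

The proportion printed should land in the neighborhood of .05. The test rejects about that often even though nothing but sampling error separates the groups, and nothing in the code knows or cares what the numbers are supposed to measure.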

One of the first decisions to be made when teaching a statistics course is whether the course will focus on the conceptual nature of statistics or the computation of the statistics. At the risk of identifying myself as a dinosaur, my first course in statistics focused on the computational aspects of statistics. Using lumbering, sometimes cranky, and always noisy mechanical calculators (anyone out there remember the Marchant?), we furiously calculated sums of squares, cross products, and grand totals. Given the emphasis of drill on calculations and the amount of time needed to do them, there was little time left to understand what was being done by the computations. As long as the F value obtained was not a negative number or r was not greater than 1.0, who cared what the statistic meant? What was important was learning to do statistics “by hand.”

Although the focus of all editions of our text Statistical Concepts for the Behavioral Sciences, 4/E has been to develop statistics conceptually using definitional formulas, computational formulas were included in previous editions for those instructors who wanted students to experience the “by hand” approach. But the revolution in computing in the 1990s and the ready availability of reasonably easy-to-use statistics software packages changed thinking about the value of computational formulas in the teaching of statistics. Computational formulas provide no value in understanding the nature and function of a statistic; they simply ease computations that few people still do. Thus for the fourth edition of our text we removed all computational formulas.

Katarina Guttmannova, Alan Shields, and John Caruso (2005) argue that computational formulas do not add to a student’s understanding of statistics. For example, a discussion of the variance using the definitional formula allows students to obtain an understanding of the concept of dispersion of scores. The computational formula, however, does not offer the possibility for this understanding (Guttmannova, K., Shields, A. L., & Caruso, J. C., 2005. Promoting conceptual understanding of statistics: Definitional versus computational formulas. Teaching of Psychology, 32, 251-253).
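To see the contrast for yourself, here is a minimal Python sketch of the two formulas for the sample variance. The scores are hypothetical and the function names are mine. The definitional version spells out "average squared distance from the mean," while the computational version works only from the sum of the scores and the sum of their squares.

```python
def variance_definitional(scores):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(scores)
    m = sum(scores) / n
    return sum((x - m) ** 2 for x in scores) / (n - 1)


def variance_computational(scores):
    """Same quantity from raw sums: (sum of X^2 - (sum of X)^2 / n) / (n - 1)."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x_squared = sum(x * x for x in scores)
    return (sum_x_squared - sum_x ** 2 / n) / (n - 1)


# Ten hypothetical scores.
scores = [7, 9, 4, 8, 6, 10, 5, 7, 8, 6]
print(variance_definitional(scores))
print(variance_computational(scores))
```

The two functions return the same value, but only the first lets a reader point to a line of code and say "that is the dispersion of the scores around the mean."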

A frequent criticism of the use of statistical hypothesis testing is that it is often misused and misunderstood. Michael Firmin and Elizabeth Proemmel (2008, http://www.cluteinstitute-onlinejournals.com/PDFs/793.pdf) indicate that research from their classes suggests that students themselves recognize the need for a better understanding of the conceptual basis of statistics and their appropriate application. A conceptual approach to teaching statistics should help students to better understand when it is appropriate to apply a particular statistic to a given set of data and what the value of the statistic tells them about the data. Anything that instructors can do to help foster a deeper understanding of the use of statistics will be beneficial to the discipline.