Wednesday, May 29, 2013

Visible Learning, Invisible Evidence

So I'm "done" with Hattie's Visible Learning--I'm returning it tomorrow. I read over the first two chapters but didn't really focus on the actual "meat" of the book, as I don't think the numbers mean squat. They are at best extremely unreliable; I'd love to see someone try to test some of these numbers (i.e., focus on one strategy, test it repeatedly, and see if the results come back anywhere near the average Hattie presents; or take a few [large] random samples of older research and see if the same number comes back up).
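To make the "test it repeatedly" idea concrete, here's a toy Monte Carlo sketch (entirely my own; the sample size and "true" effect are invented for illustration). Even when a strategy's true effect is exactly d = 0.4, classroom-sized studies measure values scattered over a wide range--which is exactly why I'd want to see replication before trusting a single averaged number:

```python
import random
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d using a pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * statistics.variance(group_a) +
                  (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_var ** 0.5

random.seed(1)          # reproducible
TRUE_D = 0.4            # the "true" effect of our invented strategy
N = 30                  # students per group in each replication

estimates = []
for _ in range(1000):   # 1,000 independent small studies
    treated = [random.gauss(TRUE_D, 1.0) for _ in range(N)]
    control = [random.gauss(0.0, 1.0) for _ in range(N)]
    estimates.append(cohens_d(treated, control))

estimates.sort()
print(f"middle 90% of measured d: {estimates[50]:.2f} to {estimates[949]:.2f}")
```

With 30 students per group, the middle 90% of measured effects spans well over half a point of d--individual studies routinely land below 0 or above 0.8 even though the underlying effect never changes.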

A few of my questions/comments/concerns:

1. If these effect sizes are accurate, why can't a teacher focus on 2-3 things and thus be more-or-less a "great" teacher? If these evaluation frameworks' (e.g., Marzano's) checklists aren't mere checklists, as claimed--that is, if "it's stuff [we're] already doing in class"--well...with all these great effects, why isn't virtually every teacher great? I see three possibilities (not mutually exclusive):
i. Virtually every teacher is not doing them (and there are a LOT of them) enough.
ii. Virtually every teacher sucks at virtually every one of them.
iii. The numbers suck.

(Technically I can think of a fourth but I excluded it; there is the--illogical--possibility that the numbers are somehow not cumulative. But if that's the case, it destroys the whole argument for implementing these strategies.)
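To spell out the arithmetic behind that worry (my own illustration; the effect sizes below are made-up, Hattie-style numbers, not figures from the book): if effects really did stack, a teacher adopting just three above-the-hinge strategies would be off the charts.

```python
# Illustrative, made-up effect sizes for three strategies (not taken
# from the book's tables).
strategies = {"feedback": 0.7, "direct instruction": 0.6, "learning goals": 0.5}

# If effect sizes combined additively (the naive reading), stacking
# just three strategies would yield an enormous payoff.
naive_total = sum(strategies.values())
print(f"combined d = {naive_total:.1f}")  # prints: combined d = 1.8
```

A combined d of 1.8 would dwarf nearly everything in the literature--so either effects don't simply add (undercutting the case for stacking strategies), or the individual numbers are inflated.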

2. Hattie states that a d = 0.4+ is the "zone of desired effects". Yet he also states, "Further, there are many examples that show small effects may be important" and goes on to mention a study with a d = 0.07 wherein "34 out of every 1,000 people would be saved from a heart attack if they used low dose aspirin on a regular basis". Well, if it affects 34 out of 1,000 people, it would save 1.9 million out of ~55 million. I use this latter number because that's how many K-12 students there are in the US. Obviously this wouldn't be as significant as a life-or-death situation, but if it's going to help (rather than save) that many kids, is it worth looking into? To quote Hattie, "This sounds worth it to me." (pg 9) Hattie's "hinge point" seems purely arbitrary. This also highlights the difference between the (pseudo)scientific approach of meta-analysis in the medical field and in education, which leads to...
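A quick sanity check of that extrapolation (my arithmetic, using the figures quoted above):

```python
helped_per_1000 = 34            # from the aspirin study Hattie quotes
us_k12_students = 55_000_000    # approximate US K-12 enrollment used above

# Integer math keeps the result exact: 55,000 groups of 1,000 students.
helped = us_k12_students // 1000 * helped_per_1000
print(f"{helped:,}")  # prints: 1,870,000
```

That's the ~1.9 million figure above: even a "trivial" d = 0.07 scales to millions of students.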

3. Applying a scientific approach to unscientific data results in unscientific results. And seeing as how this whole book strikes me as just yet another attempt to latch onto science's credibility (something educational research, generally speaking, does not have), that's a big deal. In fact, there's something absurd about even having to discuss whether the quality of the data matters (pg 11). Case in point: He cites Torgerson et al. (2004), who used 29 out of 4,555 potential studies on a subject area. These were chosen as "quality" (Torgerson's definition) studies because they used randomized controlled trials. That helps improve the quality of your data, all right, but...what about the other 4,526? 99.4% of the research didn't use randomized trials? The best education can typically do (not faulting education; it's just the nature of the beast) is "quasi-experimental" studies.
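The Torgerson numbers check out (my arithmetic):

```python
total_studies, randomized = 4555, 29  # figures cited from Torgerson et al. (2004)

excluded = total_studies - randomized
print(excluded)                           # prints: 4526
print(f"{excluded / total_studies:.1%}")  # prints: 99.4%
```

So a "quality" filter throws out more than 99% of the available research before the meta-analysis even starts.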

4. Another problem with the data that puts a big red flag on all these numbers (garbage in, garbage out): There are no real (scientific) controls in educational research. A control is a "yes/no" situation: Group A gets the experimental treatment (e.g., a drug) and Group B does not (e.g., a placebo). Obviously you can't do this without doing something tantamount to child abuse (i.e., standing there and doing absolutely nothing)...but frighteningly, that's the only meaningful "control" there could be. (And that's one reason why education data will never be scientific in nature.)

5. Barring a strictly regimented routine (one that could probably be automated via presentation software), it's highly unlikely that two teachers using the same "technique" will apply it identically. (And the same goes for the "controls" above; what teachers replace the experimental technique with will differ, rendering comparisons dicey.) This leads to another "apples and oranges" scenario for meta-analysis (albeit admittedly a relatively weak one).

6. More apples and oranges: One technique may be effective at one grade level but not another. I have no problem accepting that having a learning goal may help first graders; they may need the focal point, and their goals are (I believe...) general subjects/topics. I have a hard time accepting that writing "the student is going to factor trinomials" on the board is going to have a significant impact on seniors in algebra. (Anecdote: My students have repeatedly mocked/made derisive comments when they see me changing the learning goals. For example: "You know we never look at those, right?" "Yes, I know; it's just something I have to do." Very empowering, let me tell you...) Mushing multiple grades together into one statistic is just a bad idea. Ditto for different subjects (at higher grades).

No, the methodology--which I did read--is the problem. The book--which I admit I didn't read--rests on this pseudo-science. The author and many others (e.g., Marzano) are trying to make very dubious numbers the basis of educators' evaluations--and doing a remarkable job of flummoxing the politicians who make these decisions. To get a better understanding of how they're doing this, I strongly recommend Dr. Willingham's "When Can You Trust the Experts?: How to Tell Good Science from Bad in Education".