stats for experimentalists: ANOVA vs multiple t-tests

I'm trying to decide how I should analyze a set of data. The data is Young's modulus vs crosslinker content for a PEG-based hydrogel.

If I do a 1-way ANOVA (or the nonparametric version, Kruskal-Wallis) I get p < 0.0001, indicating that I can reject the ANOVA's null hypothesis (i.e. that all the data are sampled from the same population).

However, if I follow up with a multiple comparison post-test like Tukey's or Newman-Keuls, I get several pairs that are not significant, e.g. 1% vs 2% crosslinker, 2% vs 3%, even 1% vs 3%. BUT, if I do an unpaired Student's t-test on 1% vs 2%, for example, I get p < 0.001.

I started with ANOVA and a Newman-Keuls post-test because that's what my PI suggested, but neither of us is a statistician, and after some reading I'm thinking I can safely do multiple unpaired t-tests instead of a one-way ANOVA. I think this is OK and won't necessarily increase the odds of a Type I error, because we know a priori that there's a relationship between crosslinker content and Young's modulus, i.e. the latter increases with the former. I came to this conclusion after reading this discussion: http://www.talkstats.com/showthread.php ... nt-results

I should note that the SDs for each group are not equal, so ANOVA is probably not suited for that reason alone. However, I am fairly certain the data within each group are normally distributed. Also, I only have an N of 4-6 per group, which is far from ideal, but the standard deviations are small and we know from experience that the results are reproducible.

So what do the stat geeks think? Are the ANOVA results accurate and true, or am I safe doing unpaired t-tests on each successive pair, i.e. 1 vs 2, then 2 vs 3, then 3 vs 4, and so on?

You can perform the Kolmogorov-Smirnov test on each group's data individually, which will tell you whether or not the data are normally distributed. That said, because ANOVA is regarded as a "powerful" statistic, and one with a decent amount of flexibility, it's often the go-to option even for analyses that would usually be treated as non-parametric.

As for your post-hoc test dilemma, strictly speaking the Tukey test is a suitable post-hoc for ANOVA, and e.g. SPSS/PASW will do it for you. However, IIRC it doesn't really give you a statistic to quote, and for that reason multiple pairwise t-tests are often used.

However, if it were me and I had to do more than 3 or 4 post-hoc tests, I would correct their significance levels using the Bonferroni method, which will almost certainly knock out most of your significance unless your effects are very strong. The Bonferroni correction is [desired alpha / no. of pairwise tests] = new required significance level, so you can see how it makes things considerably more stringent right away. Be aware, though, that this leaves you open to the criticism that the Bonferroni method makes you more susceptible to a Type II error - failing to find significance which was actually present.
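As a concrete illustration of that formula (with invented p-values, not DyDx's actual data), the Bonferroni adjustment just divides the desired alpha by the number of pairwise tests:

```python
# Bonferroni sketch: each pairwise test must clear alpha / k instead of
# alpha. The p-values here are made up purely for illustration.
alpha = 0.05
p_values = {"1% vs 2%": 0.004, "2% vs 3%": 0.030, "1% vs 3%": 0.0004}

k = len(p_values)
threshold = alpha / k  # 0.05 / 3, roughly 0.0167

significant = {pair: p < threshold for pair, p in p_values.items()}
# "2% vs 3%" (p = 0.030) clears the usual 0.05 cutoff but not the
# corrected 0.0167 - exactly how the correction knocks out borderline pairs.
print(significant)
```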

Also a good idea to look up similar experiments in the field and simply copy the methods they used.

The problem with using literature examples is that they are often wrong (especially in biology related research). I'm keen on understanding the theory a little better so I can make my own determination as to what is the correct tool for a given situation.

Anyhow, it doesn't matter what type of post-test I do, I always get some pairs that are not considered significantly different. Some tests more than others.

The thing is, I know these sample groups ARE different, and since the ANOVA post-tests are telling me otherwise, it makes me think ANOVA is the wrong tool for this.

From my (admittedly vague) understanding of ANOVA, it is used to measure an effect, i.e. given X different populations, what effect does drug A have on blood level B? ANOVA tests the null hypothesis that the groups are all the same, and if they aren't, a post-test will reveal whether any two group pairings differ significantly.

But I'm not measuring an effect... (though you could phrase it that way if you wanted). I made gels of different compositions and measured an intrinsic property of the gels. So I already have an idea (from handling them, and also from polymer theory) that the more crosslinker there is the stiffer the gels are. So should ANOVA ever be applied in a situation like this?

Quote:

You can perform the Kolmogorov-Smirnov test on your groups' data individually, which will tell you whether or not the data is normally distributed

Forgot to address this: I've tested the normality of the data from one of the groups by sampling a normal distribution with the same SD and mean and plotting the two together to see if they fall on the line X=Y, which they do. I'm assuming the data from each of the other groups are normally distributed as well.
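A quicker version of that visual check, assuming SciPy is available: the Shapiro-Wilk test is usually recommended over Kolmogorov-Smirnov for samples as small as n = 4-6. The numbers below are hypothetical, not DyDx's measurements.

```python
from scipy import stats

# One group's modulus values (invented for the sketch).
group = [10.1, 9.8, 10.3, 9.9, 10.0]

w_stat, p_value = stats.shapiro(group)
# A small p-value would be evidence AGAINST normality; with n this
# small, though, the test has little power either way.
print(w_stat, p_value)
```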

My own concept of statistical analysis is that the outcome is a measure of how reliably, as opposed to how randomly, an effect occurs - so the "effect" is simply that, as it applies to whatever property of whatever you're measuring.

So, e.g. even though I know that, as a psychologist, exposing experimental subjects to two different kinds of stimuli will evoke different brain activity, it's a measure of my methods how strongly and reliably I can make this difference occur. I would assume this also applies to your solutions - otherwise you wouldn't need to bother with the stats in the first place; no-one would dispute that, e.g., sodium hypochlorite is different from hydrofluoric acid.

As far as corrections for multiple comparisons go, you might also consider looking into the Benjamini-Hochberg correction: http://en.wikipedia.org/wiki/False_discovery_rate. Where Bonferroni corrects for the familywise error rate and is extremely conservative, the BH method controls what's called the "false discovery rate." It's more susceptible to Type I errors, but uniformly more powerful. It's implemented in R (and SAS, IIRC; I don't recall which has it built in).
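For what it's worth, the BH procedure is simple enough to implement by hand. This sketch follows the standard step-up rule: sort the p-values, find the largest rank i with p(i) <= (i/m)*q, and reject everything up to that rank.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Step-up BH procedure: returns True where the hypothesis is
    rejected, controlling the false discovery rate at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value sits under the BH line i/m * q.
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            max_rank = rank
    # Reject every hypothesis at or below that rank.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[i] = True
    return rejected

# Invented p-values: only the smallest survives at q = 0.05 here.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.20]))  # [True, False, False, False]
```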

With the ANOVA, parametric or not, you're also regressing Young's Modulus on crosslinker content. A significant regression indicates that the Modulus increases (depending on the sign of the slope) with increasing crosslinker, with the amount of crosslinker predicting the Modulus. It might be interesting to also test quadratic and cubic effects to see whether it's a straight-line relationship or a curved one. In parametric ANOVA, you could overlay a confidence band on the straight (or curved) line throughout the regression.
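To sketch what comparing a straight-line fit against a quadratic one might look like (hypothetical group means, and assuming NumPy is available):

```python
import numpy as np

# Hypothetical group means: Modulus (kPa) at 1-4% crosslinker.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 14.0, 19.0, 25.0])

linear = np.polyfit(x, y, 1)     # slope and intercept
quadratic = np.polyfit(x, y, 2)  # adds a curvature term

def rss(coeffs):
    """Residual sum of squares for a polynomial fit."""
    return float(np.sum((y - np.polyval(coeffs, x)) ** 2))

# A large drop in RSS from linear to quadratic suggests the
# relationship curves rather than following a straight line.
print(rss(linear), rss(quadratic))
```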

Looked at this way, it doesn't really matter whether or not the Modulus is significantly different for crosslinker values of 1 versus 2. You've already predicted that the Modulus will increase as crosslinker goes from 1 to 2.

The unequal variances might be equalized with appropriate transformation of the data, such as logs or squares, etc. You can test for equal variances. Parametric ANOVA opens up a much wider range of possibilities than non-parametric.

Thanks guys. I'll have to do some reading to digest some of what you've said, especially shread's, but this is helpful. I've never taken a formal stats course, and I've noticed that most scientists don't know how to do the stats properly - I'd like to not be one of those scientists.

shread: We don't have a hypothesis really -- these are novel hydrogels and we characterized their mechanical properties and demonstrated the possible range of Young's moduli as a function of crosslinker content. We had no way a priori of knowing specifically what the moduli would be, just that it would go up with crosslinker content. FWIW, my advisor DID say that it's 'not uncommon' for an ANOVA post-test to not find significance in some pairs, I just find it theoretically unsettling or confusing, and am trying to understand it better, I guess?

Also, this statistical analysis is in no way crucial to the paper - it's more a formality than anything - but as I've said, I'm using it as an opportunity to learn some more advanced stats.

As the difference between the crosslinker content of adjacent points decreases, the difference between Young's Modulus will decrease, approaching zero as the crosslinker content difference approaches zero. For t-tests, the significance of the Young's Modulus difference between two points is based on that difference divided by the standard error. Assuming the standard error stays constant, at some point, the difference becomes too small to be significant, statistically. This would be a least significant difference.

A difference between two mean values divided by a standard error is the fundamental relationship for t-tests. The square of it is the fundamental relationship for F-tests.
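That identity is easy to check numerically for two groups (hypothetical data, assuming SciPy): the two-sample t statistic squared equals the one-way F statistic, and the p-values coincide.

```python
from scipy import stats

# Two hypothetical groups of modulus measurements.
a = [10.0, 11.0, 9.5, 10.5]
b = [14.0, 13.5, 14.5, 13.0]

t, p_t = stats.ttest_ind(a, b)   # pooled-variance two-sample t-test
f, p_f = stats.f_oneway(a, b)    # one-way ANOVA on the same two groups

# With only two groups, F = t**2 and the p-values match.
print(t**2, f)
```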

The measurements of two points are not fixed however, but rather vary. So, even though you have the same crosslinker content, the Young's Modulus values for two separate runs will vary. For replicate runs at two crosslinker contents, the difference also will vary. Near the least significant difference, sometimes it will be significant, sometimes not. This effect will go away as sample size increases and the observed means converge on their true values.

My comments about linear and quadratic trends related to the regression of Modulus on crosslinker content. The Modulus might increase in direct proportion to crosslinker content, or it might increase as the square or square root of crosslinker content, or via some other relationship. With orthogonal coefficients, the linear effect tells you it has increased. A quadratic one would tell you it curves, with the sign giving the direction of curvature. A cubic effect would indicate two bends (an S-shape).

If the relationship is not strictly linear but curves upward, then the difference in Modulus at two small crosslinker contents might not be significant, while two large crosslinker contents might give a significant difference, even though the difference in crosslinker content between the two pairs of points is the same. All this assumes constant variance (standard error), of course.

Still digesting, but, is it accurate to say that I could most likely find a significant difference between all groups with an ANOVA post-test if I increase my number of samples? We do have a rather small number.

It depends on the variability and degree of separation. You said you had significance overall on the ANOVA. Everything else is predicated on that.

In my first post, I was trying to tell you that the multiple range tests were not really applicable in your particular case, and have been trying to guide you into a minimalist analysis of the functional relationship between crosslinker and modulus, which might give some insights back into the chemistry or into engineering applications, such as optimal amounts of crosslinker.

What was your sample size, and how many variables were associated with each sample ?

In general, what you're saying is true, as more samples tend to increase the sheer amount of variation, but in practice with an ANOVA, artificially increasing the variance to the point where it biases results can require upwards of hundreds or thousands of cases. As with everything else, it's usually related to what it is you're measuring, and how you're measuring it.

Chris G, I believe you're confusing the purpose of a multiple range test with a sheer increase in error degrees of freedom, which would increase precision, not decrease it.

No, I mean that any variation at all is easier to detect as sample sizes increase, but that in order to have a significant (in a non-statistical sense) impact upon the effects in the data, you'd need to increase the number of samples dramatically.

But either way, I would be interested in hearing what DyDx's sample size is. It equally might be inadequately small, leading to spurious results that way.

OK, you meant treatment variance, not error variance! The SE scales as the square root of the inverse of the sample size. When you go from n = 2 to 4 you get a big gain, but you need to go to 8 to get as big a gain again.
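That 1/sqrt(n) scaling is easy to see directly:

```python
import math

# SE of a mean is sigma / sqrt(n), so each doubling of n shrinks the
# SE by the same constant factor, sqrt(2).
sigma = 1.0  # arbitrary; the ratios don't depend on it

def se(n):
    return sigma / math.sqrt(n)

print(se(2) / se(4))  # sqrt(2), about 1.414
print(se(4) / se(8))  # sqrt(2) again: same relative gain per doubling
```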

DyDx mentioned sample sizes of 4-6 per group, which would be adequate, assuming each "group" is a set of reactions run at a single level of crosslinker. He didn't mention how many different groups or concentrations of crosslinker were involved. The blocking structure also hasn't been clarified.

He mentioned the SEs are not equal, but I'm not sure what his basis for that statement is. I first took it that he did some test for normality or equality of variances, but now I suspect not, and that his statement is based on looking at the variation in the SEs around each point.

Clearly this is a straightforward problem in regression, BTW, and the various t-tests and multiple-range tests are inappropriate. Draper and Smith, "Applied Regression Analysis," could be your friend here.

Possibly multiple regression, although it would depend again pretty heavily on the properties of what he's testing, and whether the groups he'd need share properties or are extremely different from each other.

The above is incorrect. This is simple regression, not multiple regression, since there is only one independent variable, crosslinker concentration. There is also only one dependent variable. DyDx has provided sufficient detail to outline how to analyze this experiment: by trend analysis, which I have been recommending since my opening post. It would be appropriate to use the method of orthogonal polynomial contrasts. Such analyses are discussed in detail at http://www.google.com/url?sa=t&rct=j&q= ... rsfcJC9cpA

A specific example of trend analysis starts on page 4.9 of that document. That section opens:

Quote:

Experiments are often designed to characterize the effect of increasing levels of a factor (e.g. increments of a fertilizer, planting dates, doses of a chemical, concentrations of a feed additive, etc.) on some response variable (e.g. yield, disease severity, growth, etc.). In these situations, the experimenter is interested in the dose response relationship. Such an analysis is concerned with overall trends and not with pairwise comparisons.

Please note that the analysis is not concerned with pairwise comparisons, as I have said repeatedly.

The link above refers to Little and Hills, "Agricultural Experimentation: Design and Analysis," Wiley, NY. The edition I have is dated 1978. The book is even bound like a cookbook, doesn't demand any particular math skills, and is very thorough.
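A minimal sketch of what those orthogonal polynomial contrasts do, using the standard tabulated coefficients for 4 equally spaced levels and invented group means (not DyDx's data):

```python
# Tabulated orthogonal polynomial contrast coefficients for 4 equally
# spaced treatment levels.
linear    = [-3, -1,  1,  3]
quadratic = [ 1, -1, -1,  1]
cubic     = [-1,  3, -3,  1]

# Hypothetical mean moduli (kPa) at 1-4% crosslinker, n replicates each.
means = [10.0, 14.0, 19.0, 25.0]
n = 5

def contrast_ss(coeffs):
    """Sum of squares attributable to one polynomial contrast."""
    L = sum(c * m for c, m in zip(coeffs, means))
    return n * L**2 / sum(c**2 for c in coeffs)

# The treatment sum of squares splits exactly into these pieces; a big
# linear SS with small quadratic/cubic SS says "straight-line trend".
print(contrast_ss(linear), contrast_ss(quadratic), contrast_ss(cubic))
```

Note that the three coefficient vectors are mutually orthogonal (their pairwise dot products are zero), which is what lets the trend components be tested independently.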

Be very careful of falling into the trap of thinking that a p-value > 0.05 means there isn't a difference - it just means you failed to detect the difference, due to poor power/sample size or bad luck. Unless you are going on a fishing expedition with your data, almost any reasonable null hypothesis is going to be false at some level. Give me a large enough sample size and I am quite certain I could prove that the number of times you sneeze on your 20th birthday correlates with your lifespan (with the caveat that said sample size would probably be larger than the actual population).

For that reason, looking at confidence intervals tends to be far more informative, since it can tell you whether the effect that you're looking at is practically significant, not just statistically significant. It will also give you an indication if you may have missed a practically significant effect by having too few samples.

As shread pointed out, orthogonal polynomial contrasts is the way to go here. You can find a linear trend, a quadratic trend, etc. If your problem of non-constant variance is that your variability increases as the crosslinker content increases, you may also want to look at a transformation of your data that would make sense scientifically. For example, if you expect something like Modulus = Constant1 * (Constant2^Crosslinker), a log transformation of the modulus would let you estimate those constants directly while potentially giving you constant variance, and the polynomial contrasts would help indicate if the relationship is more complex.
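A sketch of that log-transform idea, with invented constants: if Modulus = C1 * C2**crosslinker, then log(Modulus) is linear in crosslinker, and an ordinary straight-line fit on the log scale recovers both constants.

```python
import math

# Hypothetical exponential relationship between crosslinker (%) and
# modulus; C1 and C2 are invented for the demonstration.
C1, C2 = 5.0, 1.8
crosslinker = [1.0, 2.0, 3.0, 4.0]
modulus = [C1 * C2**x for x in crosslinker]

# Ordinary least-squares fit of log(modulus) on crosslinker.
y = [math.log(m) for m in modulus]
n = len(crosslinker)
xbar = sum(crosslinker) / n
ybar = sum(y) / n
slope = (sum((x - xbar) * (yi - ybar) for x, yi in zip(crosslinker, y))
         / sum((x - xbar) ** 2 for x in crosslinker))
intercept = ybar - slope * xbar

# Back-transforming the fitted line recovers the original constants.
print(math.exp(intercept), math.exp(slope))  # about 5.0 and 1.8
```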

Well, you can do this with multiple regression, per my previous link, near the end.

Quote:

There are equations to calculate coefficients similar to those of Table 15.12 for unequally spaced treatment levels and unequal numbers of replications. The ability to compute such sums of squares using orthogonal contrasts was crucial in the days before computers. But now it is easier to implement a regression approach, which does not require equal spacing between treatment levels [ST&D 388]. The SAS code for a full regression analysis of the soybean yield data:

Yeah, as pointed out, it's the same thing as the polynomial contrasts, in effect (and in calculation), except that it's easier to handle unequal spacing between your inputs. I think statistics education is rather hampered by the pointless distinction between ANOVA and linear regression, when the fundamental distinction actually tends to be between observational studies (which cannot prove causation, though they can wink-wink-nudge-nudge you in that direction) and experimental studies (which prove causation, except when you make any of the mistakes that almost everybody misses).