June 18, 2012

Bucking the “Trend” and Approaching “Approaching Significance”

I believe we are on an irreversible trend toward more freedom and democracy – but that could change.

—Dan Quayle

In general usage, the concept of trend implies movement. Not only is this implied in its definitions, but the word can be traced to its Middle High German root of trendel, which is a disk or spinning top.1

In scientific writing, when is a trend not a trend? When it is not referring to comparisons of findings across an ordered series of categories or across periods of time. However, this and related terms are often misused in manuscripts and articles.

Most studies are designed around hypothesis testing. Because an individual study provides only a point estimate of the truth, the researchers must determine before conducting the study an acceptable cutoff for the probability of finding an association when none actually exists, that is, a finding due to chance alone (the α value, most commonly but not universally set at .05 in clinical studies). This creates a dichotomous situation in interpreting the result: the study either does or does not meet this criterion. If the criterion is met, the finding is described as “statistically significant”; if it is not met, the finding is described as “not statistically significant.”

There are many limitations to this approach. The choice of α level is arbitrary; therefore, in general all findings should be expressed as the study’s point estimate and confidence interval, rather than just the point estimate and the P value. Despite the limitations, if a researcher designs a study on the basis of hypothesis testing, it is not appropriate to change the rules after the results are available, and the results should be interpreted accordingly. The entire study design (such as calculation of the sample size and study power, that is, the ability of a study to detect an actual difference or effect if one truly exists) depends on setting the rules in advance and adhering to them.
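As a rough illustration of reporting a point estimate with its confidence interval alongside the P value, the Python sketch below computes a risk difference between two groups. The function name, the choice of a Wald interval and a pooled z test, and the event counts are all assumptions made for illustration; they do not come from any actual study, and other interval methods may be preferable for small samples.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference(events_a, n_a, events_b, n_b, alpha=0.05):
    """Risk difference (A minus B), Wald confidence interval, two-sided P value."""
    p_a, p_b = events_a / n_a, events_b / n_b
    diff = p_a - p_b

    # Unpooled standard error, used for the confidence interval.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Pooled standard error, used for the test of "no difference".
    pooled = (events_a + events_b) / (n_a + n_b)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, ci, p_value


# Hypothetical counts: 30 of 200 events with treatment, 42 of 200 with control.
diff, (low, high), p = risk_difference(30, 200, 42, 200)
print(f"Risk difference {diff:.3f} (95% CI, {low:.3f} to {high:.3f}); P = {p:.2f}")
```

With these made-up counts the interval spans zero and the P value is above .05, so the finding would be reported as not statistically significant, with the interval showing the reader how large an effect the data remain compatible with.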

If a study does not meet the significance criterion (for example, if the α level was set at .05 and the P value for the finding was .08), authors sometimes describe the findings as “trending toward significance,” “having a trend toward significance,” “approaching significance,” “borderline significant,” or “nearly significant.” None of these terms is correct. Results do not trend toward significance; they either are or are not statistically significant based on the prespecified study assumptions. Similarly, the results do not include any movement and so cannot “approach” significance; and because of the dichotomous definition, “nearly significant” is no more meaningful than “nearly pregnant.”

When a finding does not meet statistical significance, there are generally 2 possible explanations: (1) There is no real association. (2) There might be an association, but the study was underpowered to detect it, usually because there were not enough participants or outcome events. A finding that does not meet statistical significance may still be clinically important and warrant further consideration.
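To make the idea of an underpowered study concrete, here is a minimal sketch of an approximate power calculation for comparing two proportions. The function name, the normal approximation, and the event rates are assumptions chosen for illustration, not a prescribed method; a formal sample size calculation would normally be done with dedicated statistical software.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z test comparing two proportions."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    # Probability that the test statistic exceeds the critical value in the
    # direction of the true difference (the opposite tail is ignored as negligible).
    return norm.cdf(abs(p1 - p2) / se - z_crit)


# Hypothetical true event rates of 21% vs 15%.
for n in (200, 1000):
    print(n, round(power_two_proportions(0.21, 0.15, n), 2))
```

Under these assumed rates, 200 participants per group gives power of roughly one-third, whereas 1000 per group gives power above 90%. A real difference of this size would therefore usually be missed by the smaller study.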

However, when authors use terms such as trend or approaching significance, they are hedging the interpretation. In effect, they are treating the findings as if the association were statistically significant, or as if it might have been if the study had just gone a little differently. This is not justified. (Lang and Secic2 make the fascinating observation that “Curiously, P values never seem to ‘trend’ away from significance.”)

A proper use of the term trend refers to the results of one of the specific statistical tests for trend, the purpose of which is to estimate the likelihood that differences across 3 or more groups move (increase or decrease) in a meaningful direction more than would be expected by chance. For example, if a population of persons is ranked by evenly divided quintiles based on serum cholesterol level (from lowest to highest), and the risk of subsequent myocardial infarction is measured in each group, the researcher may want to determine whether risk increases in a linear way across the groups. Statistical tests that might be used for analyzing trends include the χ² test for trend and the Cochran-Armitage test.
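The quintile example can be sketched with a Cochran-Armitage test written in plain Python. The function name, the equally spaced scores (0, 1, 2, ...), and the event counts are assumptions for illustration only; the counts are invented and do not come from any study.

```python
from math import sqrt
from statistics import NormalDist


def cochran_armitage(events, totals, scores=None):
    """Cochran-Armitage test for linear trend in proportions across ordered groups."""
    if scores is None:
        scores = list(range(len(events)))  # assumed equally spaced group scores
    n_total = sum(totals)
    p_bar = sum(events) / n_total  # overall event proportion

    # Score-weighted sum of observed minus expected events in each group.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, events, totals))

    # Variance of the statistic under the hypothesis of no trend.
    sum_ns = sum(n * s for s, n in zip(scores, totals))
    sum_ns2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_total)

    z = t_stat / sqrt(variance)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical myocardial infarctions per cholesterol quintile, 1000 persons each.
z, p = cochran_armitage([20, 25, 31, 38, 46], [1000] * 5)
print(f"z = {z:.2f}, P for trend = {p:.4f}")
```

Here the test asks a single question, whether risk rises or falls steadily across the ordered quintiles, which is the sense in which "trend" is a legitimate statistical term. The squared z statistic corresponds to a χ² statistic for trend with 1 degree of freedom.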

Similarly, a researcher may want to test for a directional movement in the values of data over time, such as a month-to-month decrease in prescriptions of a medication following publication of an article describing major adverse effects. A number of analytic approaches can be used for this, including time series and other regression models.
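For a trend over time, one simple option is sketched below. Note that this is the Mann-Kendall test, a rank-based test for monotonic trend, rather than the time series or regression models mentioned above; it was chosen here only because it can be written with the Python standard library. The function name and the monthly prescription counts are hypothetical, and the test assumes independent observations, so it does not account for seasonality or autocorrelation.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def mann_kendall(values):
    """Mann-Kendall test for a monotonic trend in a time-ordered series."""
    n = len(values)
    # S counts later-minus-earlier increases minus decreases over all pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under no trend, with the standard correction for tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    variance = (n * (n - 1) * (2 * n + 5) - tie_term) / 18

    # Continuity-corrected normal approximation.
    if s > 0:
        z = (s - 1) / sqrt(variance)
    elif s < 0:
        z = (s + 1) / sqrt(variance)
    else:
        z = 0.0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p_value


# Hypothetical monthly prescription counts after publication of the article.
monthly = [980, 955, 960, 930, 915, 920, 890, 870, 875, 850, 830, 820]
s, z, p = mann_kendall(monthly)
print(f"S = {s}, z = {z:.2f}, P = {p:.5f}")
```

A negative S indicates that later months tend to have fewer prescriptions than earlier ones. A regression or time series model would additionally estimate the size of the monthly change, which this rank-based test does not.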

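Instead of using terms such as trend toward significance or approaching significance to describe a finding that did not meet the significance criterion, the options are: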

1. Delete the reported finding if it is not clinically important or a primary outcome. OR

2. Report the finding with its P value. Describe the result as “not statistically significant,” or “a statistically nonsignificant reduction/increase,” and provide the confidence interval so that the reader can judge whether insufficient power is a likely reason for the lack of statistical significance.

If the finding is considered clinically important, authors should discuss why they believe the results did not achieve statistical significance and provide support for this argument (for example, explaining how the study was underpowered). However, this type of discussion is an interpretation of the finding and should take place in the “Discussion” (or “Comment”) section, not in the “Results” section.

Bottom line:

1. The term trend should only be used when reporting the results of statistical tests for trend.

2. Other uses of trend or approaching significance should be removed and replaced with a simple statement of the findings and the phrase not statistically significant (or the equivalent). Confidence intervals, along with point estimates, should be provided whenever possible.

—Robert M. Golub, MD