Friday, September 16, 2016

A curmudgeonly read of the ZIKV case control study, Brazil 2016 (Lancet ID)

A ‘definitive’ study examining the relationship between ZIKV
and microcephaly in Brazil has just been published [1].

I need to preface this by saying that this obviously represents a
massive amount of work by a large number of people under very difficult
circumstances, with many families making massive sacrifices to be involved, and
I am in no way denigrating those efforts. The paper is also very explicit about being
a preliminary analysis, yet it is being touted as a definitive causal statement.

BUT IT’S NOT READY YET.

But, why do I say that? Bias, bias, and more bias.

I. Sample size, power and preliminary analyses

The authors state: “The original study aimed to include 200
cases and 400 controls to have 90% power, 95% precision to detect an
association with an odds ratio of 2 or greater, assuming that 67% of cases were
exposed.”

Power calculations exist for a very good reason: small
numbers lie to you. There is a fantastic
discussion in [2], and the pdf here:

“However, as small studies are
particularly susceptible to inflated effect size estimates and publication
bias, it is difficult to be confident in the evidence for a large effect if small studies
are the sole source of that evidence.”

This is why protocols get approved and pre-filed. Interim
analyses are dangerous, as small numbers are unstable- I can confidently
predict the final OR will be much, much closer to 1.
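To make the "small numbers lie" point concrete, here is a minimal sketch that enumerates every possible 2x2 table for a small case-control study and keeps only the ones that would come out "significant". Everything in it is an assumption for illustration: a hypothetical true OR of 2, a hypothetical 10% exposure among controls, group sizes of 32 and 62, and a simple 0.5-per-cell correction with a Wald interval. It is not the authors' model.

```python
from math import comb, exp, log, sqrt


def corrected_or(a, b, c, d):
    """OR and lower 95% Wald bound after adding 0.5 to every cell."""
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return exp(log_or), exp(log_or - 1.96 * se)


def binom_pmf(k, n, p):
    """Probability of exactly k exposed subjects out of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def significant_tables(n_cases, n_controls, p_controls, true_or):
    """(OR estimate, probability) for every table whose lower bound exceeds 1."""
    odds_cases = true_or * p_controls / (1 - p_controls)
    p_cases = odds_cases / (1 + odds_cases)
    hits = []
    for a in range(n_cases + 1):          # exposed cases
        for c in range(n_controls + 1):   # exposed controls
            prob = binom_pmf(a, n_cases, p_cases) * binom_pmf(c, n_controls, p_controls)
            or_hat, lower = corrected_or(a, n_cases - a, c, n_controls - c)
            if lower > 1:
                hits.append((or_hat, prob))
    return hits


TRUE_OR = 2.0  # hypothetical truth, chosen only for illustration
hits = significant_tables(32, 62, p_controls=0.10, true_or=TRUE_OR)

power = sum(p for _, p in hits)                                  # chance of a "significant" result
typical_or = exp(sum(p * log(o) for o, p in hits) / power)       # geometric mean among those
smallest_or = min(o for o, _ in hits)                            # least extreme significant estimate

print(f"power = {power:.2f}")
print(f"typical significant OR = {typical_or:.1f}, smallest = {smallest_or:.1f}")
```

With groups this small, a table can only reach significance if its estimated OR is well above the hypothetical truth, so any significant interim result is inflated by construction.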

II. Biologically implausible effect sizes

The overall odds ratios (whether 55.5 or 86.5) are simply
entirely biologically implausible. The only comparable OR I've ever seen, and the most iron-clad relationship in epi, is mesothelioma and occupational long-term asbestos exposure, with an OR = 50.0 (25.8–96.8) [3]. If you have long-term exposure, you'll get mesothelioma, and essentially no one else gets mesothelioma.

Looking at another very strong relationship that everyone is familiar with, we have lung cancer and
smoking, with an RR = 8.96 (95% CI: 6.73–12.11) (RR, since it's pooled in a meta-analysis) [4].

III. Biases in analysis

The authors analyze using “median unbiased estimator for
binary data in an unconditional logistic regression model” which is also called
‘exact logistic’ to reduce instability due to small (or zero) cell counts. Excellent discussion here: https://www3.nd.edu/~rwilliam/stats3/RareEvents.pdf

However, this exceedingly wide CI, with an upper bound of +∞, suggests a major
problem and a potentially biased estimate, which requires closer examination. There are
newer alternatives, particularly so-called Firth logistic regression.

Rerunning the published numbers (ignoring matching and covariates) using Stata’s –firthlogit-
gives an OR of 86.5 (95% CI: 4.9 to 1523.4). While still disconcertingly wide,
this CI is acceptable for such sparse data.
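For readers without Stata, the same ballpark can be reached by hand. The sketch below is not firthlogit itself: it adds 0.5 to every cell of the 2x2 table and uses a Wald interval on the log odds ratio. For a single binary exposure with no covariates, that shortcut is generally understood to give the same point estimate as Firth's penalty. The cell counts are my reading of the numbers in this post (32 cases, of which 13 ZIKV(+) and 19 ZIKV(-); 62 controls, none ZIKV(+)), and the function name is just illustrative.

```python
from math import exp, log, sqrt


def corrected_odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls, z=1.96):
    """Odds ratio and Wald CI after adding 0.5 to every cell of the 2x2 table."""
    a, b, c, d = (x + 0.5 for x in (exp_cases, unexp_cases, exp_controls, unexp_controls))
    log_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of the log odds ratio
    return exp(log_or), exp(log_or - z * se), exp(log_or + z * se)


# Counts inferred from this post, ignoring matching and covariates:
# cases: 13 ZIKV(+), 19 ZIKV(-); controls: 0 ZIKV(+), 62 ZIKV(-).
or_hat, ci_low, ci_high = corrected_odds_ratio(13, 19, 0, 62)
print(f"OR = {or_hat:.1f} (95% CI: {ci_low:.1f} to {ci_high:.1f})")
```

Under those assumptions the result lands very close to the figures quoted above. The zero cell among controls is what drives the huge upper bound: the 1/0.5 term dominates the standard error.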

IV. Loss of controls

The overall OR of 55.5 (8.6 to +∞) [or Firth: 86.5 (4.9 to 1523.4)] is
based on 62 controls. However, the authors report moderate levels of refusal
(76% agreed, so 20 refused). So what happens if some of those twenty controls that
declined to participate were actually ZIKV (+)?

N of 94: OR= 86.5 (95% CI: 4.9 to 1523.4; p= 0.002)

N of 114 (5 ZIKV (+) controls): OR= 9.8 (95% CI: 3.2 to 29.6; p< 0.001)

N of 114 (10 ZIKV (+) controls): OR= 4.8 (95% CI: 1.9 to 12.3; p= 0.001)

While still all significant, the estimates very rapidly progress from jaw-dropping through interesting to ‘ho-hum’ [5], and statistically
significant does not always mean biologically important.
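The what-if scenarios above are easy to replay. This is a rough sketch, assuming the cell counts I read off this post (13 ZIKV(+) and 19 ZIKV(-) cases, 62 enrolled controls with none ZIKV(+), 20 refusals) and using a 0.5-per-cell correction with a Wald interval rather than Stata's firthlogit. The names are illustrative only.

```python
from math import exp, log, sqrt


def corrected_odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls, z=1.96):
    """Odds ratio and Wald CI after adding 0.5 to every cell of the 2x2 table."""
    a, b, c, d = (x + 0.5 for x in (exp_cases, unexp_cases, exp_controls, unexp_controls))
    log_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return exp(log_or), exp(log_or - z * se), exp(log_or + z * se)


CASES_POS, CASES_NEG = 13, 19   # inferred from this post
ENROLLED_CONTROLS = 62          # all ZIKV(-)
REFUSALS = 20                   # controls who declined to participate

# Key: number of refusers assumed to be ZIKV(+); None means refusers stay excluded.
results = {None: corrected_odds_ratio(CASES_POS, CASES_NEG, 0, ENROLLED_CONTROLS)}
for refusers_pos in (5, 10):
    controls_neg = ENROLLED_CONTROLS + REFUSALS - refusers_pos
    results[refusers_pos] = corrected_odds_ratio(CASES_POS, CASES_NEG, refusers_pos, controls_neg)

for key, (or_hat, lo, hi) in results.items():
    label = "enrolled controls only" if key is None else f"{key} of {REFUSALS} refusers ZIKV(+)"
    print(f"{label}: OR = {or_hat:.1f} (95% CI: {lo:.1f} to {hi:.1f})")
```

Changing the tuple `(5, 10)` to `range(1, 21)` shows how quickly the estimate collapses as even a handful of the missing controls are assumed exposed.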

Is there reason to think those that refused might be
different from those that participated? Yes, I think so- perhaps they lived in outlying
neighborhoods, or have different SES or other characteristics that might have a
direct impact on likelihood of being ZIKV(+).

Other issues.

1. High levels of arboviral coinfection were not included in the
analysis- this could, and should, have been considered in the regression models,
both as interactions and as covariates. These data are rich enough to support a
more comprehensive analysis.

2. No controls were ZIKV(+), and 19 (59%) of cases were ZIKV(-)- this is
truly bizarre. I suspect what’s going on here is that ZIKV is not playing nice
in serological tests [6]. Specifically, optical
density (titer) responses for anything are a continuum, which requires a
cut-off to determine sero-positivity (generally 3 SDs above the mean of a pool of
sero-naïves).

If this cutoff is ‘wrong’ for ZIKV antibodies then there could be massive bias
in classifying exposure, so the exposures captured might represent only the very
highest levels of viremia, where the risk could indeed be very high. Moreover,
the high levels of co-infections suggest something is interfering with the
serology in an important way.
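As a toy illustration of how much rides on that cutoff, the sketch below applies the "mean plus 3 SDs of a sero-naïve pool" rule to made-up optical densities. All values and names are hypothetical and have nothing to do with the assay used in the study.

```python
from statistics import mean, stdev


def seropositivity_cutoff(naive_ods, n_sd=3.0):
    """Cutoff = mean of the sero-naive pool plus n_sd sample standard deviations."""
    return mean(naive_ods) + n_sd * stdev(naive_ods)


def classify(sample_ods, cutoff):
    """Label each sample sero-positive (True) if its OD exceeds the cutoff."""
    return {name: od > cutoff for name, od in sample_ods.items()}


# Hypothetical optical densities, for illustration only.
naive_pool = [0.08, 0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.07]
samples = {"A": 0.15, "B": 0.22, "C": 0.45, "D": 0.90}

cutoff = seropositivity_cutoff(naive_pool)
calls = classify(samples, cutoff)

# A stricter, equally hypothetical cutoff keeps only the strongest responders.
strict_calls = classify(samples, 0.50)

print(f"cutoff = {cutoff:.2f}")
print("standard cutoff:", calls)
print("stricter cutoff:", strict_calls)
```

Here three of the four made-up samples count as exposed under the standard rule but only one under the stricter one, which is the kind of shift that could turn a continuum of responses into a very selective exposure group.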

While not directly applicable to arboviruses, one example (Helicobacter pylori) found large
differences in ORs when using a generic ELISA vs. one tuned for populations-at-risk
[7].

Update 1: I should be clear here, I am not questioning that ZIKV is associated with MC as it clearly is in NE Brazil, but I am not yet convinced it is the sole risk factor, and the magnitude of that association is entirely unsettled.