During the last couple of years Zika virus (ZIKV) infection has been considered an international public health emergency, due to possible but yet uncertain links with newborn microcephaly and Guillain-Barré syndrome (GBS). Concerns about these links originated from case reports and surveillance data suggesting an outbreak of ZIKV in Brazil was followed by outbreaks of GBS and microcephaly (1,2). A fundamental component of the public health response has been the implementation of a system for surveillance of ZIKV disease and its complications (3).

Seminal evidence of a temporal association between these outbreaks came from a study by investigators from the Centro de Pesquisas Gonçalo Moniz (CPGM), in Salvador, Bahia, the largest city in Northeastern Brazil (1). Though other studies also reported temporal associations between these outbreaks (4-6), only CPGM investigators have tested the association and have made their data available to the public. They identified cases of acute exanthematous illness (AEI) through a sentinel surveillance system established by the Centers for Information and Epidemiologic Surveillance of Salvador (CIES). AEI surveillance started in April 2015, but medical records were reviewed to identify past cases back to February 15, 2015 (Figure 1). Cases were patients with rash who did not meet diagnostic criteria for dengue, chikungunya, measles, or rubella, and were attributed to ZIKV because this was the main arbovirus circulating at that time (1,7).

In late May 2015 CIES started surveillance of patients hospitalized with “neurologic manifestations that might be linked to Zika”, including GBS, with retrospective search of cases hospitalized during April-May. The authors provided no explicit reason why GBS surveillance was started, but this may have been triggered by an earlier unreplicated report of an increase in GBS cases during a Chikungunya outbreak in French Polynesia (8).

In October 2015 the CIES established a newborn microcephaly case-report system, following a report of an increase in cases in Pernambuco the previous month. Cases were defined as a head circumference <32 cm in full term and less than the third percentile of the Fenton curve in preterm newborns (1), and were searched retrospectively back to the start of the year. The first case was identified in mid-July, and no cases were identified before that time through the regular Brazilian Live Birth Information System (SINASC) (2). On November 19th, 2015 the Brazilian Ministry of Health implemented an ad hoc public health surveillance system for microcephaly (2).

CPGM investigators used standard cross-correlations to identify the lag times (from 0 to 40 weeks) showing the highest correlations between weekly numbers of AEI cases and ensuing weekly numbers of cases of GBS and microcephaly. The strongest positive correlations occurred 5–9 weeks later for GBS and 30–33 weeks later for microcephaly. They argued these findings provided strong support for a positive association of AEI cases to ensuing cases of GBS and microcephaly.

CPGM investigators inferred an outbreak of microcephaly had occurred, based on the shape of the epidemic curve (1). Though the shape of the curve resembled that of a point outbreak, it could also be attributed to the intensity of surveillance efforts, and did not tell whether prevalence in 2015 was higher than expected. The prevalence of suspected microcephaly was 15.6 cases/1,000 newborns. Even at its peak (31.4 cases/1,000 in December 2015) the prevalence was below the expected 32.4 cases/1,000 newborns, inferred from the head size distribution in Brazilian newborns (normal, mean: 34.2 cm, standard deviation 1.2; see online Appendix for details on calculations and analyses) (9). Thus, CIES data provided no evidence of an outbreak of microcephaly in Salvador. An apparent outbreak may have resulted from comparing the number of cases from the targeted-reactive CIES case-report system to those from the unspecific-regular SINASC in previous years. Indeed, SINASC traditionally had a low case yield (10), and detected no cases of microcephaly in Salvador during a period of 20 weeks in early 2015 (1), even though 462 cases were expected (see Appendix). Moreover, a low SINASC yield has been previously proposed by Brazilian experts on microcephaly as an explanation of the apparent outbreak (11).

The incidence of GBS in Salvador was 1.74/100,000 population (1). However, this rate pertained almost exclusively to the 33 weeks after the AEI peak on week 19, since only two out of 49 cases occurred in the 12-week period before that time (Figure 1). This was at variance with 57% of all AEI cases occurring before the AEI peak. If the rate observed after the AEI peak also applied to the whole year, the incidence would have been 2.74/100,000, well within the worldwide range of 0.6 to 4 per 100,000 (12,13). Thus, the apparent GBS outbreak in 2015 was likely due to increased diagnosis, hospitalization, and case reporting after the start of surveillance of neurologic conditions.

CPGM investigators used time-lagged cross-correlations of moving averages to estimate induction times from ZIKV infection to GBS and microcephaly (1). Such an analysis is based on the assumption that causal links actually existed. Therefore, whatever the findings, they cannot be interpreted as evidence of the assumed causal links without committing the logical fallacy of circular reasoning.

It has long been established that cross-correlations of non-stationary time series are highly prone to bias (14,15). Outbreaks are non-stationary series because their mean and variance change with time. For instance, the mean number of AEI cases increased with time until week 19 and decreased thereafter. Time series, like the outbreaks in question, may appear correlated, even if one does not cause the other, because they may share time-varying direct or mediated causes that confound their association (14,15). In other words, even if two outbreaks are cross-correlated, the distribution of one outbreak may not probabilistically depend on the distribution of the other, i.e., one outbreak may not independently predict the other outbreak. Also, the numbers of events in a time series are not independent. For instance, the number of cases of AEI in a given week would be closer to the number in the following week than to the number in more distant weeks. This autocorrelation is due to autocorrelation in the time-varying factors driving the series and must be accounted for during the analysis to get correct estimates of variability (16). Moreover, a time series X could be deemed a cause of another time series Y if and only if predictions for Y differ when the history of X is taken into account and when it is not (17).
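The point about shared time patterns can be illustrated with a small simulation. The sketch below is not the analysis reported in this commentary: it generates two independent, hypothetical outbreak-shaped Poisson series (all peaks, widths, and heights are invented for illustration), and compares the lagged correlation of the raw counts with that of the Pearson residuals. Because the data are simulated, the time-varying means are known exactly; with real data they would have to be estimated, for example with spline-based Poisson regression.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def poisson(rng, lam):
    """Draw one Poisson variate (Knuth's multiplication method)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def bump(week, peak, width, height, floor):
    """Hypothetical outbreak-shaped expected count for one week."""
    return floor + height * math.exp(-0.5 * ((week - peak) / width) ** 2)


rng = random.Random(2015)
weeks = range(80)
lag = 20  # distance between the two simulated peaks, in weeks

# Two series generated independently; they only share a rise-and-fall shape.
mean_x = [bump(w, 15, 4, 200, 5) for w in weeks]
mean_y = [bump(w, 35, 5, 20, 1) for w in weeks]
x = [poisson(rng, m) for m in mean_x]
y = [poisson(rng, m) for m in mean_y]

# Pearson residuals remove the time-varying mean (known here by construction).
res_x = [(c - m) / math.sqrt(m) for c, m in zip(x, mean_x)]
res_y = [(c - m) / math.sqrt(m) for c, m in zip(y, mean_y)]

raw_r = pearson(x[:-lag], y[lag:])
resid_r = pearson(res_x[:-lag], res_y[lag:])
print(f"raw r at lag {lag}: {raw_r:.2f}; residual r: {resid_r:.2f}")
```

In this setup the raw counts are strongly correlated at the chosen lag although neither series influences the other, whereas the residual correlation should hover around zero.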

Induction times estimated by CPGM investigators were likely biased because they were based on statistical methods that assume constant mean and variance and no autocorrelation, the times corresponding to the highest correlations were automatically selected, and the correlations of moving-averages are more extreme than those of non-smoothed data.
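The effect of automatically selecting the lag with the highest correlation can be sketched with simulated data. The example below is an illustration under assumed settings (80 weeks, lags 0 to 40, 200 replicates, all hypothetical), not a reanalysis of the CPGM data. It draws pairs of independent white-noise series, for which the true correlation is zero at every lag, and compares the average correlation at lag 0 with the average of the maximum correlation over all lags.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def max_lagged_r(x, y, max_lag):
    """Largest correlation between x and y shifted by 0..max_lag steps."""
    return max(pearson(x[: len(x) - lag], y[lag:]) for lag in range(max_lag + 1))


rng = random.Random(42)
n_weeks, max_lag, n_sims = 80, 40, 200  # hypothetical settings
lag0, best = [], []
for _ in range(n_sims):
    # Independent white-noise series: the true correlation is zero at every lag.
    x = [rng.gauss(0, 1) for _ in range(n_weeks)]
    y = [rng.gauss(0, 1) for _ in range(n_weeks)]
    lag0.append(pearson(x, y))
    best.append(max_lagged_r(x, y, max_lag))

mean_lag0 = sum(lag0) / n_sims
mean_best = sum(best) / n_sims
print(f"mean r at lag 0: {mean_lag0:.3f}; mean of max r over lags: {mean_best:.3f}")
```

The correlation at a prespecified lag averages close to zero, while the maximum over many lags is systematically positive, which is why a lag chosen for its high correlation overstates the association.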

I used a Poisson autoregressive model of first order to account for time “effects” and autocorrelation, and assessed whether the number of AEI cases independently predicted future changes in the number of cases of GBS and microcephaly (see Appendix) (18). I modeled the number of cases of each disease as a function of cubic polynomial splines of time (16,19). Then, I drew cross-correlograms of the observed number of cases, similar to those in the original study (1), and of the residual number of cases from the Poisson models.

Findings from the cross-correlations of the original data were consistent with those in the original study (Figure 2). However, after accounting for temporal fluctuations and autocorrelation there was no discernible pattern of correlation between the number of cases of AEI per week and the number of cases of GBS or microcephaly. Accordingly, the prevalence ratios (PR) for a change of 50 cases of AEI per week, for the lag times (i.e., induction times) with the highest cross-correlations, were 1.01 [95% confidence interval (CI): 0.99, 1.03] for microcephaly (at 30 weeks), and 1.03 (95% CI: 0.98, 1.08) for GBS (at 7 weeks).

These findings suggest the apparent temporal correlations between the AEI outbreak and putative outbreaks of microcephaly and GBS (1) were likely a consequence of known biases in cross-correlation functions that do not account for time patterns and autocorrelation (14,15). Assigning a causal role to ZIKV infections cannot be based only on the observation of microcephaly and GBS outbreaks occurring after an AEI outbreak. Indeed, outbreaks could coincide by chance or may share time-varying causes (i.e., confounders), such as time-related enhancement of surveillance efforts. They may also appear correlated due to systematic errors in the data, such as non-independent errors in case ascertainment, or to the autocorrelated nature of the series. This type of causal inference based solely on temporal coincidence constitutes an example of the logical fallacy post hoc ergo propter hoc (i.e., after this, therefore because of it).

Another important report of a temporal association between ZIKV infection and microcephaly came from investigators from the Colombian National Institute of Health (INS), the US Centers for Disease Control and Prevention (CDC), and the Colombian Ministry of Health (MINSALUD), identified collectively as ICM (6). The report was based on Colombian national surveillance data from August 9th 2015 to November 12th 2016, and included only suspected cases of ZIKV. The latter were defined as patients with fever and rash plus at least one of conjunctivitis, eye redness, itching, arthralgia, or malaise not explained by other diseases, who had been in a place below 2,200 m above sea level (20). Microcephaly was defined as head circumference below the third percentile for gestational age and sex (6).

ICM investigators reported a 4.5-fold increase in the prevalence of microcephaly in 2016 as compared to 2015 (from 2.1 to 9.6/10,000 births), as well as a temporal association between ZIKV infection and microcephaly, with a peak in the latter occurring approximately 24 weeks after the peak in ZIKV infections.

ICM’s findings should be cautiously interpreted for various reasons. According to the Colombian INS protocol for surveillance of microcephaly, there were on average 140 cases of microcephaly per year during 2010–2015 (21). This estimate came from a mandatory registry of health care episodes, managed by MINSALUD. The registry gathers data on the 80% of the Colombian population with health insurance, but it is known to capture <55% of all the episodes (22,23). Accounting for coverage, under-reporting, and duration of surveillance, 245 cases should have been expected in 2016. Thus, the prevalence in 2016 would have been 2.01 (95% CI: 1.72, 2.35) times higher than that in previous years. This increase could be easily explained by intensified surveillance of microcephaly in 2016. Moreover, defining microcephaly as head circumference below the third percentile entailed an expected prevalence of 3% (6). Thus, the observed prevalence in 2016 (9.6/10,000) was 31 times lower than expected. This suggests that, in spite of the new guidelines (21), health care personnel continued reporting mostly cases of extreme microcephaly (head circumference 3 or more standard deviations below the mean) (24), and explains why the prevalence in Colombia was close to that estimated by passive surveillance in the United States (25). In summary, the available data give no substantial support for the occurrence of an outbreak of microcephaly in Colombia in 2016.

Regarding the temporal association, a time series analysis following the approach described above showed that the weekly number of ZIKV infections in Colombian pregnant women did not predict the prevalence of microcephaly weeks later (see Appendix). After accounting for autocorrelation and temporal fluctuations, there was no association between the weekly number of cases of ZIKV infections and the number of cases of microcephaly 19 weeks later, the time corresponding to the strongest correlation between these conditions (PR: 0.96; 95% CI: 0.92, 1.00).

Moreover, it is surprising that ICM investigators made no attempt to compare the prevalence of microcephaly in populations living above and below 2,200 m above sea level. Indeed, INS proposed residence below 2,000 m as a surveillance criterion to define ZIKV infections (21), recognizing that ZIKV transmission was unlikely above this altitude, where over half of the Colombian population lives. Such a natural experiment should be free of biases attributable to time-varying factors, such as changes in the intensity of surveillance efforts, and could have shed light on the proposed ZIKV-microcephaly link.

Fortunately, the data provided in their supplemental table (6) made it possible to compare the prevalence of microcephaly in Departments with some (374,488 births and 390 cases) and those with no population (123,514 births and 86 cases) living at or above 2,000 m (see Appendix). After adjusting for infant mortality and the proportion of the population with health insurance in the Department, two likely surrogates of the strength of the surveillance system, the prevalence was only 1.62 (95% CI: 1.25, 2.10) times higher in populations living below than in those living above 2,000 m.

The above estimate may be biased towards the null due to non-differential misclassification resulting from the coarse grouping of Departments. However, a finer three-level stratification yielded similar findings (see Appendix). More importantly, a bias analysis assuming that sensitivity and specificity of the Department classification were non-differential regarding microcephaly status and varied from 60% to 90% showed a highest PR of 2.84 (95% CI: 1.98, 4.10) for a sensitivity of 80% and a specificity of 60%. These findings could be reasonably attributed to a tradition of greater concern and better reporting of arbovirus diseases in low altitude areas, a hypothesis that could have been tested with available long term INS surveillance data on those diseases. If that hypothesis were discarded, one may be compelled to believe maternal ZIKV infection is unlikely to increase the risk of microcephaly by more than 4 times.

The reports discussed in this commentary (1,6) exemplify the use of surveillance data to formulate causal link hypotheses and to guide health policies in a context of uncertainty. However, they also call attention to the need for caution in both instances. Surveillance data are usually not collected for the purpose of identifying causal links. In consequence, the potential for information, selection, confounding, and confirmation biases is exacerbated when surveillance data are used for this purpose. Also, political pressure to respond to ongoing outbreaks may make the need for public health action so strong that analyses of surveillance data and interpretation of findings could become mere exercises to confirm foregone conclusions. Some of these limitations may be mitigated by careful analyses and interpretation of surveillance data. This would be more feasible if institutions in charge of surveillance actively supported efforts to make data readily available in formats and by means that facilitate independent analyses by all stakeholders.

In summary, CPGM and ICM reports do not support the occurrence of microcephaly and GBS outbreaks in these populations or that the apparent outbreaks were linked to ZIKV infection outbreaks. Nevertheless, CPGM investigators should be highly commended for collecting unique and valuable data, for their transparency in describing how the data were collected, how cases were defined and identified, and, more importantly, for recognizing that surveillance data that inform public health and patient care decisions affecting millions of individuals should be made publicly available, so that all stakeholders have the opportunity and take responsibility for conducting their own assessments. ICM investigators should also be commended for collecting, processing and publishing some of their data, though more detailed data, and more information regarding data sources and analytical decisions, would have strengthened their report. Unfortunately, similar reports based on surveillance data have been notoriously lacking in these regards (2,4).

Supplementary

This appendix provides additional details on calculations and analyses reported in the commentary and Stata code to replicate all results. In addition, an example of a nonsensical correlation is presented, to illustrate that spurious cross-correlations could be observed in unrelated time series and that the interpretation of those correlations depends on what we know and what we assume about the mechanisms generating both series.

Calculating the expected prevalence of microcephaly in Salvador, Bahia

This calculation is based on the distribution of head size among Brazilian newborns. According to the InterGrowth-21st study (9) the mean head circumference in Brazilian children is 34.2 cm, with a standard deviation of 1.2. The probability of having a head circumference <32 cm can be calculated from this distribution using a z test (see code for this calculation in item 1.0 of the accompanying Stata do file). This results in an expected prevalence of 32.8/1,000. This estimate only applies to full term newborns. However, 11.7% of Brazilian newborns are preterm (26). In the latter group microcephaly was defined as a head circumference <3rd percentile of the Fenton curves (1), which corresponds to a prevalence of 30.0 cases/1,000 preterm newborns. Therefore, the expected prevalence in all newborns is a weighted average of the prevalences in full term and preterm newborns: [(1-0.117)×32.8]+(0.117×30)=32.4 cases/1,000 newborns.
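The steps of this calculation can be sketched in Python as a complement to the Stata do file. The variable names are illustrative only. The sketch computes the normal lower-tail probability directly from the stated mean and standard deviation, which should land in the low 30s per 1,000 and may differ slightly from the reported 32.8/1,000 depending on how the inputs were rounded, and then applies the weighted average using the figures reported in the text.

```python
from statistics import NormalDist

# Inputs as stated in the text.
mean_hc, sd_hc, cutoff = 34.2, 1.2, 32.0  # head circumference, cm
preterm_share = 0.117                     # proportion of preterm newborns
preterm_prev = 30.0                       # per 1,000: below the 3rd percentile by definition
full_term_prev_reported = 32.8            # per 1,000, as reported in the text

# Normal lower-tail probability of a head circumference below the cutoff.
tail_per_1000 = 1000 * NormalDist(mean_hc, sd_hc).cdf(cutoff)


def weighted_prevalence(full_term_prev):
    """Weighted average of full-term and preterm prevalences (per 1,000)."""
    return (1 - preterm_share) * full_term_prev + preterm_share * preterm_prev


expected_all = weighted_prevalence(full_term_prev_reported)
print(f"normal-tail estimate: {tail_per_1000:.1f} per 1,000 full-term newborns")
print(f"weighted prevalence from reported inputs: {expected_all:.2f} per 1,000")
```

With the reported inputs the weighted average comes out just under 32.5, consistent with the 32.4 cases/1,000 used in the commentary.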

Estimating the number of births and cases of microcephaly in Salvador in early 2015

No cases of microcephaly were found in Salvador before July 11th, 2015 (epidemiological week 27) and case search reached back to epidemiological week 7 (see Figure S1 in original article) (1). Thus, no cases were found in a period of 20 weeks early in 2015. A total of 367 cases of microcephaly were found in the period of 33 weeks from July 2015 to February 2016. The prevalence during this period was 15.6/1,000 newborns. Therefore, the number of births in the period was 367/(15.6/1,000)=23,526. For a period of 20 weeks the number of births would be 23,526×(20/33)=14,258. Assuming the prevalence of 32.4/1,000 estimated above applies to the whole year, the number of expected cases of microcephaly would be 14,258×(32.4/1,000)=462.
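The same arithmetic, written out as a short Python sketch with illustrative variable names, rounds at each step as the text does:

```python
# Figures as stated in the text.
cases_found = 367      # cases found in the 33-week period
observed_prev = 15.6   # per 1,000 newborns, same period
expected_prev = 32.4   # per 1,000 newborns, from the head-size distribution
weeks_observed = 33
weeks_without_cases = 20

# Births implied by the observed cases and prevalence.
births_33 = round(cases_found / (observed_prev / 1000))
# Births in the 20-week period, assuming a constant weekly number of births.
births_20 = round(births_33 * weeks_without_cases / weeks_observed)
# Cases expected in those 20 weeks at the expected prevalence.
expected_cases = round(births_20 * expected_prev / 1000)

print(births_33, births_20, expected_cases)
```

The constant weekly number of births is the only assumption beyond the figures quoted above.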

Figure S1 Cross-correlograms of weekly observed and residual number of cases of E. coli infections in Japan (2017), and microcephaly and Guillain-Barré syndrome (GBS), in Salvador, Brazil (2016).

Estimating the incidence of Guillain-Barré syndrome (GBS) in Salvador in 2015

Paploski et al. (1) reported an incidence of GBS of 1.74 cases/100,000. This figure corresponds mostly to a period of 33 weeks, since only 2 cases of GBS were detected in the 12 weeks before the peak in AEI cases. However, 57% of all cases of AEI occurred during the period when only 2 cases of GBS occurred, and cases were detected during this period by retrospective review of medical records, instead of being prospectively reported. If one assumes that the incidence during the 33 weeks after the AEI peak applied to the whole year, the expected incidence in 2015 would have been 1.74×(52/33)=2.74/100,000. This incidence corresponds to an extreme scenario where the outbreak of Zika infections lasts the whole year and Zika virus infection is actually associated with the incidence of GBS.
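As a quick check of this extrapolation, a minimal Python sketch (variable names are illustrative) rescales the reported incidence from 33 to 52 weeks and contrasts the share of GBS cases detected before the AEI peak with the share of AEI cases in the same period:

```python
reported_incidence = 1.74   # GBS cases per 100,000, as reported (1)
weeks_after_peak = 33
weeks_in_year = 52

# Incidence if the post-peak rate had applied to the whole year.
annualized = reported_incidence * weeks_in_year / weeks_after_peak

# Share of GBS cases detected before the AEI peak versus share of AEI cases.
gbs_share_before_peak = 2 / 49
aei_share_before_peak = 0.57

print(f"annualized incidence: {annualized:.2f} per 100,000")
print(f"GBS before peak: {gbs_share_before_peak:.1%}; AEI before peak: {aei_share_before_peak:.0%}")
```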

Evaluating the temporal association between outbreaks in Brazil

Code for this analysis is provided in steps 3-6 of the Stata do file. Step 3 involves the creation of a dataset including the original data from Salvador, Bahia (3), available at http://wwwnc.cdc.gov/eid/article/22/8/16-0496-t1. It also includes the weekly number of cases of Entero-hemorrhagic Escherichia coli infection in Japan in 2016. The latter data are used for an example of a nonsensical correlation between two time series and were obtained from the Japanese National Institute of Infectious Diseases, available at http://www.niid.go.jp/niid/en/survaillance-data-table-english.html?start=14. Epidemiologic curves are drawn in step 4. Dates of key events were added to the graph in the editorial using the Stata graph editor. Cross-correlations for the original number of cases and for the residual number of cases are estimated and plotted in step 5. Code for autoregressive Poisson models to estimate the effect of the lagged number of cases of AEI on microcephaly and GBS is provided in step 6.

Figure S1 illustrates a nonsensical correlation between the weekly number of cases of E. coli infections in Japan in 2017 and the number of cases of microcephaly and GBS in Salvador in 2016. Were one unaware that the data came from different times and places, upon looking at the cross-correlogram of the observed number of cases one may mistakenly conclude that cases of microcephaly increased significantly 15 weeks after an increase in the cases of E. coli (upper left panel). However, an analysis of the residual number of cases, after adjustment for time fluctuations and autocorrelation, shows a different picture. Indeed, the cross-correlogram of the residual numbers of cases shows, as expected, that correlations between the two outbreaks (series) center around zero and that departures from a zero correlation happen at random.

Comparing the observed and background cases of microcephaly in Colombia

Official Colombian data on the number of births during 2010–2015 and 2016 were obtained from: https://www.dane.gov.co/index.php/estadisticas-por-tema/salud/nacimientos-y-defunciones/nacimientos. An official report from the Colombian INS indicates that the average number of cases of microcephaly in 2010–2015 was 140/year. After accounting for coverage, under-reporting, and duration of surveillance, the expected number of cases in 2016 was (140/0.55/0.80)×(40/52)=245 cases. The average number of births during the same period, in proportion to the number of weeks of surveillance in 2016 (40/52), was 510,926. The number of cases during the 40 weeks of surveillance in 2016 was 476 (6). The corresponding number of births was (641,493×40/52)=493,456. These figures were used to estimate the prevalence ratio. Instructions for this calculation are provided in item 7 of the Stata code below.
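For readers without Stata, the figures above can be combined in a short Python sketch. It is an approximation under stated assumptions rather than a copy of item 7 of the do file: the confidence interval is a Wald interval on the log scale, and the expected number of cases is treated as if it were an observed count. Variable names are illustrative.

```python
import math

# Figures as stated in the text.
avg_cases_per_year = 140          # 2010-2015 registry average
capture, coverage = 0.55, 0.80    # episodes captured, population covered
weeks_fraction = 40 / 52          # duration of surveillance in 2016

expected_cases = round(avg_cases_per_year / capture / coverage * weeks_fraction)
births_reference = 510_926        # 2010-2015 average, scaled to 40 weeks
observed_cases = 476
births_2016 = round(641_493 * weeks_fraction)

# Prevalence ratio: 2016 versus the corrected background prevalence.
pr = (observed_cases / births_2016) / (expected_cases / births_reference)

# Assumed Wald interval on the log scale, treating both counts as observed.
se_log_pr = math.sqrt(
    1 / observed_cases - 1 / births_2016
    + 1 / expected_cases - 1 / births_reference
)
lower = pr * math.exp(-1.96 * se_log_pr)
upper = pr * math.exp(1.96 * se_log_pr)

print(f"expected cases: {expected_cases}; births in 2016: {births_2016}")
print(f"PR = {pr:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

Under these assumptions the sketch agrees with the prevalence ratio and interval reported in the commentary.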

Epidemiologic curves were drawn for a visual check of the accuracy of the process of recovering the number of cases from Figure S1 of Cuevas et al. (6). Data were extracted from Figure S1 using Engauge Digitizer 8.3 (27). The resulting Excel files were then converted to Stata data files for the purpose of the analysis. Figure S2 shows that the time patterns in the number of cases were similar to those in Figure S1 of Cuevas et al. (6). Departures from the original data were most likely random. See item 9 in the Stata do file. Excel data files are available from the author upon request.

Evaluating the temporal association between outbreaks in Colombia

Code for the cross-correlation and the autoregressive Poisson analyses is provided in items 10.1 and 10.2 of the Stata do file, respectively. Figure S3 shows the findings from the cross-correlation analysis. In spite of the apparent high correlation of maternal Zika virus infections and microcephaly 19 weeks later (left panel), there was no discernible pattern in the correlation of the residual numbers of cases once time fluctuations and autocorrelation were accounted for (right panel). This indicates that maternal Zika virus infection did not probabilistically predict microcephaly.

Figure S3 Cross-correlograms of weekly observed and residual number of cases of Zika virus infections and newborn microcephaly in Colombia (2015–2016).

Evaluating the impact of altitude on the prevalence of microcephaly in Colombia

See Stata code items 12.1 to 12.7. Excel data files available from author upon request. A Google maps key is required to extract the altitude of each Department. For details on how to get a key, go to: https://developers.google.com/maps/documentation/javascript/get-api-key. Altitude data are obtained through the Stata module “goelevation”, which was kindly modified by its developer for the use of a Google maps key (28).

Findings from an analysis comparing Departments with all population living below and those with some population living above 2,000 m above sea level indicated a relative increase of 1.62 (95% CI: 1.25, 2.10) in the prevalence of microcephaly in the former group. Findings from an analysis with Departments grouped in those with 0%, >0 to <60%, and ≥60% of the population living above 2,000 m were consistent with those from the previous analysis. The prevalence was increased 1.68 (95% CI: 1.28, 2.20) times and 1.51 (95% CI: 1.12, 2.04) times in the population from Departments with >0 to <60% and ≥60% of the population living below 2,000 m, respectively. These groups were combined, since the prevalence ratios were very similar.

The classification of Departments by altitude in this analysis is based on the proportion of the population living above or below 2,000 m. Therefore, it is possible that some Departments were misclassified. However, the misclassification should be non-differential, because the proportion of the population living at high altitude is not a consequence of the prevalence of microcephaly in the Department. To account for misclassification of the exposure, I used maximum likelihood logistic regression with exposure predictive value weighting, as proposed by Lyles and Lin (29). I used values of sensitivity and specificity of the exposure classification ranging from 60% to 90%. Lower values (closer to 50%) would indicate a random assignment of exposure, while higher values (closer to 100%) would indicate a perfect classification. The last two scenarios are unlikely, because individuals move freely between low and high altitude areas, and there is evidence that transmission of ZIKV could occur in areas of high elevation, depending on latitude, climate, and connectivity of areas (30). Figure S4 shows that correcting for non-differential misclassification of Department would increase the prevalence ratio, but on average the increase would be less than three-fold.

Figure S4 Odds ratios for altitude and microcephaly corrected for changing levels of misclassification of the exposure. Colombia (2015–2016).

Acknowledgements

None.

Footnote

Conflicts of Interest: The author has no conflict of interest to declare.