Coronavirus New York state update: switching from Johns Hopkins to NYT data

April 19th, 2020, 9:35pm by Sam Wang

Over the last few days, our doubling-time tracker has shown steady progress toward longer times in nearly every state – except New York. We think we’ve identified a source of inaccuracy: uneven updating at the Johns Hopkins site. They’re excellent, but their data isn’t intended for visual display. So we’re switching to the New York Times feed.

Here’s our revised graph based on NYT data:
Further thoughts after the jump.

The Hopkins data is itself just a collection scraped from many sources. On the other hand, the NYT is doing more traditional on-the-ground reporting of cases and tallying the data in-house. In addition, the NYT is making its own graphs, which means that as they apply their editorial judgement on what to count, they can see anomalies in the data as they develop.

Most states aren’t affected by this switch – with the prominent exception of New York.

The Johns Hopkins graph:
The Hopkins data showed a rapid increase in deaths late last week that contradicted most other reports. They appear to have since revised that number downward. My best guess is that it was a consequence of the decision to start including probable covid-19-related deaths (as opposed to only patients who tested positive), since there was some reporting on that earlier in the week. Nevertheless, it does seem the Times has more people attending to its data pipeline, and is probably paying more attention to the accuracy of US-specific data. That’s not to say we don’t trust the JHU data; it just may take them longer to correct errors. Since the NYT is currently showing more deaths, that may be a better measure to capture everything that’s happening.

17 Comments so far ↓

Thanks for the doubling figures, which are both interesting and helpful. Do you worry about the lag in number of deaths in the New York Times website, compared to other sources? So at 1:12 PM on Tuesday April 21, NYT has 37,818 deaths in the US, while USA Today reports Johns Hopkins site shows over 42,000 deaths nationwide (at least 11% more).

I know it is good to use a single (aggregate) source of data, but deaths are already a lagging indicator, and the NYT has a substantial lag on top of that. How would this affect doubling times? Thanks again

I noted that the doubling time for deaths in the US on April 22nd jumped to 24.6 days (compared to 8.8 days the previous week). This is both encouraging and expected when we approach a peak in the daily death rates. As we go past the peak should we expect, within data variations, an essentially infinite doubling time?

Yes. States report differently depending on the day of the week. For example, many don’t report on Sunday. Data for Saturdays, Sundays, and Mondays can often show a weekend effect that looks like pre- and post-weekend oscillations in bar charts.

Waves of aggregated state data can then combine to show weekly waves in regional and national data.
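To make the arithmetic behind these numbers concrete, here is a minimal sketch of one way to estimate a doubling time from cumulative counts. It is not necessarily how the tracker on this site computes its figures; the function name `doubling_time` and the 7-day `window` are assumptions chosen for illustration. Comparing counts exactly one week apart means both endpoints fall on the same day of the week, which dampens the weekend reporting effect described above. When the count stops growing, the estimate is infinite.

```python
import math

def doubling_time(cumulative, window=7):
    """Estimate days needed for a cumulative count to double.

    `cumulative` is a list of daily cumulative totals, oldest first.
    The estimate assumes exponential growth over the last `window` days.
    Both the name and the 7-day default are illustrative assumptions.
    """
    if len(cumulative) <= window:
        raise ValueError("need more than `window` daily values")
    earlier = cumulative[-window - 1]
    latest = cumulative[-1]
    # No growth (or no earlier cases to compare against): never doubles.
    if earlier <= 0 or latest <= earlier:
        return math.inf
    # Solve latest = earlier * 2 ** (window / T) for T.
    return window * math.log(2) / math.log(latest / earlier)
```

For example, a series that grows by a factor of two over seven days gives a doubling time of about 7 days, while a flat series gives `math.inf`.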

I’m in West Virginia, and the 105-day doubling time is certainly reassuring. I’m near Charleston, and from what I see personally most people around here are taking it seriously, even the die-hard Trump supporters. I think the social distancing combined with our geography and low population has helped produce this number. I just hope we continue to improve, and the rest of the world as well.