I did not discriminate beyond those two criteria. However, I am using a gold star ★ to highlight one property that only a few papers have: a generalizable explanation for why the results occurred. You can read more about explanatory hypotheses here.

No Clear Change

37% of InfoVis conference papers measured user performance. Last year, I thought there was a big change, but the year-to-year variation is indistinguishable from noise.

Here’s the Agresti-Coull binomial 84% CI for each year, so the proportions can be compared visually: two proportions whose 84% intervals don’t overlap differ at roughly the 5% level.
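For reference, here’s a minimal sketch of how that interval is computed, in Python. The paper counts in the example are made up for illustration, not the actual data behind the chart.

```python
from math import sqrt
from statistics import NormalDist

def agresti_coull_ci(successes, n, level=0.84):
    """Agresti-Coull binomial confidence interval for a proportion.

    Returns (low, high). An 84% level is handy for visual comparison:
    non-overlapping 84% intervals correspond roughly to p < .05.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)   # ~1.405 for an 84% interval
    n_adj = n + z ** 2                          # add z^2 pseudo-trials
    p_adj = (successes + z ** 2 / 2) / n_adj    # adjusted proportion
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)

# Hypothetical counts: 20 of 54 papers measuring user performance (~37%)
low, high = agresti_coull_ci(20, 54)
print(f"{20/54:.0%} [{low:.0%}, {high:.0%}]")
```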

The Journal Articles

It’s great that the distinction between conference papers and journal papers is fading away, at least in terms of the conference program. I’m only maintaining it for historical comparison.

In the chart on the right, I collapsed the past three years of data and recomputed the means and CIs. For TVCG, I only include papers presented under the InfoVis track. TVCG has a higher proportion of papers with a performance evaluation, and the difference can’t simply be explained by random noise. I don’t know whether it is caused by where papers are submitted, the different review process, different reviewers, or rolling submissions being more conducive to running a study. But more of the journal papers test their claims.
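The pooling step is just summing the yearly counts before recomputing the interval. A small sketch, with hypothetical per-year counts and reusing the agresti_coull_ci function from the sketch above:

```python
# Hypothetical per-year (papers with performance evaluation, total papers)
years = [(18, 50), (20, 54), (22, 52)]

pooled_successes = sum(s for s, _ in years)
pooled_n = sum(n for _, n in years)

# agresti_coull_ci as defined in the earlier sketch
low, high = agresti_coull_ci(pooled_successes, pooled_n)
print(f"pooled: {pooled_successes/pooled_n:.0%} [{low:.0%}, {high:.0%}]")
```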

Honorable Mentions

Over the last couple of years, not a single best paper or honorable mention attempted to experimentally validate its claims. This year broke that trend, with awards going to papers that measured (or reanalyzed) user performance.

Little Generalization

There is still a very low proportion of papers with an explanatory hypothesis that can inform generalizability. I try to be very generous with this assessment, but very few papers attempt to explain whether, and why, their results apply outside the specific conditions of the study. There are also still a lot of guesses presented as hypotheses.