I did not discriminate beyond those two criteria. However, I am using a gold star ★ to highlight one property that only a few papers have: a generalizable explanation for why the results occurred. You can read more about explanatory hypotheses here.

A flat trend

27% of InfoVis conference papers measured user performance – a 10% drop from last year. Overall, there has been little change in the past four years.

Here’s the Agresti-Coull binomial 84% CI, so each proportion can be compared: when two 84% intervals don’t overlap, the difference is roughly significant at the 5% level.
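If you want to reproduce this kind of interval, here’s a minimal Python sketch of the Agresti-Coull calculation. The function name and the paper counts in the example are hypothetical placeholders, not the actual survey data.

```python
from scipy.stats import norm

def agresti_coull_ci(successes, n, level=0.84):
    """Agresti-Coull binomial CI: add z^2/2 pseudo-successes and
    z^2/2 pseudo-failures, then apply the usual Wald formula."""
    z = norm.ppf(1 - (1 - level) / 2)   # ~1.41 for an 84% interval
    n_tilde = n + z**2
    p_tilde = (successes + z**2 / 2) / n_tilde
    half_width = z * (p_tilde * (1 - p_tilde) / n_tilde) ** 0.5
    return p_tilde - half_width, p_tilde + half_width

# Hypothetical example: 46 of 170 papers measured performance (~27%)
print(agresti_coull_ci(successes=46, n=170))
```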

The Journal Articles

Although TVCG appears to have taken a big dive, the total number of articles is so low that the change could very well be noise. Note the large error ribbon for TVCG.

In the chart on the right, I collapsed the past four years of data and recomputed the means and CIs. For TVCG, I only include papers presented in the InfoVis track. TVCG has more papers with a performance evaluation, and the difference can’t simply be explained by random noise. You can check this by pooling the counts and recomputing the interval, as sketched below.
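Here’s a short sketch of that pooling step using statsmodels’ proportion_confint, which supports the Agresti-Coull method; the per-year counts are hypothetical stand-ins, not the actual survey data.

```python
from statsmodels.stats.proportion import proportion_confint

# Hypothetical per-year counts: (papers with a performance evaluation, total papers)
tvcg_by_year = [(9, 20), (11, 24), (8, 19), (10, 22)]

# Collapse the four years into a single pooled proportion
hits = sum(h for h, _ in tvcg_by_year)
total = sum(t for _, t in tvcg_by_year)

# alpha=0.16 yields an 84% Agresti-Coull interval, matching the charts
low, high = proportion_confint(hits, total, alpha=0.16, method='agresti_coull')
print(f"pooled proportion: {hits / total:.2f}, 84% CI: ({low:.2f}, {high:.2f})")
```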

Little Generalization

There is still a very low proportion of papers with an explanatory hypothesis that can inform generalizability. I try to be very generous with this assessment, but very few papers attempt to explain why or whether the results apply outside the specific conditions of the study. Also, there are still a lot of guesses presented as hypotheses.

Obviously, please let me know if you find a mistake or think I missed something. Also, please hassle any authors who didn’t make their PDF publicly available.

Thanks, Michael. Generalizability is definitely a topic worth writing about. I wrote a short discussion of explanatory hypotheses. I’m considering where to send a more detailed discussion, but since it would focus specifically on objective measures of time and error, a venue that is explicitly “beyond time and error” is not appropriate.

By the way, do you have a link to a public PDF of your visual search paper? I’d like to link to it from this post.

Stay tuned for the next BELIV. We’re most probably going to change the title (not the acronym) to be more inclusive of discussions on time and error and other quantitative measures. It’s important that meta discussions can happen in all areas of evaluation.