The purpose of Open Access Vis is to highlight open access papers, materials, and data, and to show how many papers are available only behind a paywall. See the about page for more details about reliable open access.

Why?

Most visualization research papers are funded by the public, reviewed and edited by volunteers, and formatted by the authors. So for IEEE to charge $33 for each person who wants to read the paper is… well… (I’ll let you fill in the blank). This paywall is contrary to the supposedly public good of research and the claim that visualization research helps practitioners (who are not on a university campus).

But there’s an upside. IEEE specifically allows authors to post their version of a paper (not the IEEE version with a header and page numbers) to:

The author’s website

The institution’s website (e.g., lab site or university site)

A pre-print repository (which gives it a static URL and avoids “link rot”)

It’s that time of year again. InfoVis abstracts have been submitted, and lots of people are scrambling to finish their full submission.

I was curious about the distribution of keywords in the submissions, so I visualized some of the data available to the program committee (PC). After checking with the chairs, I thought others might be curious about the results.

Note that these are only abstracts, so there will probably be some attrition before the full paper submission deadline. To see a MUCH more thorough analysis of multiple years and venues, check out http://keyvis.org
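To give a sense of what that tallying involves, here is a toy sketch in Python; the submissions and keywords below are made up and are not the actual PC data.

```python
# A toy sketch (not the actual PC data): tallying how often each author-chosen
# keyword appears across hypothetical submissions.
from collections import Counter

submissions = [
    ["graph/network data", "interaction design"],
    ["perception & cognition", "evaluation"],
    ["graph/network data", "evaluation"],
]

keyword_counts = Counter(k for keywords in submissions for k in keywords)
print(keyword_counts.most_common())  # e.g. [('graph/network data', 2), ('evaluation', 2), ...]
```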

I did not discriminate beyond those two criteria. However, I am using a gold star ★ to highlight one property that only a few papers have: a generalizable explanation for why the results occurred. You can read more about explanatory hypotheses here.

This year, 40% of InfoVis papers included an empirical evaluation. I made a list in my last post.

There were also a couple papers worth noting that described methods for evaluating visualizations. These papers can help bootstrap future evaluations, leading to a better understanding of when and why vis techniques are effective.

Learning Perceptual Kernels for Visualization Design – Çağatay Demiralp, Michael Bernstein, Jeffrey Heer pdf
The paper describes a collection of methods for finding the relative discriminability of feature values (e.g. colors or shapes). It also looks at finding the discriminability of combinations of visual features (e.g. colors and shapes). The paper validates its approach by determining the discriminability of size and showing which of its measures closely match the established Stevens’ power law for size.
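To make the idea of a “perceptual kernel” concrete, here is a simplified sketch (not the paper’s actual procedure): averaging hypothetical pairwise dissimilarity judgments into a symmetric matrix, where larger entries mean a pair is easier to tell apart.

```python
# Simplified illustration: building a symmetric "perceptual kernel" from
# hypothetical pairwise dissimilarity judgments (values in [0, 1]).
import numpy as np

stimuli = ["red", "green", "blue", "orange"]  # hypothetical color values
n = len(stimuli)

# (index_a, index_b, rated dissimilarity) -- made-up judgments
judgments = [(0, 1, 0.9), (0, 2, 0.8), (0, 3, 0.4),
             (1, 2, 0.7), (1, 3, 0.8), (2, 3, 0.9)]

kernel = np.zeros((n, n))
counts = np.zeros((n, n))
for a, b, d in judgments:
    kernel[a, b] += d
    kernel[b, a] += d
    counts[a, b] += 1
    counts[b, a] += 1

# Average the judgments for each pair; the diagonal stays zero (a stimulus vs. itself).
kernel = np.divide(kernel, counts, out=np.zeros_like(kernel), where=counts > 0)
print(kernel)
```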

A Principled Way of Assessing Visualization Literacy – Jeremy Boy, Ronald Rensink, Enrico Bertini, Jean-Daniel Fekete pdf
This paper describes how to use Item Response Theory – a technique common in psychometrics and education literature – to assess a person’s “literacy” or skill with visualizations. I would have liked to see the approach validated (or at least compared) with some external factor like the person’s experience with visualization. Understandably, that can be tough to measure, but this method certainly shows promise for explaining individual differences in user performance.
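For readers who haven’t seen Item Response Theory before, here is a minimal sketch of the simplest version (the one-parameter Rasch model); the paper’s actual models and test items may differ.

```python
# One-parameter (Rasch) IRT model: the probability that a person with ability
# `theta` answers an item of difficulty `b` correctly.
import math

def p_correct(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A skilled reader (theta = 1.5) vs. a novice (theta = -0.5) on a moderately
# hard visualization question (b = 1.0) -- hypothetical values.
print(p_correct(1.5, 1.0))   # ~0.62
print(p_correct(-0.5, 1.0))  # ~0.18
```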

Many Vis and CHI papers list numbered statements labeled H1, H2, H3, and so on before describing their experiments. I have never seen this practice in any other field, and I was curious about its origin.

Half Hypotheses

Although these statements are referred to as ‘hypotheses’, they’re not… at least, not completely. They are predictions. The distinction is subtle but important. Here’s the scientific definition of hypothesis according to The National Academy of Sciences:

A tentative explanation for an observation, phenomenon, or scientific problem that can be tested by further investigation…

The key word here is explanation. A hypothesis is not simply a guess about the result of an experiment. It is a proposed explanation that can predict the outcome of an experiment. A hypothesis has two components: (1) an explanation and (2) a prediction. A prediction simply isn’t useful on its own. If I flip a coin and correctly guess “heads”, it doesn’t tell me anything other than that I made a lucky guess. A hypothesis would be: the coin is unevenly weighted, so it is far more likely to land heads-up. It has an explanation (uneven weighting) that allows for a prediction (frequently landing heads-up).
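To see how the explanation becomes testable, here is a made-up illustration: the uneven-weighting explanation predicts that heads should come up far more often than 50% across many flips, and a binomial test can check whether observed flips are consistent with a fair coin. The counts below are invented.

```python
# Made-up illustration: testing the prediction of the "unevenly weighted coin"
# hypothesis against the null of a fair coin.
from scipy.stats import binomtest

heads, flips = 83, 100                   # hypothetical data
result = binomtest(heads, flips, p=0.5)  # null hypothesis: P(heads) = 0.5
print(result.pvalue)                     # a tiny p-value favors the biased-coin explanation
```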

The Origin of H1, H2, H3…

Besides the unusual use of the term “hypothesis”, where does the numbering style come from? It appears in many IEEE InfoVis and ACM CHI papers going back to at least 1996 (maybe earlier?). However, I’ve never seen it in psychology or social science journals. The best candidate I can think of for the origin of this numbering is a misunderstanding of null hypothesis testing, which can be best explained with an example. Here is a null hypothesis with two alternative hypotheses:

H0: Objects do not affect each other’s motion (null hypothesis)

H1: Objects attract each other, so a ball should fall towards the Earth

H2: Objects repel each other, so a ball should fly away from the Earth

Notice that the hypotheses are mutually exclusive, meaning only one can be true. In contrast, Vis/CHI-style hypotheses are each independent, and all or none of them can be true. I’m not sure how one came to be transformed into the other, but it’s my best guess for the origins.

Unclear

On top of my concerns about diction or utility, referring to statements by number hurts clarity. Repeatedly scrolling back and forth trying to remember “which one was H3 again?” makes reading frustrating and unnecessarily effortful. It’s a bad practice to label variables in code as var1 and var2. Why should it be better to refer to written concepts numerically? Let’s put an end to these numbered half-hypotheses in Vis and CHI.
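As a quick, made-up illustration of that analogy, compare numbered variable names with descriptive ones:

```python
# Numbered names force the reader to look up what each one means...
var1 = 412
var2 = 380
var3 = var1 - var2

# ...while descriptive names carry their meaning, just as named (rather than
# numbered) hypotheses would.
baseline_time_ms = 412
treatment_time_ms = 380
improvement_ms = baseline_time_ms - treatment_time_ms
```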

Do you agree with this perspective and proposed origin? Can you find an example of this H numbering from before 1996? Or in another field?

When reading a paper (vis or otherwise), I tend to read the title and abstract and then jump straight to the methods and results. Besides the claim of utility for a technique or application, I want to understand how the paper supports its claim of improving users’ understanding of the data. So I put together this guide to the papers that ran experiments comparatively measuring user performance.

Less than a quarter

Only 9 out of 38 InfoVis papers (24%) this year comparatively measured user performance. While that number has improved and doesn’t need to be 100%, less than a quarter just seems low.

Possible reasons why more papers don’t evaluate user performance

Limited understanding of experiment design and statistical analysis. How many people doing vis research are familiar with different experiment designs like method of adjustment or forced-choice? How many have run a t-test or a regression? (A minimal sketch of such an analysis appears after this list.)

Evaluation takes time. A paper that doesn’t evaluate user performance can easily scoop a similar paper with a thorough evaluation.

Evaluation takes space. Can a novel technique and an evaluation be effectively presented within 10 pages? Making better use of supplemental material may solve this problem.

Risk of a null result. It’s hard – if possible at all – to truly “fail” in a technique or application submission. But experiments may reveal no statistically significant benefit.

The belief that the benefit of a vis is obvious. We generally have poor awareness of our own attentional limitations, so it’s actually not always clear what about a visualization doesn’t work. Beyond that poor self-assessment, it’s also important to know for which tasks a novel visualization is better than traditional methods (e.g. Excel and SQL queries) and when the traditional methods are better.

A poisoned well. If a technique or application has already been published without evaluation, reviewers would scoff at an evaluation that merely confirms what was already assumed. So an evaluation of past work would only be publishable if it contradicts the unevaluated assumptions. It’s risky to put the time into a study if positive results may not be publishable.
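As promised above, here is a minimal sketch of the kind of analysis mentioned in the first reason: an independent-samples t-test comparing completion times for two visualization conditions. The numbers are entirely made up.

```python
# Hypothetical completion times (seconds) for two visualization conditions,
# compared with an independent-samples t-test.
from scipy import stats

baseline_vis = [12.1, 10.4, 14.0, 11.7, 13.3, 12.8, 10.9, 13.1]
novel_vis    = [ 9.8,  8.7, 11.2,  9.1, 10.4,  9.9,  8.5, 10.8]

t, p = stats.ttest_ind(novel_vis, baseline_vis)
print(f"t = {t:.2f}, p = {p:.4f}")  # p < 0.05 would suggest a reliable difference
```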

I’m curious to hear other people’s thoughts on the issue. Why don’t more papers have user performance evaluations? Should they?

The criticism largely centers on which colors we used, namely their luminance and contrast, and it is based on a misunderstanding or misreading of our paper.

We have two responses:

Target and distractor colors were selected randomly for each trial and fully counterbalanced; every target color was also used as a distractor. Color and/or luminance pop-out and discriminability differences between targets and distractors therefore do not explain the results. Rather, grouping modulates search efficiency. Here is a demo. (A simplified sketch of such a counterbalanced trial design appears at the end of this post.)

Explanations based on color and luminance contrast also do not account for our results for motion. Here is a demo.
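As noted in the first response, the trials were fully counterbalanced. Here is a simplified sketch of that idea (not the study’s actual trial-generation code), assuming a small hypothetical set of colors in which every color serves as the target and as a distractor equally often:

```python
# Simplified sketch: one trial per ordered (target, distractor) pair gives full
# counterbalancing; every color appears as target and as distractor equally often.
import itertools
import random

colors = ["red", "green", "blue", "yellow"]  # hypothetical color set

trials = [{"target": t, "distractor": d}
          for t, d in itertools.permutations(colors, 2)]
random.shuffle(trials)  # randomize presentation order

for trial in trials[:3]:
    print(trial)
```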