POPL 2015 Artifact Evaluation

This year, POPL is conducting an experiment: giving authors the opportunity
to submit for evaluation any artifacts that accompany their papers. Similar
experiments ran successfully at ESEC/FSE 2011, ECOOP 2013, OOPSLA 2013, and PLDI
2014, and we want to build on their success. This document describes the goals
and mechanics of this process.

Background

A paper consists of a constellation of artifacts that extend beyond the document
itself: software, mechanized proofs, models, test suites, benchmarks, and so on.
In some cases, the quality of these artifacts is as important as that of the
document itself, yet our conferences offer no formal means to submit and
evaluate anything but the paper. We are creating an Artifact Evaluation
Committee (AEC) to remedy this situation.

Goals

Our goal is two-fold: to both reward and probe. Our primary goal is to reward
authors who take the trouble to create useful artifacts beyond the paper.
Sometimes the software tools that accompany the paper take years to build; in
many such cases, authors who go to this trouble should be rewarded for setting
high standards and creating systems that others in the community can build on.
Conversely, authors sometimes take liberties in describing the status of their
artifacts, making claims they would temper if they knew the artifacts were going
to be scrutinized. The prospect of evaluation thus leads to more accurate reporting.

Our hope is that eventually, the assessment of a paper’s accompanying
artifacts will guide the decision-making about papers: that is, the
AEC would inform and advise the Program Committee (PC). This would,
however, represent a radical shift in our conference evaluation
processes; we would rather proceed gradually. Thus, in our process,
artifact evaluation is optional, and authors choose to undergo
evaluation only after their paper has been accepted.

Criteria

The evaluation criteria are ultimately simple. A paper sets up certain
expectations of its artifacts based on its content. The AEC will read the paper
and then judge how well the artifact matches these expectations. Thus the AEC’s
decision will be that the artifact does or does not “conform to the
expectations set by the paper”. Ultimately, we expect artifacts to be:

consistent with the paper,

as complete as possible,

documented well, and

easy to reuse, facilitating further research.

Benefits

We believe the dissemination of artifacts benefits our science and engineering
as a whole. Their availability improves reproducibility and enables authors to
build on each other’s work. It can also help resolve, more unambiguously,
questions about cases not considered by the original authors.

Beyond helping the community as a whole, artifact dissemination confers several
direct and indirect benefits on the authors themselves. The most direct benefit is, of course, the
recognition that the authors accrue. But the very act of creating a bundle that
can be used by the AEC confers several benefits:

The same bundle can be distributed to third-parties.

A reproducible bundle can be used subsequently for later experiments
(e.g., on new parameters).

The bundle makes it easier to re-run the system later when, say, responding
to a journal reviewer’s questions.

The bundle is more likely to survive being put in storage between
the departure of one student and the arrival of the next.

However, creating a bundle that meets all these properties can be onerous.
Therefore, the process we describe below does not require an artifact to have
all these properties. It offers a route to evaluation that requires vastly less
effort, at the cost of conferring fewer of these benefits.

Process

To maintain a wall of separation between paper review and the artifacts, authors
will only be asked to upload their artifacts after their papers have been
accepted. Of course, they can (and should!) prepare their artifacts well in
advance, and can provide the artifacts to the PC through unofficial URLs
contained in their papers, as many authors already do.

The authors of all accepted papers will be asked whether they intend to have
their artifact evaluated and, if so, to upload the artifact. They are welcome to
indicate that they do not. Since we anticipate small glitches with installation
and use, the AEC reserves the right to send a one-time message to the authors
requesting clarification. Authors can submit a one-time response, focusing
solely on the AEC’s questions; we do not impose a word limit (since, e.g.,
a code attachment may be needed), but strongly suggest that the prose be no
longer than 1000 words. Based on these inputs, the AEC will complete its
evaluation and notify authors of the outcome.
Authors are welcome to ignore the feedback or to include it in their
paper as they deem fit (as a footnote, a section, etc.).

The PC Chair’s report will include a discussion of the artifact evaluation
process. Papers with artifacts that “meet expectations” may indicate that
they do with the AEC badge (courtesy of Matthias Hauswirth).

Artifact Details

To avoid excluding some papers, the AEC will try to accept any artifact that
authors wish to submit. These can be software, mechanized proofs, test suites,
data sets, and so on. Obviously, the better the artifact is packaged, the more
likely it is that the AEC can actually work with it.

In all cases, the AEC will accept a video of the artifact in use. Such videos
may include screencasts of the software being run on the examples in the paper,
traversals of models using modeling tools, stepping through a proof script, etc.
A video is, of course, not a substitute for the artifact itself, but accepting
videos provides an evolutionary path that imposes minimal burden on authors.

Submission of an artifact does not grant tacit permission to make its
content public. AEC members will be instructed that they may not
publicize any part of your artifact during or after completing the
evaluation, nor retain any part of it afterwards. Thus, you are
free to include models, data files, proprietary binaries, etc. in your
artifact. We do, however, strongly encourage you to anonymize any data
files that you submit.

We recognize that some artifacts may attempt to perform malicious
operations by design. Such cases should be flagged clearly and explained in
detail in the README so that AEC members can take appropriate precautions
before installing and running these artifacts.

AEC Membership

The AEC will consist of about a dozen members. Other than the chairs, we intend
for the members to be senior graduate students and postdocs, identified with the
help of current, active researchers.

We believe qualified graduate students are often in a much better position than
many researchers to handle the diversity of systems expectations we will
encounter. In addition, these graduate students represent the future of the
community, so involving them in this process early will help push this process
forward.

Naturally, the AEC chairs will devote considerable attention to both mentoring
and monitoring, helping to educate the students on their responsibilities and
privileges.