POPL 2015 Artifact Evaluation

It is our pleasure to report on the artifact evaluation process that we ran on
behalf of the Program Chair, David Walker. POPL’15 is the first in the long-
running POPL conference series to have an Artifact Evaluation Committee (AEC).

Artifact evaluation is concerned with the byproducts of science. An “artifact”
is something intended to support the scientific claims made in a paper. For
instance, an artifact might be a program’s source code, a data collection, a
test suite, a proof, or a model. “Evaluation” is a best-effort attempt to
reconcile a paper’s artifacts with the claims made in the paper. A primary goal
of the artifact evaluation process is to encourage authors to create artifacts
that can be shared and used by others as bases for new activities. The process
seeks other benefits as well. These include encouraging authors to be precise in
their claims and publicly recognizing efforts toward making high-quality
artifacts.

Three months before the POPL’15 paper submission deadline, the AEC chairs
invited members of the POPL community to nominate PhD students and postdocs to
serve on the AEC. The chairs selected 21 of the nominees, based on their levels
of experience and areas of expertise.

After the decisions for POPL’15 submissions were distributed, the authors of
accepted papers were invited to submit artifacts for evaluation. (Thus, by
design, the artifact evaluation process had no effect on which papers were
chosen to appear at POPL.) Authors had one week, until October 6, to respond
to the call for artifacts. The submission guidelines asked that each artifact be
packaged so as to make evaluation as easy as possible. Typically, this involved
the creation of a virtual machine image, but other means were also accepted.
Each artifact was accompanied by the accepted version of its associated paper so
that the AEC could evaluate each artifact against its paper’s claims. A total of
29 artifacts were submitted for evaluation.

The AEC had almost three weeks, until October 26, to render judgments. The
AEC expected artifacts to be “consistent with the paper; as complete as
possible; documented well; and easy to reuse, facilitating further research.”
The AEC members bid on artifacts and the chairs selected two reviewers for each
one. Artifact evaluation had two phases. During the first, “installation phase”,
the committee simply tried to download, build, and launch the artifacts. The
committee reported any errors that occurred in an initial review and authors had
an opportunity to reply with solutions. During the second phase, the AEC tried
to repeat some or all of the experiments described in the artifact’s paper. AEC
members were cognizant that it would be difficult to reproduce certain results,
e.g., benchmarks that were run on high-performance machines. For four artifacts,
the AEC rented Amazon EC2 instances to recreate experimental conditions, at a
total cost of $310.10. Other artifacts ran successfully on the committee’s
personal computers. After all reviews were submitted, the AEC held an intense
online discussion to decide, for each artifact, if it met, exceeded, or fell
below the expectations set by its paper.

Of the 29 submitted artifacts, 27 were judged to meet or exceed
expectations. The papers that describe these artifacts can be recognized by the
AEC badge they bear (created by Matthias Hauswirth).

We thank the authors of the 29 submitted artifacts for their work in preparing
and documenting their research output. We hope that they found the feedback from
the AEC to be helpful. We also thank David Walker for his support, and the
members of the AEC for their energy and enthusiasm.