MOS (mean opinion score) subjective quality studies are used to
evaluate many signal processing methods. Since laboratory quality
studies are time-consuming and expensive, researchers often run
small studies with reduced statistical power, or rely on objective measures
that only approximate human perception. We propose a
cost-effective and convenient measure called crowdMOS, obtained
by having internet users participate in a MOS-like listening study.
Workers listen to and rate sentences at their leisure, using their own
hardware, in an environment of their choice. Since these individuals
cannot be supervised, we propose methods for detecting and discarding
inaccurate scores. To automate crowdMOS testing, we offer
a set of freely distributable, open-source tools for Amazon Mechanical
Turk, a platform designed to facilitate crowdsourcing. These
tools implement the MOS testing methodology described in this paper,
providing researchers with a user-friendly means of performing
subjective quality evaluations without the overhead associated with
laboratory studies. Finally, we demonstrate the use of crowdMOS
with data from the Blizzard text-to-speech competition, showing
that it delivers accurate and repeatable results.