How Yelp tested their data quality against the competition: they pulled a sample of businesses from “Best Of” lists, manually checked 1000 of them on Yelp and on competitor sites, and assigned points for correctness.
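
To make the approach concrete, here is a minimal sketch of that kind of manual scoring, assuming a hypothetical checklist of attributes (name, address, phone, hours) and hand-entered pass/fail judgements per site; the field names and point values are my invention, not Yelp’s actual rubric.

    from collections import defaultdict

    # Assumed checklist a reviewer verifies for each sampled business;
    # not Yelp's actual rubric.
    ATTRIBUTES = ["name", "address", "phone", "hours"]

    def score_reviews(reviews):
        """Each review is a dict like
        {"site": "yelp", "business": "Cafe A", "name": True, "address": False, ...};
        every correct attribute earns that site one point."""
        points = defaultdict(int)
        for row in reviews:
            points[row["site"]] += sum(1 for attr in ATTRIBUTES if row.get(attr))
        return dict(points)

    sample = [
        {"site": "yelp", "business": "Cafe A",
         "name": True, "address": True, "phone": True, "hours": False},
        {"site": "competitor", "business": "Cafe A",
         "name": True, "address": False, "phone": False, "hours": False},
    ]
    print(score_reviews(sample))  # {'yelp': 3, 'competitor': 1}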

Crossing over from data quality to search quality, start with this long read on how search works. Comes with a beautiful illustration of a production data-indexing pipeline as a Rube Goldberg machine. Has a section on search quality that starts with a call to define what quality means to you, then lists classical metrics like precision, recall, and F1 score, along with types of human evaluation.
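
As a refresher, those classical metrics reduce to simple set arithmetic over human-judged results. The sketch below uses made-up document IDs for a single query, not anything from the article.

    def precision_recall_f1(relevant, retrieved):
        """relevant / retrieved are sets of document IDs for one query."""
        true_positives = len(relevant & retrieved)
        precision = true_positives / len(retrieved) if retrieved else 0.0
        recall = true_positives / len(relevant) if relevant else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    # 3 of the 5 returned results are relevant, out of 4 relevant docs overall.
    print(precision_recall_f1({"a", "b", "c", "d"}, {"a", "b", "c", "x", "y"}))
    # -> (0.6, 0.75, 0.666...)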

From Netflix, an attempt to tackle data quality: Netflix sources subtitles in more than 20 languages for its original English-language content. As a first step toward taking control of translation quality, they launched HERMES, a test for subtitle professionals, and now tag each translator’s work with their individual H-number. The stated goal is to be able to recommend translators for specific work by genre. Amusingly, there is a long thread of comments about invalid H-number errors… more testing needed?

If you received this email directly then you’re already signed up, thanks! Else
if this newsletter issue was forwarded to you and you’d like to get one weekly,
then you can subscribe at http://testersdigest.mehras.net