February 28, 2012

When Web scraping gets creepy

Nice conceit: Web scraping tools often make small mistakes, such as including metadata tags or trimming off some needed content. "[I]t feels wrong. It just doesn’t look exactly like what you consider content as a human."

It’s a bit like drawing hands or faces – unless you get it within 5% of perfection it just looks wrong. You’re almost better off drawing it at 80% within perfection and calling it a cartoon.

The uncanny valley of article extraction!

The closer you are to perfection, the less subconscious clues users will get to pick out the content themselves and the more jarring the difference between what they expect and what they get.

This is fairly arcane stuff for most users. But it’s nice to see the creepy digital extend its awesome pseudopods onto another front.