New data analysis competitions

Outbrain pairs relevant content with curious readers in about 250 billion personalized recommendations every month across many thousands of sites. In this competition, Kagglers are challenged to predict which pieces of content its global base of users is likely to click on.

$12,000 for first place.

Privacy

Yahoo Inc last year secretly built a custom software program to search all of its customers' incoming emails for specific information provided by U.S. intelligence officials, according to people familiar with the matter.

The company complied with a classified U.S. government directive, scanning hundreds of millions of Yahoo Mail accounts at the behest of the National Security Agency or FBI, said two former employees and a third person apprised of the events.

The reason that Predpol predicted all the crime would occur in a poor black neighborhood in Oakland is that Oakland's notoriously racist police force concentrates its policing there, and you can only find crime in places where you look for it. Predpol and tools like it are sold as data-driven ways to overcome this kind of police bias, but really, they're just ways of giving bias a veneer of objective responsibility.

For those of us who make a living solving problems, the current deluge of big data might seem like a wonderland. Data scientists and programmers can now draw on reams of human data—and apply them—in ways that would have been unthinkable only a decade ago.

But amid all the excitement, we're beginning to see hints that our nice, tidy algorithms and predictive models might be prone to the same shortcomings that the humans who create them are. Take, for example, the revelation that Google disproportionately served ads for high-paying jobs to men rather than women. And there's the troubling recent discovery that a criminal risk assessment score disproportionately flagged many African Americans as higher risk, sometimes resulting in longer prison sentences.

But she [Michèle Nuijten, a PhD student at Tilburg University in the Netherlands] and some colleagues in the Netherlands were curious enough to check. They built a computer program that could quickly scan published psychological papers and check the math on the statistics. They called their program "Statcheck" and ran it on 30,717 papers.

Rounding errors, and other small potential mistakes in calculating the statistics, were rampant.
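To make the idea of "checking the math on the statistics" concrete, here is a minimal sketch of that kind of consistency check. It is not the Statcheck program itself. It assumes results are reported in the form "z = 2.20, p = .03", handles only two-tailed z-tests, and ignores rounding of the test statistic. The names RESULT, two_tailed_p and check_text are hypothetical, chosen for illustration.

```python
import math
import re

# Hypothetical pattern for results written like "z = 2.20, p = .03".
# Published papers use many more formats; this covers one for illustration.
RESULT = re.compile(r"z\s*=\s*(-?\d*\.?\d+)\s*,\s*p\s*=\s*(\d*\.\d+)")


def two_tailed_p(z):
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_text(text):
    """Return (z, reported_p, recomputed_p, consistent) for each result found."""
    findings = []
    for z_str, p_str in RESULT.findall(text):
        z = float(z_str)
        reported = float(p_str)
        # The reported p-value is rounded to this many decimal places.
        decimals = len(p_str.split(".")[1])
        recomputed = two_tailed_p(z)
        # Consistent if the recomputed value would round to the reported one
        # (small epsilon guards against floating-point noise at the boundary).
        tolerance = 0.5 * 10 ** -decimals + 1e-12
        consistent = abs(recomputed - reported) <= tolerance
        findings.append((z, reported, recomputed, consistent))
    return findings


if __name__ == "__main__":
    sample = "We found an effect, z = 2.20, p = .03, but not here, z = 1.50, p = .04."
    for z, reported, recomputed, ok in check_text(sample):
        status = "consistent" if ok else "INCONSISTENT"
        print(f"z = {z:.2f}: reported p = {reported}, recomputed p = {recomputed:.4f} ({status})")
```

Run on the sample sentence, the first result passes and the second is flagged, because a z of 1.50 corresponds to a two-tailed p of roughly .13 rather than .04. A real checker would also need to cover t, F, r and chi-square tests and one-tailed reporting.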