Friday, November 18, 2016

Facebook recently claimed it is hard[1] to differentiate between fake news[2] and real news. Given how similar fake news detection is to related problems such as search index spam, ad landing page spam, social networking bots, and porn detection, this claim suggests one of two things: (1) Facebook really sucks at machine learning, or (2) Facebook does not want to address the problem. Let's look at each of these:

1. Facebook Sucks At Machine Learning?
Over the course of my career I worked on, amongst other things, Google mobile products (including the mobile search index and projects like porting Google News to mobile), Google ads targeting to pages across the web, and Twitter Search (I was Director of Search Product for a time). Both Google and Twitter had to deal with a large number of ambiguous signals, including:

Ambiguous content on web pages. Google classifies these pages both for search results and to determine the right set of ads to target to them. This included semantic analysis of the pages as well as page "quality" scores.

Fake landing pages for ads. Google needs to make sure that ads, and the web pages the ads point to, are legitimate.

Spam tweets. There were (and are) a lot of bots and spam tweets on Twitter. The service was continuously removing poorly ranked tweets from search results.

In all cases, the important thing to do was to understand the content of a tweet, web page, or other content unit, and then to rank the relative quality and importance of that content. Similar problems also exist in areas like Google web index spam and porn detection. And every one of these areas has a lot of shades of grey, e.g. there is a fine line between porn and not-porn, or between a spammy tweet and a silly or satirical one.

Facebook has developed a number of technologies to rank its news feed, target ads, and classify its users. Yet Facebook claims that fake news is a complex area, and that this complexity makes it difficult to address.

Fake news is not a partisan issue. It is about helping people understand what is real and what is a lie. An unwillingness to tackle fake news is a willingness to accept a lack of truth in our society at mass scale.

Other Companies You Can Work At Instead Of Facebook
Great engineers want to work with other great engineers. If Facebook lacks the talent to address the fake news problem, do you really want to join an organization so poor at machine learning? Alternatively, if Facebook simply lacks the will to address this issue, that is worth taking into account as well. Many talented engineers are also immigrants - a group much maligned in fake news posts. If you are a talented machine learning or AI engineer, there are a number of companies you can work at instead of Facebook. Some potential ideas:

Uber. Work on intelligent routing and optimization, self-driving cars, and other technologies.

Microsoft. Microsoft is working on applying machine learning to health care problems like cancer.

Tesla. Self-driving cars.

Wish. Large scale commerce platform powered by data analytics and ML.

Stripe. Payment fraud detection and other systems that power global payments infrastructure.

Netflix. Media recommendations.

Apple. While less known for machine learning, Apple has been applying it to privacy-related areas as well as to products like Siri.

Amazon. Amazon has been doing cool things in voice recognition technology with Alexa/Echo. I would also not be surprised if AWS extended its efforts around GPU clusters.

Dozens of AI startups. Many deep learning, AI, and ML companies have been funded recently, and they have lots of cool things for you to work on instead.

If you work on machine learning or data science and want to work somewhere other than Facebook, feel free to drop me a line. I am happy to refer you to a few dozen companies as alternatives.

Notes
[1] The exact quote from Zuckerberg is:
"This is an area where I believe we must proceed very carefully though. Identifying the "truth" is complicated. While some hoaxes can be completely debunked, a greater amount of content, including from mainstream sources, often gets the basic idea right but some details wrong or omitted. An even greater volume of stories express an opinion that many will disagree with and flag as incorrect even when factual. I am confident we can find ways for our community to tell us what content is most meaningful, but I believe we must be extremely cautious about becoming arbiters of truth ourselves."

This "grey area" argument is made all the time. Yet machine learning classifiers work incredibly well for porn and other areas that have lots of grey. Moreover, simply getting rid of the 80% of fake news that is easy to spot and most egregious would be a good starting point. This argument strikes me as a red herring.
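To make the "start with the easy, egregious cases" argument concrete, here is a minimal Python sketch of threshold-based triage. It assumes some upstream classifier (hypothetical, not shown) has already given each story a score between 0 and 1 estimating how likely it is to be fake. The function name `triage` and both threshold values are illustrative assumptions, not anything Facebook, Google, or Twitter is known to use. High-confidence items are removed automatically, the grey zone goes to human reviewers, and everything else is left alone.

```python
from typing import Dict, List, Tuple


def triage(scored_items: List[Tuple[str, float]],
           remove_above: float = 0.95,
           review_above: float = 0.60) -> Dict[str, List[str]]:
    """Split items by an assumed classifier score (probability of being fake).

    The thresholds are hypothetical placeholders, not tuned values.
    """
    if not 0.0 <= review_above <= remove_above <= 1.0:
        raise ValueError("expected 0 <= review_above <= remove_above <= 1")
    buckets: Dict[str, List[str]] = {"remove": [], "review": [], "keep": []}
    for item_id, score in scored_items:
        if score >= remove_above:
            buckets["remove"].append(item_id)  # egregious, high-confidence cases
        elif score >= review_above:
            buckets["review"].append(item_id)  # grey area: route to humans
        else:
            buckets["keep"].append(item_id)    # likely legitimate
    return buckets


if __name__ == "__main__":
    stories = [("a", 0.99), ("b", 0.72), ("c", 0.10), ("d", 0.96)]
    print(triage(stories))
```

Raising `remove_above` trades recall for precision: fewer legitimate stories are removed automatically, at the cost of more fake ones landing in the human review queue. The grey area does not have to be solved perfectly for the clear-cut cases to be handled.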