Statistics, Fake News, and AI: Who’s on First?

1 March 2019

Karen Kafadar

The title of the book in the hands of the fellow seated next to me on the plane was “Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Are,” by Seth Stephens-Davidowitz. The book is actually a good illustration of why we need sound statistical design, surveys, and analysis to avoid the pitfalls that arise when we rely on our own intuition about “data” we find on the internet—self-selected and highly biased pieces of “information.” Indeed, the famous headline, “Dewey Defeats Truman” (1948)—familiar to students in our sample survey classes—reminds us sample survey design is important and erroneous information is not new to journalism (or, for that matter, any discipline).

What seems to be new is the introduction of intent. While errors in reporting were previously unintentional (e.g., the result of bias in surveys, errors in typesetting, misunderstanding the information, or recording incorrect information from the source), the “disinformation” that is particularly insidious seems to be generated with the intent to persuade readers of a particular viewpoint that cannot be justified by, or (more maliciously) is contradicted by, the facts.

Much of statistical learning (cf. The Elements of Statistical Learning by T. Hastie, R. Tibshirani, and J. Friedman) focuses on classifiers, starting with Fisher’s discriminant analysis and moving on to logistic regression, neural networks, and more advanced multi-layer neural networks (i.e., “deep learning”). The mathematical properties of classification techniques have therefore been well studied over the years. In this time of “fake news,” many important questions emerge: How can statisticians help reduce disinformation spread with malicious intent? What are the research challenges in applying these techniques to massively streaming online news stories, tweets, postings, and social media outlets? Can those methods meet the demands of streaming data? Do the mathematical properties still hold under those circumstances? Do we need to develop more sophisticated approaches to meet these challenges?
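To make the streaming question concrete, here is a minimal sketch of one of the classifiers named above, logistic regression, fitted one item at a time by stochastic gradient steps rather than on a complete data set. Everything in it is an assumption made for illustration: the class name, the word-presence features, the learning rate, and the four toy headlines with their labels are invented and come from no real study or system. It shows only that the model can be updated as each item arrives; it says nothing about whether the usual mathematical guarantees survive in a real news stream.

```python
import math


class OnlineLogisticClassifier:
    """Bag-of-words logistic regression updated one document at a time.

    Hypothetical illustration only; not a fact-checking method.
    """

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.weights = {}   # one weight per word seen so far
        self.bias = 0.0

    def _features(self, text):
        # Assumed featurization: presence or absence of lowercase words.
        return set(text.lower().split())

    def predict_proba(self, text):
        # Estimated probability that the text belongs to class 1.
        score = self.bias + sum(
            self.weights.get(word, 0.0) for word in self._features(text)
        )
        score = max(-30.0, min(30.0, score))  # guard against overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, text, label):
        # One stochastic-gradient step on the log-loss for a single item.
        error = label - self.predict_proba(text)
        step = self.learning_rate * error
        for word in self._features(text):
            self.weights[word] = self.weights.get(word, 0.0) + step
        self.bias += step


# Invented toy stream: label 1 = flagged as suspect, 0 = not flagged.
stream = [
    ("miracle cure doctors hate", 1),
    ("city council approves budget", 0),
    ("shocking miracle secret revealed", 1),
    ("council schedules budget hearing", 0),
] * 20

model = OnlineLogisticClassifier()
for text, label in stream:
    model.update(text, label)   # the model never sees the stream as a whole

print(model.predict_proba("miracle secret") > 0.5)
print(model.predict_proba("budget hearing") < 0.5)
```

Each update touches only the words in the current item, so the cost per item does not grow with the length of the stream. What the sketch leaves open is exactly what the questions above ask: how such estimates behave when the vocabulary, the sources, and the intent behind the items keep changing.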

At least as important as the research challenges associated with identifying “fake news” are the challenges in convincing readers that such news is disingenuous. That turns out to be an even harder problem, one that requires not only statistical methods but also an understanding of human psychology and media communications. What aspects of a so-called “news story” lead a person to be convinced by it? What traits lead some people to receive “fake news,” propagate it (e.g., via the internet or Twitter), and believe rather than disbelieve it? How do we design studies to identify these aspects and traits, and what studies can we conduct to see how effectively we can change the “believers” of fake news stories into “skeptics” or “disbelievers”? What research has been done in this area, what further research should be conducted, and how shall we establish the collaborations to conduct it?

Both sets of challenges are being tackled by the second of this year’s presidential initiatives. I am delighted that co-chairs Jessica Utts of the University of California, Irvine and Jun Yang of Duke University have agreed to lead the “Disinformation Initiative.” Jessica is well known to our community as the ASA’s 2016 president. During her presidency, her initiatives focused on communicating statistical concepts to broad audiences. Jun Yang, associate chair of the department of computer science at Duke, is well known in the computer science community for research in computational fact-checking. Their team consists of statisticians (Tim Hesterberg at Google, Trevor Hastie at Stanford, John Bailer at Miami University, and Regina Nuzzo at Gallaudet), computer scientists (Huan Liu at Arizona State), and a media specialist (Trevor Butterworth). Together, we hope they will develop the following:

A research agenda and plan for encouraging statisticians and data scientists to engage in research and collaborate in this area (both in the technical algorithms in misinformation and disinformation and in the design of studies to identify traits that lead people to be influenced by fake news)

A plan for creating mechanisms (public information campaign or venues for dissemination) that will help the public understand, and be less influenced by, fake news.

As with so many of the successes in our field, this one requires close collaboration with domain experts. I am confident this task force will demonstrate how much there is to gain from advancing this multidisciplinary research through collaboration.

The field of artificial intelligence (AI) seems to exemplify the need for collaboration. While many have seen the proclaimed successes of AI as largely oversold (see last month’s President’s Corner), AI has more recently been identified—by both government agencies and industry—as a core component in which billions of dollars are being invested.

Statistics has a major role to play in AI research, which heretofore has been dominated largely by computer scientists and engineers. Indeed, the National Science Foundation (NSF) has recognized AI as a “highly interdisciplinary endeavor, which has included many fields such as computer science and engineering, cognitive science, philosophy, mathematics, economics, psychology, linguistics, and ethics.” Where is “statistics” in this list? How can the ASA persuade the NSF that “statistics” should be in this list?

There is some hope in this regard, if we choose to take advantage of it. My colleagues report several presenters at the most recent meeting of the Association for the Advancement of Artificial Intelligence called for collaboration with statisticians. We need to step up to the plate before we find ourselves behind the curve, as has happened already with data science and machine learning.

We recognize that our training in the mathematical sciences, in experimental design, and in developing inferential procedures and expressing our confidence (and uncertainty) in those inferences is essential for AI. This training positions us well to respond to the challenge. Now we need to take the lead in collaborating on the research that will lead to advances in both AI and statistics.

The “morals” identified in last month’s President’s Corner apply here to the challenges in identifying fake news (and converting “believers” into “skeptics”) and in artificial intelligence. These two fields allow statisticians the opportunity to do the following:



Amstat News is the monthly membership magazine of the American Statistical Association, bringing you news and notices of the ASA, its chapters, its sections, and its members. Other departments in the magazine include announcements and news of upcoming meetings, continuing education courses, and statistics awards.