Data Detractors Are Wrong: The Rise of Algorithms Is a Cause for Hope and Optimism

In the new book, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, former Wall Street quant Cathy O’Neil describes what she sees as the rise of an algorithm-driven world that is opaque, pervasive, and devastatingly unfair. To make her point, she runs through a litany of progressive issues, such as income inequality, unfair labor practices, failing public schools, and racism in the criminal justice system, to argue that the catalyst, if not the cause, of these problems can be traced back to fundamentally flawed mathematical models. While many will find her theory persuasive, it is ultimately incorrect. Unfortunately, by misdiagnosing the root of these problems, she both distracts from legitimate efforts to address these issues and stalls efforts to use algorithms as part of the solution.

The first problem with O’Neil’s book is that it incorrectly blames data and algorithms for everything from the rising cost of higher education to mass incarceration of minorities. If it were that simple, these problems would not be so intractable. However, the examples O’Neil describes almost always have a more serious underlying problem. For example, O’Neil discusses how the college rankings in U.S. News and World Report have driven colleges to make changes that increase costs for students without achieving better outcomes. While O’Neil correctly describes the phenomenon itself, she fails to dig into the more important question of why the interests of colleges do not align with those of students. (For a detailed look at this question, as well as what to do about it, see ITIF’s recent report on disrupting higher education.) One danger of this book is that it will draw attention away from the policymakers and advocates who are actually trying to fix these underlying problems.

The second problem with O’Neil’s argument is that it claims that the private sector is unable and unwilling to address problems when they arise in algorithms because companies only care about their bottom line. There are two flaws with that argument. First, in many cases, the economic interests of companies align with those of their consumers. This is particularly true for commercial applications where one company’s bias is another company’s business opportunity. For example, if certain lenders were to consistently avoid giving loans to minorities who have good credit, then their competitors would likely target these potential customers as a way to maximize their profits. Competition helps keep companies in line. Second, many companies make decisions for non-economic reasons. In a recent NPR interview, O’Neil argued:

“…if you imagine, you know, an engineering firm that decided to build a new hiring process for engineers and they say, OK, it’s based on historical data that we have on what engineers we’ve hired in the past and how they’ve done and whether they’ve been successful, then you might imagine that the algorithm would exclude women, for example. And the algorithm might do the right thing by excluding women if it’s only told just to do what we have done historically. The problem is that when people trust things blindly and when they just apply them blindly, they don’t think about cause and effect. They don’t say, oh, I wonder why this algorithm is excluding women, which would go back to the question of, I wonder why women haven’t been successful at our firm before?”

The problem with this hypothetical is that recent events show the exact opposite to be occurring: many companies are not only aware of these types of problems, but they are actively searching for solutions to eliminate their biases. For example, consider that many of the biggest names in Silicon Valley have made commitments to diversity, including efforts to change the hiring process and remake corporate culture to be more inclusive. And many of their approaches use data and algorithms to fight hidden biases. The reality is that most companies want to do the right thing. When Airbnb found out that hosts on its home-sharing platform were discriminating against would-be guests on the basis of race, it undertook a comprehensive set of reforms, including the launch of a data science team to combat discrimination by testing and refining its algorithms. Rather than attack algorithms as the problem, O’Neil and other activists should be encouraging companies to use more accurate and less-biased ones. Where problems exist, the solution is to use better data and algorithms, not to eliminate them.

The third problem with the book is that it argues algorithms are static and lock in current conditions, whereas humans evolve over time. For example, O’Neil writes, “If a Big Data college application model had established itself in the early 1960s, we still wouldn’t have many women going to college, because it would have been trained largely on successful men…The University of Alabama’s football team, needless to say, would still be lily white.” This is a perfect example of the way the book overreaches and oversimplifies complex issues. While those who wanted to uphold the status quo during the 1960s would clearly have used any technology available to them, it is borderline absurd to suggest that this alone would have prevented second-wave feminists or civil rights activists from pursuing their campaigns for gender and racial equality. Moreover, the notion that algorithms are unchanging over time is simply incorrect. Unlike society, where changes in opinion on social issues are often measured in decades, code can be tweaked and refined over a period of days or weeks. Consider, for example, that Google changes its search engine 500 to 600 times per year.

The biggest problem with the book is that it relentlessly attacks the straw man argument that everyone believes algorithms are completely objective and unbiased. The book offers little evidence that this belief is widespread, a glaring oversight given how fundamental this assumption is to its premise. Moreover, the book ignores the fact that debates about algorithmic fairness, transparency, and impact are not new. In the 1980s, for example, the SABRE flight reservation system used by travel agents came under intense scrutiny when it became apparent that American Airlines was receiving favorable placement in search results. And these same debates are already playing out in many fields, including education, financial services, and criminal justice.

There would be no controversy (and likely significantly fewer sales) if this book’s message were simply “sometimes algorithms and data reflect the biases in our society or have unintended consequences.” But by taking something many in society already love to hate, and blaming it for many of society’s worst ills, O’Neil has found a guaranteed recipe for this book’s commercial success. The problem is that the takeaway for many people will be that algorithms are fundamentally flawed and to be feared. While she protests a few times in the text that her intent is not to disparage all algorithms, this sounds an awful lot like she’s Donald Trump concluding that “some, I assume, are good algorithms.” Moreover, she ignores the many ways in which data and algorithms are used to combat injustice and inequality (not to mention fight cancer, aid refugees, prevent poaching, stop online harassment, and more) and outrageously suggests that those defending data are crass elitists concerned only about trivial issues like finding better music on Pandora.

Ultimately, by incorrectly blaming algorithms and ignoring their benefits, this book does a disservice to those it is ostensibly trying to help. A significant threat to disadvantaged individuals is that policymakers will fail to seize the opportunities before them to use data-driven innovation to advance their social and economic goals. While data and algorithms cannot hope to eradicate all of society’s problems singlehandedly, a steady focus on using data science for social good will likely yield many positive results in the years ahead.

Daniel Castro is the director of the Center for Data Innovation and vice president of the Information Technology and Innovation Foundation. Mr. Castro writes and speaks on a variety of issues related to information technology and internet policy, including data, privacy, security, intellectual property, internet governance, e-government, and accessibility for people with disabilities. His work has been quoted and cited in numerous media outlets, including The Washington Post, The Wall Street Journal, NPR, USA Today, Bloomberg News, and Businessweek. In 2013, Mr. Castro was named to FedScoop’s list of “Top 25 most influential people under 40 in government and tech.” In 2015, U.S. Secretary of Commerce Penny Pritzker appointed Mr. Castro to the Commerce Data Advisory Council.
Mr. Castro previously worked as an IT analyst at the Government Accountability Office (GAO), where he audited IT security and management controls at various government agencies. He contributed to GAO reports on the state of information security at a variety of federal agencies, including the Securities and Exchange Commission (SEC) and the Federal Deposit Insurance Corporation (FDIC). In addition, Mr. Castro was a Visiting Scientist at the Software Engineering Institute (SEI) in Pittsburgh, Pennsylvania, where he developed virtual training simulations to provide clients with hands-on training in the latest information security tools. He has a B.S. in Foreign Service from Georgetown University and an M.S. in Information Security Technology and Management from Carnegie Mellon University.