How big data and algorithms are slashing the cost of fixing Flint’s water crisis

Authors

Assistant Professor of Marketing, Ross School of Business, University of Michigan

Disclosure statement

Jacob Abernethy receives funding from the National Science Foundation and Google.org.

Eric Schwartz does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.

The water crisis in Flint, Michigan highlights a number of serious problems: a public health outbreak, inadequate urban infrastructure, environmental injustice and political failures. But when it comes to recovery, the central challenge, and one that has received relatively little attention, is our lack of useful information and understanding.

Who is most at risk? Where are the harmful sources of lead? Where should resources be allocated? Using modern big-data tools, we can answer these questions and help inform the response to this crisis.

With the support of our student team at the University of Michigan, we have aggregated a trove of available data around Flint’s water issues, including water test results, records of the service lines that deliver water to homes, information on parcels of land and water usage. Leveraging new algorithmic and statistical tools, we are able to produce a significantly more complete picture of the risks and challenges in Flint.

These methods strongly resemble those used by Facebook, Amazon and other large tech companies that collect vast amounts of data from users. But whereas Facebook’s algorithms crunch through uploaded photographs to detect faces and Amazon’s models predict which products you’ll like, we are using these analytics tools to detect homes with high risk of lead contamination and to predict the locations of lead pipes buried underground or hidden in the homes of residents.

What have we learned? Here are a few takeaways from our research.

Lead contamination varies widely across homes and is highly scattered around Flint, but it is surprisingly predictable

The headlines on Flint could easily lead one to believe all homes in the city have dangerously high levels of lead. But in fact, using data from the state’s sentinel program, we found that during a period in February only between 8 and 15 percent of homes had lead above the federal action level of 15 parts per billion (ppb).

Indeed, things have been improving from January through August 2016, according to the test data from the sentinel program. Based on about 750 homes monitored repeatedly, fewer homes have tested above the action level over time. Almost half of all samples have virtually no detectable level (below 1 part per billion).

Percent of samples in the DEQ’s sentinel program that tested below the federal action level. Credit: Jonathan Stroud, Ph.D. student at UM.

These low numbers provide little comfort when we don’t know which homes are at risk. Only around 30 percent of homes in Flint have had their water tested, according to government data, and these water tests do not guarantee safety; they only identify danger. Also, it is clear from the data that homes that are slower to sample their water tend to be those at much greater risk.

So can we find these homes? The answer is yes, to a modest degree of accuracy. We have built statistical models that profile a home based on several attributes (year of construction, location, value, size, etc.), and provide an estimate of the risk level.
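To make the idea of profiling a home concrete, here is a minimal sketch of one common kind of risk model, a logistic regression fit by gradient descent. It is an illustration only: the article does not say which model family the team used, the two attributes (year built and home value) are just a subset of those listed above, and every data point below is invented rather than drawn from Flint records.

```python
import math

# Invented toy data, NOT Flint records: (year built, home value in dollars,
# 1 if a water test exceeded the action level, else 0).
HOMES = [
    (1925, 30000, 1), (1928, 75000, 1), (1930, 45000, 1), (1930, 45000, 0),
    (1940, 25000, 1), (1948, 60000, 0), (1955, 40000, 1), (1962, 80000, 0),
    (1970, 55000, 0), (1985, 90000, 0), (1990, 35000, 0), (1995, 120000, 0),
]

def features(year, value):
    """Rescale raw attributes so both sit on a similar, small scale."""
    return [(year - 1950) / 25.0, (value - 60000) / 30000.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(homes, learning_rate=0.1, epochs=2000):
    """Fit logistic regression weights by plain batch gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for year, value, high_lead in homes:
            x = features(year, value)
            error = sigmoid(bias + weights[0] * x[0] + weights[1] * x[1]) - high_lead
            grad_w[0] += error * x[0]
            grad_w[1] += error * x[1]
            grad_b += error
        weights = [w - learning_rate * g / len(homes) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(homes)
    return weights, bias

def risk(model, year, value):
    """Estimated probability that a home with these attributes tests high."""
    weights, bias = model
    x = features(year, value)
    return sigmoid(bias + weights[0] * x[0] + weights[1] * x[1])

model = fit(HOMES)
older_cheaper = risk(model, 1930, 30000)
newer_pricier = risk(model, 1990, 100000)
```

On this toy data the fitted model assigns a higher risk to the older, lower-value home than to the newer, higher-value one. A real model would draw on many more attributes and far more samples, and would need to be checked against homes it had not seen.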

Based on our statistical models, we can display locations which we estimate to be at high risk of lead contamination. Credit: PhD students Guangsha Shi, Jared Webb, and others at UM.

The quality of these models is driven by the huge swaths of data from water samples submitted by residents and tested by government officials in response to the crisis. This provides us with a database of measurements that includes over 20,000 water samples covering roughly 10,000 homes in Flint from November 2015 to the present. We have made our risk assessments available to government officials, and they are being incorporated into a mobile application, funded by Google and built by students at UM Flint, that allows Flint residents to learn of their home’s risk level.

Newer properties have lower lead levels, both on average and at the 90th percentile (blue line). Eight percent of tests were above the federal action level of 15 ppb (dotted red), and some were well above 150 ppb and even 1,000 ppb. The highest 0.5 percent of samples are not shown.

These statistical models not only provide predictions; they also give a better understanding of the problems. This has much broader implications, as these factors predicting lead may generalize beyond Flint.

The data suggest that lead contamination is associated with a number of factors: older homes tend to be at greater risk, for instance, as are those of lower home value. Lower-value homes also tend to be those with the lowest rates of water sampling. Additionally, while the highest readings are geographically scattered, the homes predicted to be at high risk tend to cluster in specific neighborhoods.

Flint’s lead pipe records are spotty and noisy, but statistical methods can significantly fill the gap

Media reports and political efforts have continued to focus on the so-called “water service lines” that connect each house to the distribution system in the street. The assumption is that homes with lead service lines are most at risk for lead exposure and poisoning. As a result, much of the attention has been on locating and replacing these lines.

The problem, however, is not only with lines made out of lead material: Lead particulate can accumulate on the walls of corroded galvanized steel pipes. Pipes made of copper or plastic, on the other hand, are generally considered to be safe.

But there are immediate challenges with the line replacement program. And the most obvious is: Where are these dangerous pipes?

The city, unfortunately, did not maintain consistent records on service line installations and materials. But city officials eventually found, after some searching, a set of maps with handwritten annotations (last updated in 1984), and these records were digitized by a UM Flint research team led by Professor Marty Kaufman. These appeared to identify the material of the service lines for most home parcels in Flint.

Using paper records, researchers were able to get a rough idea of what type of material – lead, copper or plastic – was used to bring water service to each home. Credit: Author provided.

How complete and accurate are these records? Unfortunately, not very. For over 30 percent of homes, either there are missing labels or the records disagree with a home inspection of a portion of the service line.

We can again fill in gaps with the help of algorithms and data. Looking for patterns in the existing records, statistical tools can provide a reasonable “educated guess” as to the type of material in a home’s service line. We have been working directly with Gen. Michael McDaniel’s line replacement team, providing statistical estimates of where lead pipes are most likely to be found, and this has guided their targeting of replacement resources.
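A deliberately simple version of such an “educated guess” can be sketched as follows: for a home with a missing label, guess the material most often recorded for homes built in the same decade. This is a stand-in, not the team's method, which the article does not detail. The records below are invented for illustration, and grouping by decade is an assumption made only for this example.

```python
from collections import Counter, defaultdict

# Invented records, NOT Flint data: (year built, recorded material),
# with None marking a missing label.
RECORDS = [
    (1926, "lead"), (1931, "lead"), (1938, None), (1934, "copper"),
    (1937, "lead"), (1952, "copper"), (1958, None), (1955, "copper"),
    (1964, "copper"), (1968, None), (1961, "copper"),
]

def decade(year):
    return year // 10 * 10

def fill_missing(records):
    """Replace each missing label with the most common material recorded
    for homes built in the same decade."""
    by_decade = defaultdict(Counter)
    overall = Counter()
    for year, material in records:
        if material is not None:
            by_decade[decade(year)][material] += 1
            overall[material] += 1
    filled = []
    for year, material in records:
        if material is None:
            # Fall back to the citywide favorite if the decade has no labels.
            counts = by_decade[decade(year)] or overall
            material = counts.most_common(1)[0][0]
        filled.append((year, material))
    return filled

filled = fill_missing(RECORDS)
```

Here the unlabeled 1938 home is guessed to have a lead line, because lead outnumbers copper among the labeled 1930s homes in the toy data, while the unlabeled 1958 and 1968 homes are guessed to have copper.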

Our recommendations adapt as new data come in, using techniques also applied in online advertising experiments and clinical trials, to identify the risky homes quickly and efficiently.
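One well-known family of adaptive techniques from advertising experiments and clinical trials is the multi-armed bandit. The sketch below uses Thompson sampling, chosen here purely as an illustration; the article does not name the specific method used in Flint. The two groups of homes and their “true” lead rates are invented so the simulation has something to discover.

```python
import random

def thompson_inspections(true_lead_rate, rounds, seed=0):
    """Simulate choosing which group of homes to inspect next.

    true_lead_rate maps a group name to the (hidden) chance that an
    inspection in that group finds a lead line.
    """
    rng = random.Random(seed)
    groups = list(true_lead_rate)
    found = {g: 0 for g in groups}   # inspections that found lead
    missed = {g: 0 for g in groups}  # inspections that found no lead
    for _ in range(rounds):
        # Draw a plausible lead rate for each group from its Beta posterior,
        # then inspect the group whose draw is highest.
        draws = {g: rng.betavariate(found[g] + 1, missed[g] + 1) for g in groups}
        choice = max(draws, key=draws.get)
        # Simulated inspection result for the chosen group.
        if rng.random() < true_lead_rate[choice]:
            found[choice] += 1
        else:
            missed[choice] += 1
    return found, missed

# Invented rates for illustration only.
RATES = {"pre-1950 homes": 0.8, "post-1960 homes": 0.1}
found, missed = thompson_inspections(RATES, rounds=300)
visits = {g: found[g] + missed[g] for g in RATES}
```

Early on, the simulated crew samples both groups; as results accumulate, most inspections shift to the group where lead is turning up more often. That is the sense in which recommendations can adapt to incoming data.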

Professors Schwartz (left) and Abernethy (right) at a service line replacement site in Flint, Michigan.

Our machine learning techniques, which utilize all of the available city data, parcel records and a database of over 3,000 inspection reports, are able to estimate line materials with better than 80 percent accuracy. We find, for instance, that houses built in the 1920s to 1940s are many times more likely than those built after 1960 to have lead in their service line. Our guesses aren’t perfect by any means, but estimates of this level can save millions of dollars on recovery efforts.

Home service lines may not be the largest contributor of lead

Despite the huge media attention focused on the service lines, one of the major takeaways from our analyses is that these service lines may not be the major driver of the lead in Flint’s drinking water. Yes, it is the case that those homes with copper service lines have lower lead levels, on average, than those with lead in their service line. But when you look closely at the water testing data, the differences are much smaller than you might think.

While it is difficult to determine with certainty due to the spotty records, what we have found is that large spikes of lead occur in homes with and without lead service lines. This suggests a large fraction of the dangerously high lead readings are probably not being driven by the service line material but instead by other factors. Environmental engineers who study these problems report that lead can leach from several sources, including the home’s interior plumbing, faucet fixtures and aging pipe solder.

We can look at homes that, based on records and home inspections, appear to have copper-only service lines versus those containing some lead. We plot the distribution of the lead readings for water samples from these two home categories.
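The comparison can be sketched in a few lines: group the readings by service line category, then summarize each group's typical level and its share of samples above the action level. The readings below are invented to show the mechanics and say nothing about actual levels in Flint; only the 15 ppb action level comes from the article.

```python
from statistics import median

ACTION_LEVEL_PPB = 15  # federal action level cited in the article

# Invented readings in ppb, NOT Flint data.
READINGS = {
    "copper-only": [0, 1, 0, 2, 3, 0, 22, 1, 5, 160],
    "some lead": [1, 4, 0, 6, 2, 18, 9, 3, 0, 210],
}

def summarize(samples):
    """Summarize one group's lead readings."""
    above = sum(1 for reading in samples if reading > ACTION_LEVEL_PPB)
    return {
        "median_ppb": median(samples),
        "share_above_action_level": above / len(samples),
        "max_ppb": max(samples),
    }

summary = {group: summarize(samples) for group, samples in READINGS.items()}
```

In this toy example the group with some lead has a higher median, yet both groups contain large spikes and the same share of samples above the action level, which is the kind of pattern that would point to sources other than the service line.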

What we can conclude is that citizens as well as policymakers may need to widen their focus beyond the service line materials and consider alternative efforts to address other sources of lead. Service line replacement is certainly a necessary part of the solution, but it will not be sufficient.

Toward solving the broader problem, data and statistical tools can help greatly reduce risks at much lower cost, and a data-oriented understanding of the problems in Flint can guide efforts to address lead concerns in other regions as well.