In Sickness and in Health: How Big Data Watches over You

All photographs by Jessica Camille Aguirre. All rights restricted.

Blood samples are tested at the U.S. Naval research laboratory in Lima, Peru, to determine dengue strains from patients in Iquitos.

THERE ARE FEW PLACES where you would less expect to find answers about the digital age than Iquitos, Peru. The place swarms with intemperate life: fruits sweeten and rot quickly, luminous birds idle on jaunty tropical trees below the promenade and the rainforest presses in around the sweltering city. The novelist Mario Vargas Llosa set his novel about sex-crazed soldiers in Iquitos and called it the “Venice of South America” because of a neighborhood that floats on a tributary of the Amazon River. The city’s natural wealth is equaled only by its impoverished infrastructure. A steady internet connection is hard to come by.

Nevertheless, a project is now underway by a group of scientific researchers affiliated with the U.S. Navy. They’re here to decipher the viral patterns of a deadly disease by tracking residents and applying algorithms to the resulting data. The process is not unlike standard methods of digital surveillance, in which every decision is tracked, recorded, and processed. Ninety percent of the data in existence globally has been generated in the last two years, much of it from people doing everyday things online. Iquitos’s project represents an exiguous sliver of that data, but, by forcing a cool instrument of measure onto its hot messy world, it reveals all that is promising and terrifying about data collection in the 21st century.

Mario Vargas Llosa called Belen, the floating city that stretches into a tributary of the Amazon River, the Venice of South America. Houses, churches and markets are built on stilts or rafts that protect the structures from flooding during the rainy season.

Iquitos

The patient tries to be polite. Luis Antonio Ramirez, 26 years old, slowly props himself up on his hospital bed to shake hands with Doctor Crystyan Siles as I stand to the side. He nods deferentially through a series of questions. Ramirez has been wracked with muscle pain throughout his body for nearly a week, with blood in his vomit and stool, and coming out his nose. He’s feverish — his eyes are dark and his cheeks look distinctly sunburned even though he’s not been outside since Saturday.

On Wednesday Ramirez finally makes it to a bed in the emergency room at the César Garayar García hospital in an Iquitos neighborhood near the center. It’s a humid day in the small city of fewer than half a million inhabitants, and the hospital windows are open. Ramirez is separated from the patient next to him by thick yellow curtains that hardly sway in the hot breeze. Salsa music plays at the triage station.

Beyond the ER and down an outdoor corridor, a hospital auditorium has been converted into a makeshift ward. Dozens of cots, all occupied, are lined up in three rows. At the head and foot of each cot is a T-shaped piece of wood that holds up a turquoise mosquito net.

The patients here all have dengue, a potentially lethal virus carried by mosquitoes. Dengue can provoke internal bleeding. Although his symptoms are classic, Ramirez hasn’t been officially diagnosed, so he’s not yet encased in a filmy turquoise cube. Siles has asked that the results of Ramirez’s blood sample be prioritized, which means they’ll come back tomorrow morning. If Ramirez has dengue, if he survives the precarious balancing act of illness, and if he lives in a certain part of the city, then researchers will come to his home and ask him to wear a small, white Global Positioning System unit on a lanyard around his neck. They’ll ask his family to wear them too, and some of his neighbors.

“The GPS gives us a ton of data,” lead researcher of the project, Valerie Paz-Soldan, enthuses to me at a café on a sunny morning in Lima, slamming a ballpoint pen down on the paper placemat in front of her to illustrate how she’s tracking the disease. She deposits a spray of blue ink dots that represent the GPS readings of one person as they move across Iquitos. “You wear this thing and you walk around and every two minutes it’s giving a signal to the satellite, so we finish up with a map. All over the city, we have you mapped.”

Dengue is the fastest-growing insect-borne disease in the world, and much effort has been spent systematically killing mosquitoes that carry the virus. But between 2008 and 2010, the number of dengue cases reported to the World Health Organization increased by 92 percent. In Iquitos, hundreds of research hours have been spent compiling detailed information on the mosquito population. Paz-Soldan, a social scientist, is as interested in people as in mosquitoes; she is convinced human movement is a key harbinger of where outbreaks will appear.

Attaching a Global Positioning System to a human being may seem like a desperately expensive gambit by a researcher on the frontlines of virus combat. But Paz-Soldan is in fact doing something that represents the vanguard of the global digital frontier. Advances in satellite and telecommunications technology have exploded both the feasibility of and occasion for tracking humans. Cell phone companies around the world routinely gather location data on users by triangulating from towers with known locations, or by measuring the time it takes for a phone signal to arrive at multiple reference points. Databases culled from all kinds of digital activity contain troves of information about social behavior. Efforts to make sense of the data are often focused on finding mechanisms that spark contagion. Paz-Soldan’s epidemiological experiment isn’t the first to use GPS tracking in pursuit of disease, either. There is a growing sense, among scientists of every bent, engineers and data companies, that myriad problems plaguing human societies for centuries are just a symptom of insufficient data.

In the hospital in Iquitos, I stand next to Doctor Siles as he listens to Ramirez’s ailments: lethargy, a sharp throb behind the eyes, the sensation that his bones are breaking. In the next gurney, an older man with pepper hair and a sallow face is grimacing in pain. He is bare-chested and swathed in a preposterously oversized diaper. The man whimpers and casts an elbow over his eyes, as if to shield himself from observation. I look away.

The Mosquito Hunters: Quantifying a City in Commotion

One of the Iquitos project field workers, Fernando Ruiz Chota, stands pensively in a pink-tiled shower wearing a safari vest and his shoes. He is collecting data for Paz-Soldan’s research team by scouring Iquitos homes. He steps past a pink shag rug on the cement floor in this particular bathroom to get into the shower, and leans over a red plastic barrel, peering intently inside, scratching notes on his clipboard.

Chota is looking for mosquito larvae. When he finds some, he fishes a turkey baster out of his backpack and sucks them up into a small plastic bag. He shines a flashlight down into the three barrels lined up inside the bathroom and then goes out the door that leads directly to a back patio. There are six more barrels of water, and murky puddles have formed in the cracked cement from the morning’s rain. Down a mud slope is a chicken enclosure, with a few shallow blue buckets of water.

“Señora, what is this water for?” he asks the lady of the house, who quietly follows him around.

“It’s for the dog,” she says, motioning to the scraggly little creature that’s kept up a distrustful bark at Chota ever since he entered. “I change it every day.”

“And the water in the shower? You change that every day too?”

She nods, half-heartedly.

Chota carefully notes the location of each water vessel on a pencil-drawn map he’s sketched at the bottom of his sheet. This house, like most in Iquitos, is on a long, narrow plot. Unlike some of the poorer houses, constructed from thin wood planks and a thatched roof, this house is built from cement and brick. The walls only rise up eight feet — above that there is another eight feet of open air below the corrugated metal roof. This arrangement allows wind to circulate in the house, as the proximity of the neighboring houses doesn’t allow for windows. Some homeowners intentionally leave square holes in the ceiling for the same purpose. When it rains, it rains into the house.

For all the water everywhere, there isn’t enough. Homes that are connected to the city’s piping system only have running water for an hour or so a day — the barrels in showers, on staircases, in backyards, and positioned under holes in the roof are there to ensure there’s enough for later, when the pipes run dry.

Iquitos is thus a mosquito-breeding haven. Members of Aedes aegypti, the mosquito species most adapted to carrying the dengue virus, prefer to breed in small standing bodies of water. Chota, a laconic man with a shy smile and an old baseball hat that has seen untold Peruvian afternoons hunting for mosquitoes, is part of an entomology team working with the U.S. Naval Medical Research Unit Six (called NAMRU-6) that has offices in Lima and Iquitos. It’s one of several U.S. military research units around the world whose mission is to study diseases that afflict U.S. troops overseas. (The U.S. Navy Surgeon General, on a visit to Iquitos in January, also called the effort a demonstration of diplomatic goodwill.)

There are about a dozen men on the Iquitos entomology team who sweep neighborhoods house-by-house gathering data in pairs. Chota’s partner walks around the house with a handheld sucking machine that captures adult mosquitoes like a vacuum cleaner captures wayward dust. “The Aedes is the strongest of the mosquitoes,” Chota tells me seriously as his counterpart zooms through the house and out into the backyard, sticking his vacuum nozzle into corners and around tree roots. “It’s like a superman.”

Chota takes samples of Aedes aegypti larvae from a rooftop in Iquitos.

When Chota finishes his survey, noting water repositories on his hand-drawn map of the building, he steps out and double-checks that the house code he’s written matches a painted number on the outside wall. This process gets repeated by the entomology team day after day for four hours every weekday morning throughout most of the city. They go to Maynas, they go to Tupac Amaru. They walk through dusty streets lined with papaya trees, down pitted dirt roads, into decrepit courtyards and muddy backyards where bananas grow and young mothers wash laundry in plastic buckets while their children play with the chickens. They draw maps of cement houses and wooden houses where furniture is piled into corners and entire families sleep in one bed; they scour front rooms that double as soda-filled corner stores. They try to quantify a city in commotion, collecting one little species in neighborhoods teeming with life.

A Small Data Problem

Iquitos has been trying to control the mosquito population — and by extension the dengue scourge — for decades. Monitoring the location and quantity of Aedes aegypti can augur potential disease outbreaks.

“The issue is that the problem of dengue is a problem of poverty,” the head of the Peruvian Ministry of Health, César Cabezas Sánchez, explained to me in his Lima office last January. “You can’t separate the health problem from social problems.” The problem of exposed bodies of water, for instance, is inextricably linked to the unreliable state of infrastructure and relentless growth (in the hills surrounding Lima, the speed with which new shanty neighborhoods appear outstrips the government’s capacity, or inclination, to provide services).

Whether dengue cases will appear in a certain area depends on a triumvirate of thorny variables: the conditions that lend themselves to a robust Aedes aegypti population; the presence of an organism within which the virus is already present; and the capacity of public health authorities to predict and control the disease. The logistical complexity of acquiring the information necessary for prediction and control, in the burnished language of technology proponents, is a classical small data problem. And it is exactly the kind of problem the cataclysmic surge of data and its accompanying quasi-scientific tradition is poised to solve.

The nature of this lionized new discipline and the dizzying array of Big Brother analogies that accompany it are the subject of the recent book Big Data: A Revolution That Will Transform How We Live, Work, and Think, written by Viktor Mayer-Schönberger and Kenneth Cukier. Schönberger and Cukier open with a much-cited anecdote about a Google initiative to predict flu outbreaks that, by finding correlations between internet search terms and the appearance of sickness, produced flu forecasts that outpaced those of the Centers for Disease Control and Prevention.

Google has also developed a similar process for global dengue detection. The method bypasses the Gordian knot of disease factors that the Peruvian health minister lamented; it does this by allowing search engine queries that correlate with dengue outbreaks to act as proxies for factors observed on the ground. This is an approach lavishly lauded by Cukier (a data editor at The Economist) and Schönberger (a professor of internet governance and regulation at the University of Oxford), who identify the investigative credo of big data as swiveling on the hinge of the what in lieu of the why.

In other words, if two variables are correlated such that one says something about the other, whatever “causes” their relationship can be dismissed. The examples are legion: Walmart discovered that people stock up on Pop-Tarts when a hurricane is coming (so if one is on the way, they move the pastry to the front of the store, along with flashlights); Target sends advertisements to expectant mothers based on buying habits that probably indicate pregnancy (becoming, in one case, the bearer of tidings to parents who were unaware their teenage daughter was expecting).
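To make the idea of a proxy concrete, here is a minimal sketch of the kind of correlation check such a method rests on. It is not Google's method; the weekly figures are invented purely for illustration, and the function name `pearson` is simply a label for the standard Pearson correlation coefficient.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented numbers, for illustration only: weekly counts of dengue-related
# searches and weekly confirmed cases in some hypothetical city.
weekly_searches = [120, 150, 310, 480, 450, 200]
weekly_cases = [8, 11, 25, 41, 37, 15]

r = pearson(weekly_searches, weekly_cases)
print(f"correlation between searches and cases: {r:.2f}")
```

A coefficient near 1 means the search counts rise and fall with the case counts, which is all a proxy needs; the number says nothing about why the two move together.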

The point is that modern lifestyle habits produce an immense surge of data and information that can provide insight into the way people behave. In this, the authors draw on a wealth of surprising anecdotes to illustrate how big data methods, or the willingness to consider the relationship between pieces of information expansively, upended established dictums long before the data age. From seafaring to manhole management, allowing statistical probability to point to occasionally counterintuitive conclusions can strip unexamined assumptions of their underlying ideologies.

Big Data is one of the first attempts to unpack the consequences of data’s unprecedented scope for a broad audience. As such, it is hazily broad, somewhat morally conflicted and tends toward a superficial (if didactic) treatment. Perhaps because of the book’s inauspicious publication only months before Edward Snowden’s revelations about the extent of government surveillance, the treatment of data’s “dark side” seems terse.

Admittedly, one of the most difficult aspects of writing about big data is the inherently unsympathetic position of the main character. What is big data anyway? And who cares? The authors handle the question with earnest admiration spiked with varying degrees of ham-handedness. For instance, they call data, alternately, “the critical ingredient,” “a glacier,” “an iceberg floating in the ocean,” “a magical diamond mine.” That language, coupled with wide-eyed if half-hearted warnings about the power of big data in the hands of the mal-intentioned, certainly rang true in the sunny pre-Snowden days. Even post-Snowden, the problems with writing about data haven’t gone away; they’ve only been reduced to simplistic binaries, such as the classic state-citizen or freedom-security tropes. To be sure, the authors partially foresaw this dystopian outcome. In an article published in the May/June edition of Foreign Affairs, they warn that “big data exacerbates the existing asymmetry of power between the state and the people.”

But big data is also big business, a reality the authors of Big Data champion. Information on consumers is a commodity that can be creatively capitalized. Just as the dengue epidemiologists are convinced that modeling virus transmission could prove key to suppressing a disease, businesses increasingly see behavioral analysis as a means to sell more product. “In light of informational firms like Farecast or Google — where raw facts go in at one end of a digital assembly line and processed information comes out at the other — data is starting to look like a new resource or factor of production,” Cukier and Schönberger write. “Data’s true value is like an iceberg floating in the ocean,” they continue. “Only a tiny part of it is visible at first sight, while much of it is hidden beneath the surface. Innovative companies that understand this can extract that hidden value and reap potentially huge benefits.”

The promise of insight into human behavior has prompted a rash of capital investment. The Harvard Business Review named “data scientist” — someone who demonstrates equal expertise in statistical algorithms and computer software — the sexiest career of the 21st century. Half a million data scientists will be needed in the next five years, according to a report by the McKinsey Global Institute. Spurred by lush job markets, universities across the country are starting to incorporate data sciences into academic curricula; Michael Bloomberg announced New York City’s $15-million contribution to a new data sciences and engineering institute at Columbia University last fall. Transparency Market Research, a global business information company, estimates that Big Data was worth $6.3 billion in 2012. By 2018, it’s expected to be valued at $48.3 billion.

The Logic of Epidemics

A mosquito ovary, when seen through a microscope, most closely resembles a blooming stellar cloud. Billows of hazy gray are webbed with spindly, snowflake-like membranes, pooling against puddles of burnt sienna and — if the mosquito has recently fed — the dark maroon of mammalian blood.

One of the technicians at NAMRU-6’s entomology lab, Rusbel Huiñapi, peers at the filmy constellations through the microscope and calls out to his assistant: “PA-1. NP-2.” He has dissected the mosquitoes Chota collected during his search of Iquitos homes, and characterizes each one according to its reproductive stage.

This is what happens to the mosquitoes after they are trapped by Chota and the entomology team: they are lulled to sleep in the refrigerator for less than five minutes, plucked out of the plastic trap into which they were vacuumed, foisted onto a little rectangular glass microscope slide, and pegged there with a drop of water. Then Huiñapi gets at them.

Before dissection, the mosquitoes look poised for battle. Their bulk seems to gather around their little pearly heads with the tension of a boxer’s shoulders; this is where their appendages emerge like needles. From above, under tiny gleaming black bulb eyes, protrudes the mosquito’s most heinous instrument, and below, behind crystal-paned wings, descends a long oval. This oval contains the mosquito’s reproductive and digestive organs, and it’s what Huiñapi is after.

He takes two pins, one bent and attached to a sliver of wood and one straight, slices off the last section of the abdomen, and pulls. If the ovary doesn’t spill out, he gently presses on the oval abdomen to tease it forward.

The maturity of the mosquito population, as reflected in its capacity for reproduction, is a bellwether for potential dengue outbreaks. Detecting when a mosquito swell could happen is vital for heading off crisis conditions. “When they do the control,” Huiñapi tells me, referring to government campaigns that can include mass larvicide application and neighborhood-wide fumigation sprays, “it makes a big difference. When they don’t look out, it becomes an epidemic.”

In 2010, during one of the largest recent outbreaks, Huiñapi’s 12-year-old daughter was diagnosed with dengue fever. She fell ill at the height of the virus’ spread, when the largest hospital in Iquitos was so overrun that tents were installed in courtyard gardens and cots filled the corridors. Huiñapi’s daughter was taken to a smaller clinic for government employees and their families, where she was hospitalized for a week. Next to her bed was another dengue patient, a young woman who developed hemorrhagic fever, the gravest form of dengue, whose telltale bleeding augurs a deadly descent into shock.

That the hospitals become crowded with dengue patients is small wonder: there’s nothing doctors can do to treat the disease. “There’s no hierarchy in the treatment of dengue,” Huiñapi observes. “There are no exceptions for anyone.” The dengue dance is one of monitoring, measuring, charting, and hoping. A piece of paper nailed to the wooden cross at the foot of each cot traces each patient’s platelet count over time; the lower it falls, the higher the risk for bleeding and ultimately shock. Doctors strive to ward off dehydration on the one hand and internal drowning on the other: liquid leaking out of blood vessels and flooding the lungs and stomach.

But it’s impossible to tell who will develop what. Some people don’t get sick at all — they’re asymptomatic. Of the estimated 1.6 million cases of dengue reported in the Americas in 2010, according to the World Health Organization, 49,000 were severe. Two and a half percent of the 500,000 people hospitalized yearly with dengue die from it. Researchers are still trying to figure out why some people contract the virus and stay perfectly healthy while others are felled by the disease.

“The girl in the next bed was vomiting blood and blood was coming out of her nose, and even I was scared,” Huiñapi recalls of the uncertain hours in the hospital in 2010. “I called the nurse and the nurse came in and looked at her and got scared and went running for the doctor. I took the girl and helped her up so she wouldn’t choke and tried to tell her it was okay and to stay calm, even though I was looking over at my daughter in the next bed and feeling afraid inside. I told her it was okay.” The girl survived, but 33 people perished from dengue in Iquitos that year.

Fernando Ruiz Chota examines the backyard of a home in Iquitos, checking for buckets of water that could contain Aedes aegypti larvae.

Probable Predictions

With enough data, according to statistical theory, the probability of any given outcome can be known. The NAMRU-6 research crew has tried to apply this logic to predict who will develop dengue hemorrhagic fever. They looked at the platelet counts, interleukin-10 (a protein molecule) levels, and lymphocytes (white blood cells) of incoming dengue patients — these factors reflect the body’s immunological state — and built a model that calculated the odds of developing acute illness based on the initial numbers. The technique they used, called a random forest classifier, averages a series of decision trees: branchlike graphs of probabilistic outcomes. It’s commonly applied algorithmically to large data sets to say what will happen, based on what has happened.
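For readers curious what such a classifier looks like in practice, here is a deliberately tiny sketch. It is not the NAMRU-6 model. The patient records are invented, every name in the code is hypothetical, and each "tree" is cut down to a single yes-or-no split (a decision stump) so the whole thing fits on a page. The two ingredients that make a forest "random" are both present: each tree sees a random resampling of the patients, and each tree looks at one randomly chosen measurement.

```python
import random

def fit_stump(rows, labels, feature):
    """Find the one-split tree on `feature` with the fewest training errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for low_label in (0, 1):
            # Predict low_label at or below the threshold, the opposite above it.
            errors = sum(
                (low_label if row[feature] <= threshold else 1 - low_label) != y
                for row, y in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, low_label)
    return best[1:]  # (feature, threshold, low_label)

def predict_stump(stump, row):
    feature, threshold, low_label = stump
    return low_label if row[feature] <= threshold else 1 - low_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))             # random measurement
        forest.append(
            fit_stump([rows[i] for i in picks], [labels[i] for i in picks], feature)
        )
    return forest

def predict_proba(forest, row):
    """Fraction of trees voting 'severe': the forest's averaged verdict."""
    return sum(predict_stump(stump, row) for stump in forest) / len(forest)

# Invented records, for illustration only:
# (platelets in thousands per microliter, interleukin-10 level, lymphocyte percent)
patients = [(210, 12, 34), (185, 15, 31), (160, 20, 30), (150, 18, 28),
            (95, 60, 18), (80, 75, 15), (70, 90, 14), (60, 85, 12)]
severe = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = went on to develop severe illness

forest = fit_forest(patients, severe)
high_risk = predict_proba(forest, (55, 95, 10))
low_risk = predict_proba(forest, (220, 10, 36))
print(f"share of trees predicting severe illness: {high_risk:.2f} vs {low_risk:.2f}")
```

Averaging many weak, slightly different trees is the design choice that matters here: any single stump can be thrown off by the patients it happened to see, but the vote across the forest is steadier than any one of them.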

One way to think about an algorithm is as a mathematical formula that, instead of describing the universal relationship between things, defines that relationship iteratively — over and over again based on the algorithm’s internal rules. So for each new piece of information, the algorithm already has a category into which the data belongs. If a formula is a craftsman making a distinct pair of shoes, an algorithm is an industrial factory apparatus banging out a hundred identical shoes per minute. For that reason, algorithms are perfect for handling data.

In the milieu of digital data, the algorithm is king. In particular, a class of algorithms called “machine learning” runs much of the web. The decision tree model that the dengue researchers used is a classic machine-learning algorithm. The machine here is the algorithm and it learns from data, either to provide a digital service, like an online translation that learns from all available translated texts, or to describe something like a viral outbreak. An oft-cited example of such learning is email spam: how to tell if an email message is unwanted by the intended recipient?

To do that, a data scientist will write algorithmic instructions. If the word “Viagra” appears more than 20 times in one email, say, or if there’s a suspiciously generous use of exclamation points, that’s spam. The algorithm will get tested on a small hunk of data (a training set, it’s called, as though the algorithm were learning to ride a bike), and then it will get scrutinized to see if it did a good job. Junk emails should be sequestered and good ones should not be waylaid.
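The instructions described above can be sketched in a few lines. This is an illustration rather than a real spam filter: the 20-mention threshold comes from the example in the text, while the exclamation-point limit of five and the four sample messages are assumptions made up for the sketch. The rule here is also written by hand; a learning algorithm would instead adjust such thresholds itself from the labeled examples.

```python
def is_spam(message, max_keyword_hits=20, max_exclamations=5):
    """Flag a message using two hand-written rules.

    max_keyword_hits follows the example in the text; max_exclamations
    is an assumed value, since the text gives no number.
    """
    keyword_hits = message.lower().count("viagra")
    exclamations = message.count("!")
    return keyword_hits > max_keyword_hits or exclamations > max_exclamations

# A tiny labeled set, invented for illustration: (text, is it really spam?)
labeled = [
    ("Meeting moved to 3pm, see you there.", False),
    ("WIN NOW!!!!!!! Click here!!!", True),
    ("viagra " * 25, True),
    ("Congrats on the new job!", False),
]

# Scrutinize the rule: how often does its verdict match the human label?
correct = sum(is_spam(text) == label for text, label in labeled)
accuracy = correct / len(labeled)
print(f"{correct} of {len(labeled)} messages classified correctly")
```

Checking the rule against messages a person has already labeled is the "scrutiny" step: it shows both kinds of mistake, junk that slips through and good mail that gets waylaid.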

Every piece of data an algorithm takes into account makes it better at deciding which mail to deliver; the more data there is, the more material there is for an algorithm to learn on. (Algorithmic pattern recognition is also the basis of how artificial intelligence works.) By compiling data about individual online activity, an algorithm can customize information to historical trends — thus advertisements in email accounts that pertain to subjects discussed in online conversations, book recommendations based on past purchases, and romantic matchmaking based on questionnaires. But the power of machines trained on data is not just to tailor a personalized online experience.

“In the future,” write Cukier and Schönberger, “and sooner than we may think, many aspects of our world will be augmented or replaced by computer systems that today are the sole purview of human judgment.” To borrow a tired phrase, the future is now; the question is only how deeply the value of human judgment has been undercut. Cukier and Schönberger provide the classic case of Amazon, whose initial editorial staff was rendered obsolete when Amazon discovered its algorithmic book recommendation system led to more sales. Sidestepping the whole messy fuss of decisions hammered out by humans is a prospect that appeals to more than just the business sector. Cass Sunstein, President Obama’s former “regulatory czar,” argues for supplanting human intuition with this kind of statistical process in crafting policy: “Making choices about rules without relying too heavily on intuition, anecdotes, dogmas, and impressions.”

Data sciences are already being deployed in law enforcement: police departments from Memphis to Los Angeles now contain predictive analytics units to forecast crime clusters based on accumulated data. Two police workers at one of the first departments to use analytics, in Richmond, Virginia, wrote in Police Chief Magazine in 2003 that “data mining can also be used to analyze and model violent crime. Behavior, even extremely violent or seemingly unusual criminal behavior, frequently can be modeled, anticipated, and even predicted.”

Cukier and Schönberger call this latter inclination — to apply predictive analytics broadly — the potential “dark side” of data, warning against a dictatorship of statistical probabilities. In other words, just because it is statistically probable that something might happen doesn’t mean it will.

Nevertheless, these tactics are still often cast in disease epidemiological terms. Predictive policing usually entails sending additional units to an area expected to have a crime surge. Gary Slutkin, an epidemiologist at the University of Illinois, conceived of a model based on disease transmission dynamics to identify people likely to be involved in gang violence in Chicago and intervene by sending “violence interrupters.” “The interrupters’ role,” Slutkin told a PBS filmmaking crew in 2011, “like the TB disease workers’ role, is to do this initial interruption of transmission.”

The model, called Ceasefire, approaches violence as an epidemic that can be quelled.

The Telltale Linchpin

Mapping sickness is disease epidemiology’s bread and butter. John Snow famously mapped human deaths in London during the cholera epidemic of 1854, debunking a commonly held theory that the disease was transmitted via bad air and laying the foundation for modern epidemiological study. Walking the neighborhood with a local minister, Snow inked cholera deaths on a piece of paper and traced the source of disease to a contaminated water pump on Broad Street. But for NAMRU-6, the challenge of isolating a telltale sign of dengue’s spread has unfolded on a much greater scale than Snow’s quest did. Snow mapped hundreds of cholera deaths; the Iquitos team has millions of data points to contend with. In order to bring statistical analysis to bear on the sea of dengue data, researchers had to turn to technology.

Amy Morrison, the lead researcher of the entire Iquitos team, which includes Valerie Paz-Soldan, started mapping dengue in Puerto Rico during the era before Mapquest and Google Earth. To pinpoint cases of illness geographically, Morrison drove around the island with an antenna attached to the top of her car getting readings to make her own digital map. But when she plotted the cases, they didn’t make sense. If Morrison expected to find dengue’s Broad Street water pump, she was sorely disappointed. In Puerto Rico, there was no discernible order — no telltale linchpin.

“We weren’t seeing this diffusion effect that a lot of people expected,” Morrison tells me in her Moorpark, California, childhood home. Morrison is an entomologist, plain-spoken and indefatigable with an unmistakable Americanness about her despite having lived outside of the country for more than a dozen years. “People had sort of thought, well, if mosquitoes are playing a major role, you’d see a cluster of cases. So you’d have a house, and then the neighboring houses would have it; there would be some kind of spatial structure. For example, if you have a house with cases, the neighbors would have a higher likelihood of being infected than somebody across town. And that’s not what we found. At all.”

Unlike Snow’s emblematic epidemiological map, the CDC’s case data didn’t point to an obvious pattern. In the absence of exhaustive studies, the prevailing paradigm for understanding dengue has relied for decades on blaming the mosquito. As a result, public health departments all over the world took it as a given that even the smallest reduction in mosquitoes would limit the spread of disease. Kill a certain number of mosquitoes, the thinking went, and disease will dwindle correspondingly. Massive global campaigns were launched throughout the 1970s and 1980s to wipe out mosquitoes with a barrage of chemical fumigation. That approach, which Morrison says has been “miserably unsuccessful,” suffered for a long time from a lack of demonstrability.

So when Morrison arrived in Iquitos, she took it as her main task to quantify the relationship between the density of a mosquito population and the transmission patterns of the dengue virus. She wanted to know whether even a slightly smaller mosquito population would mean less disease.

First she had to make maps. The Peruvian military had compiled a collection of simple line maps made from aerial photographs, and Morrison began with those, correcting the parts where the lines didn’t match, printing them and sending them out with teams of local men who would tromp through the city measuring each lot and painting a code on whatever structure existed there. She drew the lots into a GIS program and numbered them, and from then on, all the data the Iquitos research team has gathered (Chota’s surveys and water container indices, Huiñapi’s findings on the status of mosquito ovaries) has been linked to a geographic code that corresponds to a lot on the map.

Hundreds of research papers have been published based on this data. Post-doctoral fellows have come to Iquitos and gone. Scientists at UC Davis, Emory University, Tulane University, the University of New Mexico, and North Carolina State University, among others, have analyzed the data. They published findings on mosquito breeding habits and the relative efficacy of eradication techniques, on sampling methodologies and serotype spread (the dengue virus comes in four permutations).

What emerged is an array of seemingly quixotic prescriptions. The threshold of mosquito population that would keep the virus stifled, for instance, is unattainably low. “It’s so low that there’s not really a distinction, operationally, between, like, complete eradication of the mosquito,” Morrison says.

When researchers did look for a spatial pattern in disease occurrence, they searched for some sort of cluster on the map that would point to a catalyzing effect. But geography yielded nothing, as Morrison had found in Puerto Rico. Case data also belied the geography of insects: a mosquito’s range of movement is much more limited than the reach of the disease. Something must be carrying the virus around a much broader area than a mosquito can fly. Paz-Soldan’s GPS study set out to investigate.

Even before Soldan’s GPS study was finished, as the data was flooding in, Morrison realized that a pattern was taking shape. When the Iquitos research team picked up a case of dengue during Soldan’s initial pilot study on the significance of human movement, they would ask the sick person for every location they’d been to over the past two weeks. Then the team would go to each one of those places and ask residents for blood samples to determine whether they were carrying the virus. And when those visits were plotted, a map did, finally, reveal an invisible web.

Mosquitoes matter, it turns out, but they are only half the story. What’s just as important — what eluded scientists until Valerie Paz-Soldan proposed her GPS tracking studies — is the role played by the human carriers of the dengue virus. The researchers discovered that a significant portion of transmission occurred when people visited each other’s homes; they recently published these findings in the journal Proceedings of the National Academy of Sciences. With infectious diseases passed directly from person to person, like HIV, human movement is an obvious place to look. With dengue, the intermediary role of the insect had deflected researchers away from pursuing human transmission theories. Instead, they had focused, for decades, on the mosquito.

“So here is the index house,” explains Morrison, pointing to a little cartoon house in a presentation on her computer that shows the GPS research findings. “And these are the three houses that [the index case] reported visiting.” Three more houses appear on her screen. “One is across the street, one is down the street, and one’s 400 meters away — four or five city blocks away.”

“There were essentially six people tested in that house,” Morrison says, pointing to the index house, the one where the first sick person was found — the one that prompted researchers to go ask all the family members where they had been. “And three of the six people had evidence of infection.

“Okay, now for each of the contact houses,” she continues, pointing to the little cartoon houses they visited.

“So this house, of four people tested, one was infected.

“In this house, two of five.

“And in this house, three of six.

“That’s all within 15 days of that initial case.”

The team conducted the same experiment with houses where no one was sick. In the houses the healthy people visited, there were few, if any, recent dengue cases.
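The numbers Morrison reads off can be tallied in a few lines. What follows is only a sketch of the arithmetic, using the counts quoted above; the variable and function names are illustrative, and this is not the study's own analysis.

```python
# Counts quoted in Morrison's example, as (infected, tested) per house.
index_house = (3, 6)
contact_houses = [(1, 4), (2, 5), (3, 6)]


def share_infected(houses):
    """Pooled fraction of tested people who showed evidence of infection."""
    infected = sum(i for i, _ in houses)
    tested = sum(t for _, t in houses)
    return infected / tested


index_share = share_infected([index_house])     # 3 of 6 tested
contact_share = share_infected(contact_houses)  # 6 of 15 tested

print(f"index house: {index_share:.0%}")
print(f"contact houses: {contact_share:.0%}")
```

Pooled together, six of the 15 people tested in the three contact houses showed evidence of infection, not far below the share in the index house itself.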

The clusters of infection that the team had been looking for did exist. And as with all infectious diseases, proximity was the determining factor of contagion. But with dengue, as Morrison says, “it’s not geography. It’s not geographic distance. It’s social distance.” Clusters of dengue weren’t found around a specific house, but among the places sick people visited. Disease clusters may be separated by city blocks, but they are concentrated among social networks.

Now, the researchers are using GPS data to verify people’s recollections of where they go. They’re also trying to make their data prescriptive by returning to geography: if there is enough overlap in the patterns of people’s movement — in Soldan’s thwacked dot clouds — then mosquito control could be concentrated there.

“In HIV, we’ve identified locations where there are a lot of sexual networks occurring and you intervene in those locations,” Soldan explains. “With dengue we haven’t done that yet, although it may be possible that we can identify residential locations that have a lot of people, so we intervene in those locations.” The proposal is akin to equipping government intervention teams with a scalpel and point of surgical entry instead of relying on a hatchet and a prayer: to target mosquitoes in areas of rampant transmission instead of blasting entire districts with chemical fumigation.

The contact cluster research had one unforeseen side effect. While only a fraction of infected people develop symptoms, asymptomatic people can still carry the virus that goes on to spread the disease further. When the research teams went out to take blood samples in contact houses, they often found cases of infection among people who felt fine. In effect, they had brought invisible carriers of disease into view.

Virality and “Social Distance”

By engaging in digital life, most people produce data and become unwitting subjects in social research. At the fourth annual conference on social network analysis in the summer of last year, researchers presented more than 200 papers trying to derive models for human behavior based on data culled primarily from social networking sites.

Cell phones are tracked to supply what the industry calls “location-based services” — providing restaurant search results only within a 10-block radius, or making it easy to stamp a digital photograph with its geographic mark, à la Instagram. Location records usually go to cell phone companies — with the exception of Apple’s erstwhile data policy, uncovered in 2011, that allowed for the storage of tracking information inside each iPhone. If a smart phone owner downloads and uses applications, their location may become publicly accessible. Instagram, for example, publishes an API — an application programming interface — that allows anyone with some programming knowledge to see who’s where when.

Data produced by social media also gives away information on someone’s location, as well as a plethora of additional insight into what Morrison’s dengue study described as social distance. One of the papers presented at the conference last August took publicly available data from a check-in application called Gowalla (which allowed people to share their whereabouts publicly, and was later acquired by Facebook) and modeled people’s daily trajectories in the context of their personal relationships.

The researchers came away with multiple findings, one of which was that “human movements frequently contain repeated patterns and friendship is bounded by distance.” They were building a “friendship-based mobility model” of how people move in groups, to be used in urban planning and economic forecasts. But knowing who is friends with whom provides researchers with more information than physical location. Deciphering the meaning of human relationships within the digital world is key to building an epidemiology of information.

Data scientists want to know, foremost, what makes something “go viral” — what makes for an epidemic of contagious information (researchers call the pivotal points of widespread contagion “cascades”). Duncan Watts, a researcher at Microsoft, has looked at the significance of social influence with an eye toward examining Malcolm Gladwell’s Law of the Few — the idea that thoughts and behaviors spread through populations like a virus propelled by a few pivotal people. He found that the critical propellant of dissemination was not — as Gladwell had posited — a few highly influential people, but rather a large mass of people who were easily influenced. “When this critical mass existed,” Watts has written, “even an average individual was capable of triggering a large cascade — just as any spark will suffice to trigger a large forest fire […] Conversely, when the critical mass did not exist, not even the most influential individual could trigger any more than a small cascade.”

The data Watts was working with was culled from Twitter, whose ephemeral and pithy snippets of information capture only a sliver of communication and arguably do a poor job of simulating relationships. Social networks like Facebook contain far deeper information about behavior and, in turn, influence.

Facebook doubled its team of data scientists last year, and they have begun conducting experiments on subsets of users. The MIT Technology Review reported last June that the data scientists were selectively hiding shared links to see which factors determined information catalysis. Zuckerberg used this insight when he decided to boost organ donation using Facebook: “Users were given an opportunity to click on a box on their Timeline pages to signal that they were registered donors, which triggered a notification to their friends. The new feature started a cascade of social pressure, and organ donor enrollment increased by a factor of 23 across 44 states.”

The jump from understanding how information spreads influence through a social network to shaping human behavior is a subtle but definitive one. By building models of information transmission, digital companies hope they can capitalize on knowing what makes something infectious. Data enthusiasts have cultivated lofty claims about the unprecedented potential for determining the future by influencing behavior — a prospect that Big Data doesn’t fully examine. At Stanford’s Persuasive Technology Lab, the aim of understanding “how computing products — from websites to mobile phone software — can be designed to change what people believe and what they do,” is purportedly intended to, among other things, achieve world peace.

The Young Doctor: Cultivating Trust

On the ground in Iquitos, Doctor Crystyan Siles shouts into his Movistar Blackberry, scanning the dusty asphalt road as horns and screeching tires blow past. He darts out between moto-taxis and lopes down a ravine-carved dirt slope toward a body of brown water.

Dr. Crystyan Siles checks his phone on his way to visit a patient, while his canoe driver fuels up at a floating gas station.

“Yes, hi. I need the results from that patient they’re bringing you as soon as possible,” he says of the case he just saw in the hospital emergency room. With his cell phone still pressed to his ear, Siles balances on a narrow, creaking wood plank over the water and steps into a long wooden canoe that will transport him through a neighborhood on stilts that stretches out over this section of the Itaya River hugging the eastern edge of Iquitos. Houses in this part of Iquitos, called Belén, either hover above the high water mark — nearly 10 feet above the river’s surface in January — or float directly on the water, and the narrows left between them can only be traversed by boat.

Siles is young (he started medical school in Cuba when he was 17), tan from scurrying between far-flung health posts and in constant possession of his phone. This, along with a large silver watch he constantly shakes down his wrist in an absent-minded tic, keeps him iteratively updating, checking to make sure he is connected and on time.

The patient he is going to see, out beyond the main thoroughfare where a church juts up on its stilts and hand-painted signs promise commerce (“Repair shop inside”; “There is food here”) is a shy 14-year-old girl diagnosed with Mayaro, a rare tropical disease that’s still not well understood. She is part of a longitudinal study conducted by NAMRU-6 to understand people’s immune system response.

From there, Siles will go back to the hospital to check for dengue cases again before returning to the Naval lab. Tomorrow he’ll make his weekly trip to another health post in a town called Zanguracocha, half an hour outside Iquitos down a long, pitted dirt road. In Zanguracocha, he will write prescriptions for babies who have the flu, counsel mothers to move their children away from indoor cooking smoke, and listen to a man’s story about why his left thigh was bitten by his own dog. Siles is the only doctor who goes to the tiny village, and patients line up hours in advance to see him.

But his essential task, the reason NAMRU-6 has him on the team and he undertakes canoe journeys for check-ups, is to keep track of people like the Mayaro patient. He was hired by the lab because of his ability to make sure study subjects don’t get lost midway through.

The Right to Be Forgotten

As for Valerie Paz-Soldan, the researcher who envisioned the GPS studies in Iquitos, when she began approaching her potential research subjects about wearing the little machines, they had concerns. They were worried their neighbors and husbands and strangers would think the small white unit was the latest cell phone technology from Japan — they worried people would try to mug them for the device, or that husbands would decide it was a gift from a new lover and become jealous. (Men did not express the same concern; they worried it would record them with their mistresses.)

Researchers for the study assured their participants that information on their whereabouts would be entered in a secure database and made anonymous before publication. No one, including the research subject, would be able to request the log tracing their path through the world.

The Institutional Review Boards that appraised the study design — and there were many, from the Peruvian national boards to U.S. Naval and academic ones — initially wrung their hands over the implications for personal privacy. But people in Iquitos were familiar with the scientists’ work. Since the 1950s, there have been teams sweeping Iquitos, testing the latest mosquito trap, planting chemicals in backyards to kill larvae and spraying entire neighborhoods with chemicals to stymie the potential for an outbreak. If the scientists could save lives by recording daily habits, people reasoned, they wanted to cooperate.

In the United States there is no comprehensive law governing individual online data privacy. Seven months after Snowden released documents that traced the inner workings of the National Security Agency, President Barack Obama announced in late January that the massive surveillance machinery would be curtailed. The policy changes he promised, criticized immediately for failing to detail the methods of implementation, left much of the surveillance infrastructure intact. Phone records, still held by individual cellular service companies and the NSA itself, would require judicial approval before examination by the agency. But databases with troves of information on people’s behavior would continue to exist. The impulse to restrict government surveillance, catalyzed by Snowden’s document release, has found only paltry expression regarding non-governmental data collection. In 2012, Congress launched an investigation into nine companies that compile and sell individual profiles. In December that year, the Federal Trade Commission followed suit. These companies, known as data brokers, sell personal information and collaborate with social networking sites. “Users might find themselves seeing advertisements that are based on actions they took in the real world,” the Electronic Frontier Foundation reported in April, “as well as personal facts about their life and circumstances that they have been careful not to put on Facebook.” There are still no national comprehensive privacy laws regarding companies’ use of data.

In contrast, European legislators have taken a much more aggressive approach, trying to ensure that individuals can exert some control over the wisps and trails they leave through the physical and digital world. An expansion of data regulation is currently under discussion that would require companies — both EU companies and entities outside the union collecting data on EU residents — to get consent from users before stockpiling data on them. The proposal includes a clause that would allow people to erase digital data about themselves. It’s called “the right to be forgotten.”

Escaping the weight of history via erasure requires, in the first place, knowing what exists. But the logic of Big Data allows information that was never divulged to become known — just as NAMRU-6 found asymptomatic carriers of the dengue virus through clustering. The power of strong models, trained on data, means more information can be inferred from fewer and fewer points. This is true for social networks and patterns of physical movement. Just four distinct location data points, Nature reported in March, could be enough to identify an individual in a dataset of 1.5 million people. Researchers at Cambridge showed, also in March, that expressions of personal preference on Facebook could be analyzed to reveal details like a person’s propensity for substance abuse.

Data scientists call this “the curse of dimensionality.” If each attribute of a data point is a dimension, more dimensions put each data point further and further away from other data points. If each data point is a person, that means it becomes easier and easier to isolate a single individual, even in a large dataset. There’s no hiding, even in a crowd of millions.
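The effect is easy to see in a toy simulation. The sketch below is an illustration under loud assumptions, not a reconstruction of any study mentioned here: every simulated person gets a handful of attributes drawn uniformly at random from four possible values (real attributes are correlated and far messier), and the function name `unique_fraction` is made up for this example. It counts how many people end up with a combination of attributes that nobody else shares.

```python
import random
from collections import Counter


def unique_fraction(n_people, n_attributes, n_values=4, seed=0):
    """Fraction of simulated people whose attribute combination is unique.

    Assumption: each attribute is drawn independently and uniformly from
    n_values options, which real-world data would not satisfy.
    """
    rng = random.Random(seed)
    people = [
        tuple(rng.randrange(n_values) for _ in range(n_attributes))
        for _ in range(n_people)
    ]
    counts = Counter(people)
    # A person is "isolated" if no one else has the same combination.
    return sum(1 for p in people if counts[p] == 1) / n_people


# More attributes (dimensions) leave more people alone in their own cell.
for k in (2, 5, 10, 20):
    print(k, "attributes:", round(unique_fraction(10_000, k), 3))
```

With only two attributes there are just 16 possible combinations, so everyone in a crowd of 10,000 shares theirs with many others. With 20 attributes there are roughly a trillion combinations, and nearly every simulated person stands alone.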

Lost in the Rainforest

I accompany Doctor Siles on his weekly trip to the Zanguracocha health outpost on a scorching morning. His pace, rapid-fire and punctuated by his chiming Blackberry, contrasts with the languid village and the silent roar of the rainforest.

Siles was hired by NAMRU-6 to keep track of study subjects based on his experience with another research project on malaria led by New York University. The malaria project was focused just on the sprawling little town and its surrounding communities. “I know everyone here,” Siles says, bustling through the one-story health post. This is not necessarily a good thing; Siles still gets nervous when babies squall. “When the kids come in and see me, they start crying because they associate me with the malaria blood samples,” he says, and pauses. “Even though I never took the samples!”

A series of wall maps studded with flagged pins still hangs in the outpost, the remnants of Siles’s malaria work. Each pin represented a malaria case — some bristling together in a thick cluster, others farther off showing an isolated house. “Once, when we were seeing patients, one little boy pulled at one of the maps and all the pins came flying out, brrrat dat dat dat dat,” Siles says with a laugh. But the doctor never had to take a pin out on purpose because he couldn’t find someone for a follow-up.

With the NAMRU-6 studies, it’s been different. Iquitos is a mercurial city, and many of its inhabitants are migratory. Despite Siles’s efforts, he loses research subjects all the time. He’ll take a boat to their house or head out of the city in the project’s Land Cruiser and find that the person has simply left. Cell phone numbers stop working. Neighbors shrug. People go where there’s work and it’s not uncommon to uproot in search of opportunity.

Often, that means going out into the rainforest — back to a family farming plot or following some large-scale extractive project — beyond the reach of telecommunications. Families move up and down the river, or migrate to one of the larger cities on the coast. Wherever they go, they are lost to the study. They’ve fallen off the map.