A human genome, remixed, and gone viral. In a nutshell, that’s the conditionally re-published genome of HeLa cells, which flourish in labs worldwide, but trace back seven decades to the unethically biopsied cervical tumor of American Henrietta Lacks.

Notably, HeLa’s genome offers vital glimpses of the woman whose life it stole. But as a posthumous portrait of Lacks, it’s at best a cubist one — her genomic likeness reduced to shards jumbled, mirrored, and spattered by decades of mutation. In time, though, it will prove to recognizably portray her, first in generically broad strokes, but eventually in more intimate ways that only she and her loved ones had known.

And that prospect sparked rancor in March, when the authors who first published (and then withdrew, amid the hubbub) a HeLa genome patly denied that their data shed light on Lacks’s own genome, much less on who she was:

‘[W]e cannot infer anything about Henrietta Lacks’ genome, or of her descendants, from the data generated in this study [because] it is impossible to distinguish which parts of the genome sequenced here originate from Mrs. Lacks, her tumour, or laboratory adaptation.’

By this, they hoped to justify publishing a genome derived from one named person, without personal or family consent. That claim anticipated lively new discussion of genomic privacy — but it was false, roughly like the finder of a rain-soaked wallet conveniently declaring the banknotes legible, but the IDs too wet to read.

Their own paper, after all, noted that HeLa shares nearly every noteworthy letter of its genome with other people (though their log-scaled bar plots downplayed this). And eventually, prodded by other voices, they conceded that we can learn and confirm some things about Lacks from HeLa’s DNA (or even RNA)[1] — but, they insisted, little if anything ‘new’.

And that’s truth…if not whole truth. For the authors apparently took learning something to mainly mean figuring out which of the 550 thousand or so novel (never before seen) variants they found had originally belonged to Lacks.[2] And yes, doing so without comparing HeLa’s genome directly to a non-HeLa genome from Lacks is indeed tough — at first glance, a genomic one-hand-clapping koan.

But even if identifying such variants really were the only route to learning something about Lacks, the authors ignored ways in which we can do so. Below, I’ll detail a few — which all boil down to thinking carefully about particular branches of the genetic family trees that link each segment of HeLa’s genome to the genomes of other cells and people.

Let’s start with novel variants that likely did not belong to Lacks.

Novel variants that likely arose only in the HeLa cell lineage

Single-copy variants in duplicated haplotypes

The HeLa genome is not easy reading. It’s basically a long, gibberish-spiked recipe for making tons of human cells. And if such a postmodern cookbook sounds riveting, you’ll be pleased to know that it’s written mostly in lawyerly triplicate.

In the decades after Lacks was born (perhaps in one big event, such as a faulty cell division), HeLa gained an extra copy of nearly every chromosome. And, though the new paper missed the next point, HeLa’s extensive triploidy can help clarify which of its many distinctive variants arose by new mutation in the cell line, after Lacks was conceived (and perhaps after she died).

Basically, the duplication(s) that turned HeLa triploid effectively sprouted tiny new branches in the human genome’s family treetops, but within the HeLa genome itself. And some novel variants in HeLa can be confidently traced to those new branches, clarifying that they likely arose by mutation after Lacks was conceived.

Specifically, if a novel variant appears on just one of the three copies of a given chromosome in HeLa, and the surrounding sequence is otherwise nearly identical to just one of the two other copies, then the variant in question most likely arose by such new mutation. That is, Lacks likely wasn’t born with it. Similar logic can help us pick out likely new mutant variants in parts of the HeLa genome (such as long stretches of chromosomes 1, 5, and 11) that show even more than three copies.
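The rule above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: that we already hold fully phased haplotypes for each chromosome copy, written as equal-length strings of alleles, and that "nearly identical" can be reduced to a mismatch tolerance. The function names and the `max_mismatch` parameter are hypothetical, not taken from any HeLa paper.

```python
def flank_mismatches(a, b, site):
    """Count differing positions between two haplotypes, ignoring `site`."""
    return sum(x != y for k, (x, y) in enumerate(zip(a, b)) if k != site)


def likely_new_mutation(copies, site, novel_allele, max_mismatch=0):
    """Return True if the novel allele looks like a post-duplication mutation.

    copies: phased haplotype strings (one per chromosome copy), equal length.
    site: index of the novel variant within each string.
    max_mismatch: hypothetical tolerance for "nearly identical" flanks.
    """
    carriers = [i for i, hap in enumerate(copies) if hap[site] == novel_allele]
    if len(carriers) != 1:
        return False  # on zero or several copies: this rule does not apply
    carrier = copies[carriers[0]]
    others = [hap for i, hap in enumerate(copies) if i != carriers[0]]
    twins = [hap for hap in others
             if flank_mismatches(carrier, hap, site) <= max_mismatch]
    return len(twins) == 1  # flanks match just one of the other copies


# The second copy carries novel allele "T" and otherwise matches the first.
copies = ["ACGTA", "ACTTA", "GTGAC"]
print(likely_new_mutation(copies, site=2, novel_allele="T"))
```

Real data would call for longer windows and some allowance for sequencing error, which is why the tolerance is left as a parameter rather than fixed.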

Overall, to figure out which of the half-million or so heterozygous novel variants found in HeLa go into this bin, we’d need to systematically check which other variants they’re linked to on particular copies of particular chromosomes. With current lab methods, that’s hard (especially for novel variants, where we can’t turn to reference data from human populations) but feasible [Update: a new paper from Jay Shendure’s group reports doing just that!].

Heterozygous mitochondrial variants

Mitochondria are our cells’ tiny furnaces. Each one houses a short, gene-crammed ring of DNA, attesting direct descent from a bacterium that, eons ago, slipped inside a bigger cell, to found a line of ancestors that we share mainly with plants, molds, and other animals.

Because all your mitochondria come from the few that your mother’s egg carried, they tend to be nearly identical within each cell, and throughout your body. That is, though their genomes do mutate (so can vary a bit from mitochondrion to mitochondrion), any resulting variation tends to wash out when we read a person’s DNA en masse.

As such, in a person’s mitochondria, we expect to find very few genomic sites — typically zero, in some people one — that look heterozygous (coming in two or more distinct flavors within a given genome) in the classic genetic sense. In fact, if we see more than one such site, we can guess fairly strongly that something has gone wrong either with sequencing itself (such as sample cross-contamination), or with the genome we’re looking at.
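To make that check concrete, here is a minimal Python sketch that scans per-site allele read counts for mitochondrial positions that look heterozygous. The input layout (position mapped to allele read counts) and both thresholds (`min_minor_fraction`, `min_depth`) are assumptions chosen for illustration, not values from the HeLa paper.

```python
def heteroplasmic_sites(pileup, min_minor_fraction=0.1, min_depth=100):
    """Return positions whose second-most-common allele is well supported.

    pileup: dict mapping position -> dict of allele -> read count.
    min_minor_fraction, min_depth: hypothetical cutoffs for this sketch.
    """
    sites = []
    for pos, counts in sorted(pileup.items()):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to judge this site
        ranked = sorted(counts.values(), reverse=True)
        minor = ranked[1] if len(ranked) > 1 else 0
        if minor / depth >= min_minor_fraction:
            sites.append(pos)
    return sites


# Toy read counts at four mitochondrial positions (made-up numbers).
pileup = {
    73: {"A": 990, "G": 10},    # minor allele at 1%: likely noise
    152: {"T": 600, "C": 400},  # minor allele at 40%: looks heterozygous
    263: {"G": 1000},           # a single allele
    310: {"T": 30, "C": 20},    # too shallow to call
}
flagged = heteroplasmic_sites(pileup)
if len(flagged) > 1:
    print("More than one such site: suspect contamination or an odd genome")
```

Following the reasoning above, a count above one is the signal worth chasing, whether it points to a lab mishap or to mutations that piled up in the cell line.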

The HeLa researchers didn’t report how many of the novel mitochondrial variants, if any, appeared heterozygous in the HeLa genome. But if any did, it’s a good bet that most or all of them arose long after Lacks was born. And though this bin is likely very small, it’s one that can be fairly confidently assessed, in part by picking through raw sequence data more carefully; as such, it might tell us a bit about the point mutation rate in HeLa, relative to other tumor cell lines.

Variants that don’t turn up in other HeLa lines

This is the first big, obvious bin that will eventually subsume many of HeLa’s currently puzzling variants. Many variants that appear in the recently sequenced HeLa line will turn out not to appear in most or all other lines that we might eventually look at. While this can reflect loss of such variants, by deletion of segments of DNA in some HeLa lines, it more often will reflect gain of the variant in question in this particular line.
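In code, this bin amounts to a set difference. The sketch below assumes each variant is written as a `(chromosome, position, ref, alt)` tuple and that call sets from other HeLa lines are on hand; the line names and variants are invented for illustration.

```python
def private_variants(focal, other_lines):
    """Return variants seen in the focal line but in no other line.

    focal: set of (chrom, pos, ref, alt) tuples for the sequenced line.
    other_lines: dict mapping a line's name -> its set of variant tuples.
    """
    seen_elsewhere = set().union(*other_lines.values())
    return focal - seen_elsewhere


# Made-up call sets; the line names here are placeholders.
focal = {("chr1", 1000, "A", "G"), ("chr5", 2500, "C", "T")}
others = {
    "line_a": {("chr1", 1000, "A", "G")},
    "line_b": {("chr9", 42, "G", "A")},
}
print(private_variants(focal, others))
```

As the paragraph above cautions, absence from other lines can also reflect deletion in those lines, so membership in this set suggests, rather than proves, a gain in the focal line.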

As such, fleshing out the genetic family tree of other HeLa lines will help clarify which such variants trace specifically to the branch that the new paper focuses solely on. Now, on to the other class of novel variants that interest us: those that likely belonged to Lacks from the start…

Novel ancestral variants

In evolutionary parlance, an ancestral variant is, specifically, a variant carried by the last common ancestor of everyone in a population, for a given stretch of the genome, before any mutation there gave rise to a new variant that’s now found in some but not all individuals.

Most ancestral variants are still common among us, so were discovered long ago. But the haul of novel variants found in a given person may include the odd ancestral one that, until now, had never been hooked by researchers fishing in our gene pool.

Most such variants can be spotted easily, because they’re shared with many other mammals. Sometimes, such matches just reflect homoplasy: that is, mutation that happened to yield a variant already found in other organisms, perhaps at sites that tend to mutate a lot (such as Cs before Gs). But the commoner case is that a variant shared with most other mammals is really ancestral. That’s also the simpler guess, given how rarely mutation’s lightning strikes a typical site in the genome (and how much rarer still it is for the mutation to happen to yield a particular variant seen elsewhere too).
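As a rough sketch of that outgroup comparison, the Python below calls a variant ancestral-looking when most aligned mammal genomes carry the same allele. The species names, the alignment input, and both cutoffs (`min_fraction`, `min_species`) are assumptions for illustration; a careful analysis would also weigh the species tree and site-specific mutation rates.

```python
def looks_ancestral(allele, outgroup_alleles, min_fraction=0.8, min_species=3):
    """Return True if most outgroup species share the given allele.

    outgroup_alleles: dict of species -> allele at the aligned site, or None
    where the site does not align. Both cutoffs are hypothetical.
    """
    observed = [a for a in outgroup_alleles.values() if a is not None]
    if len(observed) < min_species:
        return False  # too few aligned species to say anything
    matches = sum(a == allele for a in observed)
    return matches / len(observed) >= min_fraction


# Made-up alignment columns for two novel variants.
shared = {"chimp": "G", "gorilla": "G", "mouse": "G", "dog": "G", "cow": None}
sporadic = {"chimp": "A", "gorilla": "A", "mouse": "G", "dog": "A"}
print(looks_ancestral("G", shared), looks_ancestral("G", sporadic))
```

A lone match, as in the second case, is the pattern more in keeping with homoplasy than with true ancestry.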

As such, among the novel variants called in HeLa, a few will confidently look ancestral — and most of these were, very likely, inherited by Lacks from her parents.

Whether ancestral or otherwise, novel variants tend to be rare — so rare that, by definition, they’d never been seen in a person before. As such, a given novel variant in your genome is very unlikely to also be carried by a randomly (er, carefully) chosen mate. And even if that person does carry a copy of the same very rare variant, there’s rarely more than a one in four chance that a child you’d have together will get two copies of it (one from each parent).

As such, nearly all novel variants found in your genome (or anyone’s) are heterozygous – that is, found on just one of your two copies of a given chromosome. Nonetheless, a smattering of novel variants may instead be homozygous, by the jackpot luck described above.
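The arithmetic behind that jackpot can be spelled out. The sketch below assumes you carry one copy of the variant, that a mate carries one copy with some probability, and that each heterozygous parent passes the variant on with probability one half. The one-in-ten-thousand carrier probability is a made-up figure for illustration only.

```python
from fractions import Fraction


def p_child_homozygous(mate_carrier_prob):
    """Chance a child inherits the variant from both parents.

    Assumes you are heterozygous for the variant, and that the mate is a
    heterozygous carrier with probability `mate_carrier_prob`.
    """
    half = Fraction(1, 2)  # each carrier parent transmits it half the time
    return Fraction(mate_carrier_prob) * half * half


# If the mate is certainly a carrier, we recover the one-in-four ceiling.
print(p_child_homozygous(1))
# With a hypothetical 1-in-10,000 carrier probability, the odds shrink fast.
print(p_child_homozygous(Fraction(1, 10000)))
```

Exact fractions keep the tiny probabilities readable, which is the only reason `Fraction` is used here in place of floats.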

The HeLa genome reportedly harbors roughly 50 thousand homozygous novel variants, which is quite a lot compared to most human genomes. Given that Lacks’s parents weren’t close relatives, I’d guess that many of these are in long stretches of the HeLa genome (such as most of chromosomes 6 and X) that look uniformly homozygous, likely due to loss of heterozygosity, where HeLa gained or lost copies of the given genome segment. Within these segments, and in genes that likely help govern HeLa’s growth, the origins of homozygous novel variants are hard to guess. Some may indeed have popped up by mutation in the HeLa lineage, then become homozygous by secondary mutation. But others may have been in Lacks’s genome from the start.

But outside such regions, most homozygous variants found in HeLa — novel or otherwise — likely belonged to Lacks from the start. And we’ll eventually be able to confirm many of these guesses by looking at the next bin, below…

Variants that turn up in other people

This is the other big, obvious bin that will eventually subsume many of HeLa’s currently puzzling variants (the converse of the variants-that-don’t-turn-up-in-other-HeLa-lines bin described above). Of the 550-thousand-odd variants so far seen only in HeLa, many thousands will likely eventually be found in the genomes of living people whom we’ve yet to sequence. And most of these — especially if flanked by stretches of sequence that distinctively resemble the sequence flanking them in HeLa — will indeed be variants that Lacks was born with (rather than variants that arose in parallel by new mutation).

As such, it’s just a matter of time before we can trace many of the novel variants in HeLa to particular branches of the bigger human genetic family tree.

Importantly, some of these variants will, in turn, become functionally informative, once we’ve seen them enough times to see what kinds of traits people who carry them may distinctively share. Looking ahead, this is where HeLa’s genome will eventually show similarities to other genetic portraits of individuals, which in turn refine not just our picture of Lacks’s genome, but of what she was like as a person, inside and out.

Further thoughts

On first hearing that HeLa had been publicly sequenced, I smiled — a bit wistfully — and recalled stumping, in 2009, to do just that[3]. If you’re working at a genome interpretation company, and want to pick just one human(oid) genome to sequence for public good, you could do far worse than HeLa. It’s a ubiquitous cell line, after all, whose genome says a bit about cervical cancer; a bit more about how lab-farmed cells find ways to thrive; and, yes, a lot about Henrietta Lacks, who, suffering in rough conditions in a segregation-era cancer ward, made an unwitting, coerced, but remarkable bequest to human knowledge and health.

In these senses, the HeLa genome could anchor an engaging public panel discussion among geneticists, oncologists, ethicists, and, crucially, members of the Lacks family. Glibly, it might start with a bird’s eye look at cancer as doomed cellular mutiny. Next, meeting Henrietta herself through family memory, we’d learn how her particular tumor eluded such doom by appealing, ironically, to humanity’s needs. That notion raises the thorny ethics of using a useful but unethically obtained resource (with no direct recompense to the family who suffered most in giving it to the world), and in turn opens the meta-wormcan of genomic privacy. And new insights from the HeLa genome itself help weave these threads into emerging genomic portraits of the cell line and the little-sung woman it traces to.

As noted, such a discussion would most enlighten if it directly included members of the Lacks family, who could reflect personally on Henrietta, HeLa, and genetic privacy — and who embody how segments of her genome also live on, most vitally, in the kin she loved and would have loved.

As Rebecca Skloot’s book The Immortal Life of Henrietta Lacks makes clear, the HeLa saga has long blended travesty, triumph, and quandary in roughly equal parts. The debate over its genome is thus just a new eddy of an old storm. The last few months, like the previous 71 years, have seen useful data tainted by heedless ethics; public efforts to mend, if not undo, a wrong; and, perhaps most pressingly, lasting questions about how we should use scientific resources whose provenance is, at best, murky.

We haven’t dwelled here on how, in bypassing the Lackses, the first HeLa genome paper furthered a history of disregard for real people who carry forth Henrietta’s DNA, and who mourn her most deeply. Nor have we tried to define what genetic privacy means, given that far more of any genome is complicatedly shared — with kin, both known and unknown — than is truly private.

Rather, we’ve focused on the question of what we can indeed learn from a single genome, which matters for three main reasons.

First, the question cuts across the authors’ initial claim that the genome of HeLa (read: any tumor or cell line) told us nothing about Henrietta Lacks (read: any person from whom such cells are taken). That technical claim doesn’t hold up, as we’ve seen.

Second, it points, more broadly, to what we can, and cannot, reasonably hope to learn today about a person just by looking at a genome. The paper itself confirmed, via HeLa’s genome, that Lacks was indeed a woman with largely west African and European ancestry. And some of us noted that we could go deeper, using public data to guess which variants were attached to which others in Lacks’s chromosomes, ultimately clarifying which parts of the world particular stretches of her genome trace to, and even which particular people are her close cousins in particular parts of the genome.

Moreover, the paper’s authors soon acknowledged that well studied variants that HeLa shares with other human genomes can let us guess at particular traits that Lacks may have had, and at various disease risks that she might have faced, had she not died of cervical cancer at thirty-one. But the bottom line on this front is that there’s still much we can’t guess from looking at one genome — even the best sequenced one. As more people’s genomes and traits are surveyed over time, however, the HeLa genome will indeed reveal deeper insights about who Lacks was. Facets of her uniqueness — voice and gait, quirks, ailments, and interests — may someday be traceable, in part, to distinctive genomic spellings that for now hide mum in plain sight.

Third, the technical question of what we can learn raises thornier policy questions about data privacy, ownership, publishing, and consent. And here, perhaps, likening HeLa’s genome to a remixed or cubist rendition of Lacks’s was apt. A person’s DNA uniquely quilts together genome segments shared with others (kin, both near and far), peppered with a few brand new variants (by mutation). As such, it’s neither fully original, nor fully derivative — like a work of art, necessarily influenced by broader culture. And, perhaps notably, tough questions about sampling and provenance (Elgin marbles, anyone?) have long raged in the art world, with few clear answers. As such, prepare for a vocal, but likely indefinite societal conversation on that front.

[^1] The paper itself made this clear by showing, from HeLa’s genome, that Lacks was indeed a woman with largely west African and European ancestry. Some of us noted that we could go deeper, using public data to guess which variants were attached to which others in Lacks’s chromosomes, clarifying which parts of the world particular stretches of her genome trace to, and even which particular people are her close cousins in particular parts of the genome.

Moreover, some common variants that HeLa shares with other human genomes let us guess at traits that Lacks may have had, and at various disease risks that she might have faced, had she not died of cervical cancer at thirty-one.

[^2] In this sense, the HeLa genome controversy echoes recent debate over ENCODE, an effort to survey which parts of our genomes govern how much of each protein gets made, where, and when. The ENCODE controversy has hinged, in part, on what the word functional means; for HeLa, the key question is what it means to ‘learn nothing’ about a genome or person.

Note here that the HeLa authors might likewise have gone out of their way to tell us that we can learn ‘nothing about cell lines’ from the HeLa genome. After all, it’s possible (though astoundingly unlikely) that the whole HeLa genome, triploid chromosomes and all, is exactly the same genome, letter for letter, that Lacks was born with.

[^3] While HeLa’s genome long intrigued me and others, several technical, funding, and strategic hurdles thwarted the idea back then. And here it bears note, at risk of sanctimony, that our brainstorming on HeLa was always premised on the blessing and direct participation of the Lacks family; had the project been green-lighted internally, step one would have been reaching out to them. So when I learned that the authors of the HeLa paper hadn’t engaged the Lackses, my heart, like others’, sank…and when I then read the authors’ original FAQ, which protested (too much) that their data said nothing about Lacks herself, my gorge rose, prompting this post.

Update (7 August 2013):

Lacks’s descendants will reportedly anchor a new committee to consider, case-by-case, requests to use HeLa genome data in biomedical research. Brokered by the family, Skloot, Francis Collins, and others, the arrangement won’t make the HeLa data fully open (and thus most useful), keeping it instead in NIH’s credential-walled dbGaP. Nonetheless, it’s a gracious and scientifically substantive gesture by the family, whose preference understandably trumps other factors.

HeLa was, after all, derived surreptitiously from their own foremother (and, if all her descendants agreed, no one else’s) without informed consent. That wrong, along with the decades more of scientific ill-treatment of the Lackses that followed, remains the heart of the matter, even if the HeLa saga also highlights broader dilemmas around publishing genome data from explicitly named people (given that doing so makes it much easier to guess who else, namely their identifiable kin, likely carries many of their distinctive genetic spellings too) and, more hearteningly, the potential benefits of directly engaging layfolk as partners in interpreting data they give for societal benefit.

For now, here’s to new discoveries that the soon-to-be accessible data will unlock, including those about Lacks herself. Happily, that’s starting already: the data access plan itself was announced alongside a new paper dissecting the HeLa genome in unprecedented detail, notably with phased haplotypes that should, as noted above, help us better guess which variants Lacks was indeed born with, and which arose only in the cells that became her tumor.