The ENCODE Project and the ENCODE Controversy

The ENCyclopedia Of DNA Elements (ENCODE) project was an international
research effort funded by the National Human Genome Research Institute
(NHGRI) that aimed to identify all functional elements (FE) in the
human genome (ENCODE Project Consortium 2004). FEs include, for
instance, protein-coding regions, regulatory elements such as
promoters, silencers or enhancers and sequences that are important for
chromosomal structure. The project, which began in 2003 and included
442 researchers during its main production phase, concluded in 2012
with the coordinated publication of 30 papers across several journals
(ENCODE Project Consortium 2012; Pennisi 2012). Like the HapMap
project, ENCODE was presented as the logical next step after the
sequencing of the human genome, since tackling the interpretation of
the sequence was now seen as the top priority (ENCODE Project
Consortium 2004).

The ENCODE project sparked a heated debate in academic journals, the
blogosphere and also in the national and international press. The
claim that incited the most ire was the project’s conclusion that
80.4% of the human genomic DNA has a ‘biochemical function’ (ENCODE
Project Consortium 2012). To understand the strong reaction this
statement provoked, we have to turn our focus
again to the C-value paradox and the concept of ‘junk DNA’
(see
Section 2.3
of the main text). In the context of the ENCODE controversy this
debate was linked with the issue of how to define a ‘functional
element’ and how scientists ascribe functions in biological
systems. What the ENCODE research implied, at least in the eyes of
some commentators, was that the idea of junk DNA was proven wrong,
because almost all of our DNA turned out to be functional. This led to
claims that textbooks would have to be rewritten, as they still
describe the genome as mainly composed of
junk.[S1]
The defenders of the old view claimed that the ENCODE researchers set
far too low a bar in ascribing functions to elements of biological
systems.

The Methodology of the ENCODE Project

The ENCODE project used a range of different experimental assays to
analyse what they referred to as ‘sites of biochemical
activity’ (for an overview of the ENCODE output see Qu &
Fang 2013). These are sites at which some sort of modification can be
identified (for instance methylation) or to which an activity (such as
transcription of DNA to RNA) can be ascribed. These modifications or
activities were taken as strong indications that the identified
regions of the genomic DNA play a functional role in human cells.

As an example of how this approach worked, ENCODE researchers were
interested in finding out how much of the genomic DNA is involved in
the regulation of gene expression. Researchers postulate that a key
hallmark of all regulatory DNA elements is their accessibility. This
makes sense, as the regulatory and transcriptional machinery needs
access to these DNA sites. ENCODE used this feature of regulatory DNA
to map (putative) regulatory elements in the human genome. One way to
do so is to perform what is called a ‘DNase I hypersensitivity
assay’. DNase I is an enzyme that cuts DNA, and this cutting
works better when the template DNA is accessible, meaning that
highly accessible regions are more sensitive to DNase I digestion. The
behaviour of the genome in the DNase I hypersensitivity assay can
therefore be used to learn indirectly about its structure, from which
researchers then infer the presence of a functional element (in this
case a regulatory sequence). This is just one example of about 24
different types of assays that ENCODE researchers used to get a better
insight into the number and distribution of functional elements in the
human genome (for a discussion of the different types of experimental
approaches used in ENCODE see Kellis et al. 2014).
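The inference step described above, from DNase I cut density to putative regulatory elements, can be illustrated with a minimal sketch. This is a toy illustration rather than ENCODE's actual pipeline; the cut-count threshold and minimum site width are arbitrary assumptions:

```python
# Toy sketch of calling DNase I hypersensitive sites from per-base
# cut counts (illustrative only; threshold and width are assumptions).

def call_hypersensitive_sites(cut_counts, threshold=10, min_width=3):
    """Return half-open (start, end) intervals where the cut count
    stays at or above `threshold` for at least `min_width` bases.
    Such intervals would be flagged as putative regulatory elements."""
    sites, start = [], None
    for pos, count in enumerate(cut_counts):
        if count >= threshold and start is None:
            start = pos                      # open a candidate site
        elif count < threshold and start is not None:
            if pos - start >= min_width:     # close it if wide enough
                sites.append((start, pos))
            start = None
    if start is not None and len(cut_counts) - start >= min_width:
        sites.append((start, len(cut_counts)))
    return sites

# Simulated cut counts with one accessible (open-chromatin) stretch:
counts = [2, 3, 1, 12, 15, 20, 18, 11, 2, 1, 4, 2]
print(call_hypersensitive_sites(counts))  # → [(3, 8)]
```

On real data the threshold would be set against a background model rather than fixed by hand; the point here is only the shape of the inference: signal above background is taken as evidence of a functional element.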

What is interesting about most of these assays is that they look at a
proxy for function: if a stretch of DNA is hypersensitive to
DNase I then it is automatically defined as functional. Another
example is DNA transcription itself. If a DNA sequence shows up in RNA
sequencing then this means it has been transcribed into RNA by the
enzyme RNA polymerase. This activity, in the eyes of the ENCODE
researchers at least, makes the DNA element in question a functional
element of the genome.
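This proxy logic can be made concrete with a toy calculation (our illustration, not ENCODE code): a base is counted as ‘functional’ whenever any assay, here hypothetical RNA-seq and DNase I tracks, reports a signal covering it:

```python
# Toy illustration of the proxy-based definition: any base covered by
# any assay signal is labelled 'functional' (hypothetical data).

def functional_fraction(genome_length, assay_intervals):
    """assay_intervals maps assay name -> list of half-open (start, end)
    intervals with signal; returns the fraction of bases covered by
    at least one assay."""
    covered = set()
    for intervals in assay_intervals.values():
        for start, end in intervals:
            covered.update(range(start, end))
    return len(covered) / genome_length

assays = {
    "RNA-seq (transcription)": [(0, 40), (55, 70)],
    "DNase I hypersensitivity": [(30, 50)],
}
print(functional_fraction(100, assays))  # → 0.65
```

Because the definition takes the union over all assays, adding further permissive assays can only push the fraction up, which is one way to see why critics regarded the high functional percentage as sensitive to the definition rather than to the biology.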

But such a broad approach to finding out about functional elements is
highly problematic, as a transcription event or hypersensitivity can
be present for many different reasons (for instance as a result of
transcriptional noise). This is exactly what some critics of ENCODE
homed in on, pointing out that merely showing the existence
of a structure (such as methylation) or a process (such as
transcription) is not enough by itself to prove any functional
significance of these biochemical features (Doolittle 2013; Eddy 2012;
Graur et al. 2013; Niu & Jiang 2013).

Whilst this is surely a valid point that applies to a large part of
the research done within ENCODE, not all studies performed as part of
the project looked at such proxies. An example is Whitfield et al.
(2012), who did not just look at specific modifications or the
behaviour of DNA in particular assays, but mutated specific sites to
check whether the interference with these sites has an effect on gene
expression.

The above argument about how we learn about functional elements
presumes that we already have an understanding of what it means to be
‘functional’. But it is by no means clear how biological
function should be defined and there are competing accounts of what it
is that makes an entity functional. These discussions about the
concept of a biological function were central to the dispute
surrounding the ENCODE project.

The ENCODE Controversy

Especially in the early critiques by Doolittle (2013) and Graur et al.
(2013) the distinction between ‘selected effect’ (SE)
function and the causal role (CR) function of an entity or process
figured prominently. The ENCODE project, the critics argued, simply
ignored key work by philosophers and theoretical biologists on this
topic, thereby making a complete muddle of what its researchers are
talking about when they use the term ‘function’. With more conceptual
clarity, they argued, the claim that 80% of our DNA is functional
would not be tenable and the established notion of ‘junk
DNA’ would be saved.

The definition of function and functional analysis in biology deserves
an SEP entry of its own. Here we will limit ourselves to a few
comments on this issue that relate to the ENCODE controversy
specifically. The key point is that SE functions are, in practice,
ascribed to conserved sequences. The SE account aims to
answer the question of why an element is there: a functional element
according to this definition is an entity whose presence has a
positive effect on the survival or reproduction of the organism;
meaning the entity has been selected for (Millikan 1984, 1989a;
Neander 1991; Griffiths 1992, 1993; Godfrey-Smith 1994). If a gene has
been selected for it is expected that its sequence will be conserved:
mutations within it will be selected against, and will be less
frequent than in sequences that are not being maintained by selection.
History matters for this account, which is also why it is sometimes
referred to as the etiological account of function, going back to a
paper by Wright (1976) (but see Millikan 1989a on how the etiological
account relates to Wright’s original account).

CR functions on the other hand do not depend on the history of the
system. What the CR account answers are ‘how’ questions in
relation to the capacities of a system (Millikan 1989b). It is only
the here and now that matters for the CR account: functional analysis
is about analysing a system with capacity C into sub-capacities
that are attributed to elements of the system and which contribute to
C (Cummins 1975). The CR account is in an important sense more
liberal than the etiological account: according to CR anything can be
deemed a functional element as long as it is part of a system and
plays some causal role contributing to some system capacity we happen
to be interested in.

Graur et al. (2013) claim that ENCODE worked with the CR account but
that this is a mistake, as biologists actually work with the SE
account, a claim that can also be found in Doolittle 2013. They
acknowledge that biologists might study CR functions (for instance
when doing deletion experiments) but claim that even if scientists do
so they take these causal roles simply as indicative of SE function,
which is the ‘true’ function of a biological element
(Doolittle 2013; Graur et al. 2013). It is with this focus on SE
functions that these critics bring us back to the C-value paradox and
the strong case one can make for the importance of the junk DNA
concept.

The deep problem is that there is simply not enough conserved DNA in
humans to match the high percentage of functional DNA the ENCODE
project came up with. If we accept current estimates that between 5
and 10% of the human genome is conserved (Lindblad-Toh et al. 2011;
Ward & Kellis 2012), then there is a clear mismatch between the
amount of sequence under evolutionary constraint and what is called
‘functional’ by the ENCODE consortium. Graur et al. (2013)
call this the ‘ENCODE
incongruity’.[S2]

As pointed out above, this critique is based on a claim about
which functional account is actually used by scientists. This appears
to be taken as an empirical claim, though perhaps what matters more is
how scientists ought to understand functional language. This,
in turn, is likely to depend on what their aims are. Either way, this
is an important point, because once we think in terms of SE functions
DNA conservation immediately becomes salient. If, however, it turns
out that scientists don’t (or shouldn’t) use the SE
account (as is claimed, for instance, by Elliott et al. 2014; Germain
et al. 2014; Amundson & Lauder 1994; Griffiths 1994, 2006) this
critique loses much of its force as the ENCODE incongruity is no
longer a problem.

This is exactly the point on which a recently published critique of
the critics picks up. Germain et al. (2014) claim that the critics of
ENCODE simply misunderstood the nature of the project, as they did not
take into account that the ENCODE project was part of a biomedical
discovery process. As such the project was concerned to find out about
elements of the human genome that might engage in relevant
biochemical processes. What makes a sequence or activity relevant
in the biomedical context is not whether it is conserved but
whether its absence or presence has a potential effect on activities
or entities that are of relevance to biomedical research. The CR
account, Germain and co-workers claim, is therefore the right account
to use in this context and the ‘ENCODE incongruity’ is no
longer a relevant issue.

In all of this the ENCODE researchers themselves did not stay silent.
It is interesting to note that in a reply to their critics key ENCODE
members toned down their claims about the percentage of functional
elements present in the human genome – the 80.4% number is not
mentioned again (Kellis et al. 2014). In fact, no numbers are
mentioned in this paper and the authors remark that in their opinion
creating an open access resource (i.e., the ENCODE library) is
“far more important than any interim estimate of the fraction of
the human genome that is functional” (2014: 6136). The authors
also point out that in their eyes all experimental and theoretical
approaches to functional ascriptions have their limitations and that
no account or assay will get it right on its own, which is why they
advocate both a methodological and theoretical pluralism, again
defusing many of the stronger claims made earlier on both sides of the
dispute.

The issue Germain et al. (2014) raise concerns the type of
scientific project ENCODE is. As in the context of the HapMap
project, we again encounter the long-standing dispute about the value of
hypothesis-free or exploratory research (see
Section 3.1.3
of the main text). Eddy (2013), for instance, laments that the
project was originally a mapping project but was then spun
retrospectively as a project that aimed at testing a hypothesis. Graur
et al. (2013) also make the point that ENCODE overstepped its remit
as a big science project – which, they claim, is simply to provide
data – and that the ENCODE researchers ventured into
‘small science’ territory by trying to deliver an
interpretation of that data. In contrast to the criticisms the HGP
originally encountered, these modern-day critics no longer have a
problem with the idea of a descriptive mapping project; their
worry is rather that the project is sold as something it isn’t.