Abstract

Chronic lymphocytic leukemia (CLL) has been consistently at the forefront of genetic
research owing to its prevalence and the accessibility of sample material. Recently,
genome-wide technologies have been intensively applied to CLL genetics, with remarkable
progress. Single nucleotide polymorphism arrays have identified recurring chromosomal
aberrations, thereby focusing functional studies on discrete genomic lesions and leading
to the first implication of somatic microRNA disruption in cancer. Next-generation
sequencing (NGS) has further transformed our understanding of CLL by identifying novel
recurrently mutated putative drivers, including the unexpected discovery of somatic
mutations affecting spliceosome function. NGS has further enabled in-depth examination
of the transcriptional and epigenetic changes in CLL that accompany genetic lesions,
and has shed light on how different driver events appear at different stages of disease
progression and clonally evolve with relapsed disease. In addition to providing important
insights into disease biology, these discoveries have significant translational potential.
They improve prognostication by highlighting specific lesions associated with poor clinical
outcomes (for example, driver events such as mutations in the splicing factor subunit
gene SF3B1) or with increased clonal heterogeneity (for example, the presence of subclonal driver
mutations). Here, we review new genomic discoveries in CLL and discuss their possible
implications in the era of precision medicine.

The knowns and unknowns in the biology and treatment of chronic lymphocytic leukemia

Chronic lymphocytic leukemia (CLL) is a low-grade B-cell malignancy, characterized
by the accumulation of mature CD5+/CD19+/CD23+ lymphocytes with weak surface expression of a monoclonal immunoglobulin (Ig) [1] in the peripheral blood, bone marrow, lymph nodes and spleen. It is diagnosed either
incidentally (with an abnormally high white blood cell count) in asymptomatic patients,
or following presentation with cytopenias, adenopathy or constitutional symptoms,
as outlined by the 2008 International Workshop on CLL [2]. CLL is part of a spectrum of pathological conditions involving clonally proliferating
B cells. It is thought to be preceded by monoclonal B-cell lymphocytosis (MBL), a
state in which a smaller B-cell clone is present, typically in the absence of
symptoms [3]. At the other end of the spectrum, CLL may transform into a higher-grade malignancy,
a process termed Richter's transformation, which is often associated with a dismal
clinical outcome [4].

CLL possesses several features that place it at the forefront of cancer genetic research.
First, it has high relevance as the most common leukemia in adults [5]. Second, the ability to easily procure primary tumor cells from the bloodstream facilitates
the application of cutting-edge genetic methodologies. These technologies have been
used to define the underlying biology of CLL (for instance, elucidating the cell of
origin of this lymphoid malignancy [6]), as well as to explore clinical questions (such as how to predict clinical outcome
in a highly variable disease on the basis of molecular indicators [7]). These investigations have yielded striking insights, including the first identification
of a causative somatic microRNA alteration in cancer [8], as well as one of the first effective molecular prognostic schemes [9].

In parallel, there has been marked progress in the development of therapeutic options
in CLL (extensively reviewed elsewhere [10-12]). While the general therapeutic paradigm in CLL remains based on the 'watch and wait'
approach (that is, treatment is initiated only when symptoms occur) [13], clinicians now have an extensive array of effective options when treatment is required.
For example, combination chemo-immunotherapy with fludarabine, cyclophosphamide and
rituximab has yielded excellent long-term results [14]. Additionally, immunotherapy-based therapeutics such as alemtuzumab [15] and allogeneic stem-cell transplantation [16,17] have been demonstrated to provide effective disease control in treatment-refractory
or high-risk patients. Importantly, as CLL often affects elderly individuals, more
tolerable therapeutic approaches have been successfully applied, such as lenalidomide
[18] and bendamustine-based regimens [19]. Most recently, therapies targeting the B-cell-receptor signaling pathway, such as
ibrutinib, have generated excitement as they have shown promising efficacy and tolerability
in phase II clinical trials [20].

Despite the expansion of therapeutic options for CLL patients, which has improved
patient survival, CLL remains largely incurable, and its course is difficult to predict.
Furthermore, guidance about appropriate treatment selection on the basis of individual
genetic and molecular abnormalities remains limited [21]. A full characterization of the CLL genomic landscape would enable several questions
to be addressed. Can we accurately predict the course of the disease? Can we predict
which patients will respond to which therapies? And can we use genomic information
to target the therapy to the underlying genetic or other alterations? Over the past
two years, genomic approaches have been intensively applied to the study of this disease
and have aided us in answering these important questions (Figure 1). Here, we review the main findings of these investigations as well as their possible
biological and clinical implications, focusing on key findings obtained by genomic
technologies, such as the expanded compendium of somatic gene alterations and the
characterization of clonal evolution and of the epigenetic landscape of CLL.

Figure 1. In recent years, CLL has been investigated through the use of several novel genomic
technologies. CLL is a disease of mature B cells that is typically present in high abundance in
blood; a typical peripheral blood smear is shown in the top panel. The typical source
material used for these studies is primary peripheral blood CLL samples. Four main
genomic approaches have been applied to this disease, including whole-exome/genome
DNA sequencing, SNP arrays for copy number measurement, RNA sequencing and analyses
of DNA methylation. These studies have added a substantial amount of information regarding
the biology of CLL. CLL, chronic lymphocytic leukemia; LOH, loss of heterozygosity;
SNP, single nucleotide polymorphism.

Somatic copy number alterations

The study of somatic copy number alterations (sCNAs), which are somatically acquired
alterations of a genome that result in the cell having an abnormal number of copies
of one or more sections of DNA, has revealed a high degree of molecular heterogeneity
in CLL (reviewed extensively elsewhere [6,7,22]). Briefly, unlike other lymphoid tumors such as follicular lymphoma or diffuse large
B-cell lymphoma, CLL is not characterized by a common translocation involving the
Ig loci, but instead by specific recurrent sCNAs (such as chromosome 11q deletions
(del(11q)), trisomy 12, del(13q) and del(17p)) that have been observed using comparative
genomic hybridization [23] and single nucleotide polymorphism (SNP)-array-based investigations [24] (Table 1). Because the CLL genome is near-diploid and typically carries only a small number of sCNAs,
these highly recurrent events are probably causative: their recurrence against a low
background sCNA rate testifies to significant selection, and hence to a significant
fitness advantage afforded to CLL cells by these lesions.
Furthermore, they affect clinical outcome [9]: del(13q) is associated with a good prognosis whereas del(11q) and del(17p) are associated
with a poor prognosis with present-day chemo-immunotherapy approaches. Lower frequency
lesions have also been identified involving the MYC locus [25], the short arm of chromosome 8 [23], and lesions probably affecting PIK3CA, NFKB2 and MGA [26,27]. Allele-specific copy number quantification with SNP arrays has also enabled the
discovery of frequent copy-neutral loss of heterozygosity in CLL, often resulting
in biallelic hits (mutations or epigenetic alterations) in key CLL-related loci, and
therefore potentially altering function [24]. For example, duplication of the allele containing the small del(13q) event may be
concurrent with the loss of the sister normal allele.

By measuring the affected portion of chromosomes across many CLL patient samples,
and thus defining the size of minimally affected lesions, these methodologies have
contributed to a mechanistic understanding of causative lesions in CLL. For instance,
the minimal deleted region in del(13q14) focused functional investigation onto a small
number of genetic elements, and ultimately led to the discovery that the microRNAs
miR-15a and miR-16-1, encoded by an intron of DLEU2, have a causative role in CLL [8], perhaps through the release of the anti-apoptotic BCL2 protein from microRNA-mediated
downregulation [28]. More recently, the case for miR-15a/16-1 deletion having a causative role in CLL was strengthened with the generation of a
CLL mouse model based on knockout of this locus [29]. Significant variation in the size of the deleted region (from approximately 300
kilobases to more than 50 megabases) provides clues to additional contributing genetic
components [30]. For example, adjacent hits within large monoallelic deletions (affecting, for example,
the RB1 gene) may have an important contributory role compared with a more isolated effect
of the disruption of the microRNA cluster in the shorter biallelic deletions. While
del(11q) and del(17p) impact the cellular network primarily due to the deletion of
known tumor suppressor genes ATM and TP53, respectively, the mechanism by which trisomy 12 contributes to lymphoproliferation
remains unknown [7]. This is due in part to the large size of the affected lesion (an entire chromosome),
which limits the ability to focus investigations on a smaller number of genes; application
of large RNA interference screen-based approaches, however, may reveal candidate genes.
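
The narrowing of a recurrent deletion to its minimally affected region, as described above for del(13q14), can be illustrated with a minimal Python sketch. It assumes each patient's deletion is summarized as a single (start, end) interval on the same chromosome; the function name and the coordinates are hypothetical and are not taken from the cited studies.

```python
from typing import List, Optional, Tuple

# (start, end) of a deletion in base pairs, half-open; a hypothetical representation
Interval = Tuple[int, int]


def minimal_deleted_region(deletions: List[Interval]) -> Optional[Interval]:
    """Return the segment shared by every deletion, or None if no such segment exists."""
    if not deletions:
        return None
    start = max(s for s, _ in deletions)  # right-most deletion start
    end = min(e for _, e in deletions)    # left-most deletion end
    return (start, end) if start < end else None


# Arbitrary illustrative coordinates, not real breakpoints
example_deletions = [(100, 900), (300, 1200), (250, 700)]
shared = minimal_deleted_region(example_deletions)
```

In practice, deletions that do not overlap the common region (or that are fragmented) would need separate handling; the sketch only captures the intersection idea that focuses functional studies on a small number of genetic elements.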

Clinical application of this information yielded one of the earliest molecular classification
schemata in cancer, predicting the course of disease based on the identity of the
sCNA [9]. This is of particular importance in a disease like CLL where the clinical heterogeneity
is enormous, with some patients remaining stable without treatment for years or even
decades, while others follow a fulminant and treatment-refractory course. Higher genomic
complexity - the presence of a high number of sCNAs - has also been associated with
worse outcome, including shorter time to first therapy and lower overall survival
rate [31,32]. Nevertheless, in contrast to other tumors, CLL has a relative paucity of sCNAs [26]. This observation has led to the suggestion that somatic single nucleotide variants
(sSNVs) and indels could play an important role in the pathogenesis of CLL, paving
the way for the application of next-generation sequencing (NGS) technologies to this
disease.

The genomic landscape of CLL probed with next-generation sequencing

NGS studies of the CLL genome [33,34] have effectively elucidated the level of genomic complexity in CLL, and have revealed
that the average number of non-silent mutations (that is, mutations that alter the
protein sequence) is 10 to 20 per sequenced CLL sample (out of approximately
1,000 somatic mutations per sample detected genome-wide). This is at least an order
of magnitude lower than the number of lesions detected in the coding genomes of common
epithelial cancers, such as lung cancer or melanoma [35]. Even among hematologic malignancies, the genomic complexity of CLL is relatively
low, similar to that of acute leukemias [36]. The overwhelming majority of sSNVs involve C>T transitions at CpG sites, with some
differences in mutation patterns between CLL with mutations in the Ig heavy variable
region (IGHV-mutated) and CLL lacking IGHV mutations (IGHV-unmutated), suggestive of the involvement of aberrant somatic hypermutation with
error-prone repair [33]. Importantly, the number of mutations in CLL samples from patients who received chemo-immunotherapy
before sampling is not significantly increased [34]. These results suggest that, unlike in several other cancers such as glioblastoma [37], treatment of CLL does not substantially contribute to increased mutagenesis.
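
To make the mutation class mentioned above concrete, the following Python sketch shows one way to decide whether a single nucleotide variant is a C>T transition at a CpG site, counting the equivalent G>A change on the opposite strand. The input format (reference base, alternate base and the two flanking reference bases) and the function names are assumptions made for illustration only.

```python
from typing import Iterable, Tuple

# (ref, alt, previous reference base, next reference base); hypothetical format
Variant = Tuple[str, str, str, str]


def is_cpg_transition(ref: str, alt: str, prev_base: str, next_base: str) -> bool:
    """True if the variant is C>T within a CpG, on either strand."""
    ref, alt, prev_base, next_base = (b.upper() for b in (ref, alt, prev_base, next_base))
    forward = ref == "C" and alt == "T" and next_base == "G"  # C>T followed by G
    reverse = ref == "G" and alt == "A" and prev_base == "C"  # same event seen on the other strand
    return forward or reverse


def cpg_transition_fraction(variants: Iterable[Variant]) -> float:
    """Fraction of variants that are C>T transitions at CpG sites."""
    variants = list(variants)
    if not variants:
        return 0.0
    return sum(is_cpg_transition(*v) for v in variants) / len(variants)


# Made-up variants for illustration
example_variants = [("C", "T", "A", "G"), ("G", "A", "C", "T"),
                    ("C", "A", "A", "G"), ("T", "C", "G", "G")]
fraction = cpg_transition_fraction(example_variants)
```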

NGS has also uncovered an unusual form of genomic complexity in CLL, termed chromothripsis,
which results from a massive genomic rearrangement event within a single region through
an as yet unknown underlying mechanism [38]. Overall, chromothripsis was detected at a low frequency in CLL (approximately
2%) through inference from SNP-array data, and was seen almost exclusively in CLL
with IGHV-unmutated status and with mutated TP53. This observation suggests that although genome integrity is largely preserved in
CLL (as demonstrated by its typically near-diploid genome), catastrophic rearrangements
can be tolerated and selected within a permissive genetic context. Perhaps unsurprisingly,
chromothripsis has been associated with a worse prognosis [27].

Beyond the characterization of the mutational landscape in CLL, NGS has also been
used to study, in an unbiased fashion, recurrent genetic alterations in CLL. Putative
driver mutations, which are genetic lesions that are likely to confer a significant
fitness advantage, have been identified (Tables 2 and 3). The first studies reported whole-genome [33] or whole-exome [39] sequencing of a handful of CLL samples, followed by targeted sequencing of coding
mutations detected in these samples in larger validation cohorts. This approach uncovered
several important putative drivers, including MYD88 and NOTCH1 mutations. An alternative approach using a larger initial cohort probed with whole-exome
sequencing has enabled the discovery of a larger number of putative drivers [34,40]. Collectively, these studies have demonstrated wide heterogeneity in the genetic
lesions driving CLL transformation and progression, characterized by 'mountains' (that
is, highly recurrent genes such as TP53) and 'hills' (infrequent but still statistically significant recurrent genes such
as XPO1), as seen in other sequencing efforts [41].

One of the earliest CLL drivers identified through NGS was NOTCH1 [33,34,39]. NOTCH1 encodes a ligand-activated transcription factor that regulates several downstream
pathways important for the control of cell growth. One recurrent mutation (c.7544_7545fsdel)
accounts for approximately 80% of all NOTCH1 mutations and generates a premature stop codon in the PEST domain (a peptide rich
in proline (P), glutamic acid (E), serine (S) and threonine (T), thought to act as
a signal for protein degradation [42]), which normally limits the intensity and duration of NOTCH1 signaling [39]. Disruption of the PEST domain results in impaired NOTCH1 degradation, as it interferes
with phosphorylation of the PEST domain of the receptor and its proteasomal degradation
through the FBXW7-SCF ubiquitin ligase complex [43]. This in turn results in accumulation of an active NOTCH1 isoform, which is associated
with a distinct transcriptional signature [33]. In CLL, NOTCH1 mutations occur at a frequency above 10% and tend to arise in CLLs without IGHV mutation and with trisomy 12 [44], although it is important to note that the latter association was not found in another
recent study [45]. In some studies, the presence of NOTCH1 mutations provided independent prognostic information and identified a group of patients
with intermediate-risk disease [46] and those in whom CLL was more likely to transform into high-grade lymphoma [47]. However, the effect size may not be as prominent as other CLL prognostic indicators,
as further studies failed to show an independent prognostic value for the presence
of these mutations [47,48].

Another commonly mutated gene is MYD88, a critical adaptor molecule of the Toll-like receptor (TLR) complex [33,34], seen in 3 to 8% of CLL cases. After TLR stimulation, MYD88 is recruited to the receptor
as a homodimer and forms a complex with IRAK4, leading to activation of IRAK1 and
IRAK2. This then leads to the downstream activation of TRAF6 and ultimately to phosphorylation
of IκBα and activation of the central B-cell transcription factor, nuclear factor
(NF)-κB [49,50]. The recurrent MYD88 mutation in CLL (L265P) imposes constitutive MYD88-IRAK signaling even in the absence of ligand-receptor binding, and thereby provides constitutive
NF-κB activity. Of note, MYD88 L265P mutations have been found exclusively in CLL with mutated IGHV. Exactly the same mutation has been identified in other malignancies of mature B
cells such as diffuse large B-cell lymphoma [51], central nervous system lymphoma [52] and Waldenström's macroglobulinemia [53]. Furthermore, this aberration is potentially amenable to therapeutic targeting through
direct inhibition of the MYD88-IRAK complex, through proteasomal inhibition [54] or even through the inhibition of Bruton's tyrosine kinase (BTK) [55].

Putative drivers can be further categorized based on the cellular pathways they involve.
Recurrently mutated genes in CLL can be grouped into seven core cellular networks,
in which the genes play well-established roles. As shown in Figure 2, these include DNA repair and cell-cycle control, Notch signaling, inflammatory pathways,
Wnt signaling, RNA splicing and processing (found to be present in close to one-third
of CLLs [56]), B-cell receptor signaling and chromatin modification. Pathway analysis may also
be beneficial to detect commonly disrupted pathways that may be of high biological
relevance but that do not contain a single highly recurrent gene, and may be missed
by gene-centric analytic approaches. One such example is disruption of the Wnt pathway
[34], a key player in CLL biology [57,58].

Figure 2. Affected genes in CLL discovered through genomic sequencing studies can be grouped
into seven core cellular pathways. Genes recurrently mutated in CLL samples are shown in red ovals, while genes found
to be mutated in isolated samples but which did not reach statistical significance
are shown as pink ovals. Affected cellular elements include four signaling pathways
with a known role in B-cell biology: inflammatory pathways, B-cell receptor signaling,
Notch signaling, and Wnt signaling. Notch and Wnt signaling both provide important
pro-survival input for CLL cells, allowing them to evade apoptosis [115-117]. In addition, they serve as an important bridge with the microenvironment, which
is of particular importance in CLL, as manifested by relatively poor cell survival
outside of the endogenous niche (for example, in in vitro or in vivo animal models) [118]. BCR signaling and inflammatory pathways may serve similar functions, and in addition
may form optimal early targets for somatic mutations as they hijack physiologically
active cellular pathways in relatively differentiated B cells [75,119]. In addition, three intranuclear processes are involved: DNA repair, chromatin
modification and RNA processing. Although the role of DNA repair disruptions has been
extensively investigated, with multiple effects on pro-survival circuits, growth and
genetic plasticity [120,121], the role of the other two intranuclear processes remains to be fully elucidated
in CLL. IC, intracellular; C, cytoplasm.

Although the unbiased approach of whole-exome sequencing of large cohorts is highly
effective at detecting putative drivers, it may still miss important drivers, either
owing to lack of power to detect lower frequency events or to the patient characteristics
of the investigated cohort. A striking example of such drivers is the case of BIRC3-inactivating mutations, which have not been detected in most of the large sequencing
efforts. Targeted sequencing of the BIRC3 coding sequence in CLL showed that BIRC3 inactivation is particularly common in fludarabine-refractory patients (24%) [59]. BIRC3, along with TRAF2 and TRAF3, cooperates in negatively regulating MAP3K14,
an activator of the non-canonical pathway of NF-κB signaling [60], and therefore BIRC3 mutations result in constitutive NF-κB activation [59]. Thus, BIRC3 mutations join SF3B1 (described in the next section), NOTCH1 and TP53 as mutations that contribute to chemo-refractoriness [61]. This example highlights the need to include specific patient groups in sequencing
efforts. Furthermore, it supports the idea that driver landscapes of similar types
of malignancies can guide driver identification, as the study of BIRC3 in CLL was prompted by its discovery in splenic marginal zone lymphoma [62].

Spliceosome mutations are important driver events in CLL

One of the most unexpected and important findings arising from an unbiased NGS discovery
approach was the identification of SF3B1 as one of the most recurrently mutated genes in CLL [63]. SF3B1 is a central component of the U2 spliceosome, which orchestrates the excision of introns
from pre-mRNA to form mature mRNA [64]. Strikingly, SF3B1 mutations are found in 10 to 14% of CLLs, particularly in CLL without IGHV mutation [34,40]. This discovery coincided with the report of frequent somatic disruptions of the
splicing machinery in myelodysplastic syndrome [65], thereby marking a new important path to oncogenesis in hematological malignancies
[66-70], as well as in solid malignancies [71,72]. The identification of a recurrently mutated gene in both unmutated IGHV CLL and myeloid malignancies may hint at a role of dysregulated hematopoietic stem/progenitor
cells in some mature lymphoid malignancies [70].

The pathogenic role of SF3B1 mutations is supported not only by their frequent occurrence in CLL, but also by the
fact that the mutations cluster in evolutionarily conserved hotspots within the carboxy-terminal
HEAT-repeat domains of the protein, whose function remains unknown [34]. SF3B1 mutations potentially lead to a defective spliceosome complex that is incapable of
performing the correct splicing steps. It has been reported that CLL cells with SF3B1 mutations show defective splicing activity, with a high ratio of unspliced to spliced
BRD2 and RIOK3 mRNA, transcripts that have previously been shown to require SF3b spliceosome activity
[34,73]. Elevated levels of truncated mRNA of the transcription factor FOXP1 and additional proteins that are SF3b spliceosome targets have been reported in association
with SF3B1 mutation [40]. The precise mechanistic aspects of SF3B1 mutation, however, are still under investigation. Of note, in addition to SF3B1 mutations, disruptions of other aspects of RNA processing have been observed in CLL,
including recurrent mutations in DDX3X and XPO1 [34], highlighting the importance of RNA processing in CLL.

Patients with SF3B1-mutated CLL have a shorter time to treatment, a shorter time to disease progression
and lower overall survival rates [34,40]. These mutations were also found at higher rates in patients with chemo-refractory
CLL [69]. Other data indicate that the SF3B1 mutation may be a later event in CLL, as it was observed to be acquired in patients
with relapsed disease [74], or that it expands from a minor subclone to become the dominant subclone upon relapse
[75]. Along the same lines, it has been suggested that it is rarely seen in MBL [76], a clonal condition that is thought to precede CLL, although the sample size, particularly
of CLL samples with unmutated IGHV, may have been too small to adequately address this question. SF3B1 mutations therefore may have a role in clonal evolution in CLL, emerging later in
the disease course, and in relapsed or refractory disease.

Clonal evolution drives CLL progression

One of the main challenges for cancer therapeutics is the plasticity of cancer - its
ability to adapt both to host defenses and to treatment. A central component of this
plasticity is clonal evolution fueled by the coexistence of multiple subpopulations
within the tumor [77]. These concepts were first demonstrated in CLL using cytogenetic technologies [78] and more recently using SNP arrays, which have also shown that relapsed disease is
genetically altered compared with disease at diagnosis [23,79,80].

With the advent of NGS, clonal evolution has been characterized at unprecedented resolution
using whole-genome sequencing of small cohorts of patients with a variety of cancers
[81-84]. In CLL, whole-genome sequencing was performed to track clonal heterogeneity in three
CLL patients subjected to repeated cycles of therapy [85]. Notably, three very different temporal patterns of repopulation of the leukemic
cell mass emerged after therapy, varying from a stable equilibrium between five subpopulations
over the course of years in one patient, to marked shifts, in which one minor subclone
replaced the dominant clone entirely, in another. These findings suggest the existence
in CLL of an intricate 'ecology' in which a complex interplay is present between intrinsic
and extrinsic/environmental factors that control the balance between different subpopulations
within the entire CLL population [86].

Recently, we investigated clonal evolution in CLL by using whole-exome sequencing
[75]. The methodologies developed in this study enabled the analysis of a large cohort
of samples involving 149 patients, including 18 cases that were followed longitudinally.
By studying the allelic fraction of each mutation, the proportion of the subpopulation
that harbored it among the entire cancer-cell mass was inferred, and each mutation
event was classified as either clonal, meaning a mutation that affects all cancer
cells (and corresponding to a founder mutation or an earlier mutation that underwent
a complete selective sweep that eliminated all other cancer cells not bearing this
mutation), or subclonal, which affects a subpopulation of cancer cells (representing
events acquired later in the disease course).
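
The inference described above can be sketched in Python under simplifying assumptions. The sketch uses a common point-estimate relationship between variant allelic fraction, tumor purity and local copy number; it is not necessarily the method used in [75]. The function names, the default copy number and multiplicity, and the 0.85 cutoff are all hypothetical choices made for illustration.

```python
def cancer_cell_fraction(vaf: float, purity: float,
                         local_copy_number: int = 2, multiplicity: int = 1) -> float:
    """Point estimate of the fraction of cancer cells carrying a mutation.

    Assumes normal cells are diploid and that the mutation is present on
    `multiplicity` copies in each cancer cell that carries it.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if not 0 <= vaf <= 1:
        raise ValueError("vaf must be in [0, 1]")
    # Average number of copies of the locus per cell in the mixed sample
    average_copies = purity * local_copy_number + 2 * (1 - purity)
    ccf = vaf * average_copies / (purity * multiplicity)
    return min(ccf, 1.0)  # sampling noise can push the estimate above 1


def classify_mutation(ccf: float, clonal_threshold: float = 0.85) -> str:
    """Label a mutation as clonal or subclonal using an illustrative cutoff."""
    return "clonal" if ccf >= clonal_threshold else "subclonal"


# A heterozygous mutation at 20% allelic fraction in a sample with 80% tumor cells
label = classify_mutation(cancer_cell_fraction(vaf=0.2, purity=0.8))
```

A point estimate of this kind ignores sequencing depth; a fuller treatment would propagate the uncertainty in the allelic fraction before assigning a mutation to the clonal or subclonal class.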

This framework enabled the inference of the temporal order of genetic driver events
in CLL, with the identification of earlier (for example, MYD88 mutation) and later events (for example, TP53 mutation) in disease progression. We also tracked clonal evolution longitudinally
in 18 patients [75], observing that patients who received therapy had a higher rate of clonal evolution,
suggesting that perhaps chemotherapy itself can hasten the evolutionary process. Finally,
clonal heterogeneity was linked to adverse clinical outcome, adding a further dimension
to current efforts to link discrete somatic mutations to outcome. These findings suggest
that it is not only the presence or absence of a mutation that should be considered
in analyses of the impact of mutations on clinical outcome, but also the size of the
subpopulation a mutation affects. This finding has important clinical implications
that can be tested in prospective clinical trials.

Beyond somatic genetic alterations: epigenetic changes in CLL

Cancer has traditionally been viewed as a disease driven by the accumulation of genetic
mutations [87]. This paradigm has been increasingly modified as cumulative evidence has suggested
that the disruption of epigenetic regulatory mechanisms has a critical role in neoplastic
transformation [88,89]. In CLL, for example, epigenetic modifications have been implicated in the recurrent
microRNA deregulation observed in miR-15a/16 and the related miR-29b [90]. Histone deacetylases were shown to be overexpressed in CLL, and mediate the epigenetic
silencing of microRNAs through removal of the activating chromatin modification H3K4me2.

Perhaps the best-studied epigenetic modification in CLL has been direct DNA methylation,
which occurs at the cytosine residue of the CpG dinucleotide in mammalian genomes.
Patterns of DNA methylation can be inherited across generations of somatic cells as
they are stably maintained through somatic cell division. This type of epigenetic
alteration is at least as common as mutational events in the development of cancer
[91]. Published reports of epigenetic gene dysregulation in CLL include hypomethylation
of BCL2 [92] and TCL1 [93], as well as silencing of DAPK1 through promoter hypermethylation, which recapitulates a germline mutation found in
a kindred of familial CLL [94].

More recently, genome-wide platforms have been applied to the study of DNA methylation
in CLL. DNA methylation arrays detect representative methylation sites across the
entire genome and have been used to identify regions with differential methylation
in CLL samples with mutated or unmutated IGHV status [95]. Most of these differentially methylated regions have been reported to lie outside
CpG islands, to remain stable over time and to involve multiple genes important in
CLL biology, such as ZAP70, NOTCH1 and IBTK, as well as epigenetic regulators (such as DNMT3B) and NF-κB/tumor necrosis factor (TNF) pathway genes [95]. Similar investigations were performed comparing CLL samples with high and low CD38
expression, and found variable methylation in the DLEU7 gene [96]. Finally, pervasive methylation changes have been observed across numerous microRNA
sites in CLL samples compared with normal B cells, which were associated with large
changes in expression of these microRNAs [97].

Bisulfite conversion coupled with NGS has also been used to delineate DNA methylation
across the entire genome at base-pair resolution [98]. Using this method, methylation profiles have been shown to vary substantially between
CLL with mutated versus unmutated IGHV status and to mirror epigenetic differences seen between naive and memory B cells.
The methylation patterns observed in the study allowed the authors to identify, in
addition to the mutated and unmutated IGHV subsets, a third subset of CLL samples with distinct clinical behavior (an intermediate
prognosis group, with a better prognosis than patients with IGHV-unmutated CLL and a worse prognosis than those with IGHV mutations), and an intermediate level of IGHV somatic hypermutation. Another method based on bisulfite conversion, termed reduced
representation bisulfite sequencing (RRBS), focuses on a representative sample of CpG sites. This
method has been found to be highly informative, and is less costly than whole-genome
bisulfite sequencing [99]. The application of RRBS to CLL [100] has shown that differentially methylated regions are enriched for transcription factors,
including the homeobox family of proteins. Furthermore, DNA methylation serves to
enhance particular critical pathways in CLL, such as Wnt signaling, by the simultaneous
hypermethylation of pathway antagonists (for example, DKK) and hypomethylation of Wnt ligands and transcription factors (for example, TCF7), with the net result of decreased antagonist transcription and increased agonist
transcription, respectively. Collectively, these studies have shown that DNA methylation
probably plays a significant role in CLL biology.
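
As a rough illustration of how bisulfite sequencing data are turned into the methylation differences discussed above, the following Python sketch computes a per-CpG methylation level from read counts and compares two groups of samples. It relies on the general principle that bisulfite treatment converts unmethylated cytosines, so that they are read as thymine, while methylated cytosines are still read as cytosine. The function names and the example values are assumptions for illustration, not data from the cited studies.

```python
from statistics import mean
from typing import Sequence


def methylation_level(methylated_reads: int, unmethylated_reads: int) -> float:
    """Fraction of reads at one CpG that still show cytosine after bisulfite conversion."""
    total = methylated_reads + unmethylated_reads
    if total == 0:
        raise ValueError("no reads cover this CpG")
    return methylated_reads / total


def methylation_difference(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Difference in mean methylation level between two groups at one CpG."""
    return mean(group_a) - mean(group_b)


# Made-up values: one CpG measured in two samples per group
level = methylation_level(methylated_reads=30, unmethylated_reads=10)
delta = methylation_difference([0.8, 0.6], [0.2, 0.4])
```

A positive difference would indicate relative hypermethylation in the first group; calling a region differentially methylated would additionally require a statistical test across neighboring CpGs and samples, which is omitted here.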

Profiling the transcriptional landscape of CLL to understand the impact of genetic
and epigenetic alterations on the cellular network

The various genetic and epigenetic alterations described earlier can affect the cellular
network and lead to system-wide transcriptional changes. Studying the transcriptome
enables an understanding of how mutations alter cellular behavior, and this should
give a better idea of the ultimate phenotype. Expression arrays have been used to
study CLL for many years in an effort to define subtypes related to clinical outcomes
(reviewed extensively elsewhere [101-103]). These methodologies have also been used to classify different subtypes (for example,
IGHV-mutated versus IGHV-unmutated) as well as to try to identify the normal cellular counterpart of CLL
(that is, the closest normal B-cell phenotype that may serve as a cell of origin for
CLL) [104].

A systems-level examination of the transcriptional landscape of CLL has the potential
to reveal subsets of patients with disparate risks for CLL progression. A study of
individual pathway disruptions showed that these pathways converged as patients
progressed before treatment and assumed similar transcriptional profiles closer
to the point at which treatment was required [105]. Thus, the transcriptional profile of CLL can be reduced from a daunting number of
individual genes to a handful of meaningful pathway annotations with important biological
and clinical implications.

High-throughput RNA sequencing has enabled the harnessing of NGS technology for the
study of transcriptional profiles. A pilot study compared RNA-sequencing data from
a small number of samples with mutated versus unmutated IGHV, the most well-established prognostic factor in CLL [106]. In addition to identifying 156 differentially expressed genes, the study identified
a large number of differentially expressed non-coding RNAs as well as marked changes
in splice variants between the two prognostic groups. Thus, this methodology can
provide more information than microarray-based gene expression
profiling, with the potential to demonstrate how genetic and epigenetic changes translate
at the cellular network level.

Conclusions and future directions

The intensive application of NGS to the study of CLL has yielded remarkable insights
over a short period of time, and it is likely that the exponential growth in our understanding
of this disease will continue in the coming years. The use of these novel technologies
has identified expected (for example, TP53 and ATM mutations) and unexpected CLL drivers (for example, SF3B1), and has opened new avenues of research, such as the study of splicing abnormalities
(Figure 2). NGS has also revealed the tremendous degree of genetic heterogeneity in CLL, both
among patients and within individual leukemias over time.

Delineating the inter-patient genetic heterogeneity of CLL has high translational
potential. First, novel genetic abnormalities such as NOTCH1, SF3B1 and BIRC3 mutations carry prognostic significance, and will probably be used in the future to
predict the highly variable clinical course of CLL, beyond the established predictive
factors such as IGHV mutation status and cytogenetic abnormalities [46]. Second, these lesions may also be informative regarding treatment stratification,
similar to the use of TP53 disruption today, which is known to be associated with chemo-refractory disease [2]. Finally, some of the genetic lesions identified by NGS represent attractive candidates
for targeted therapy. NOTCH1, for example, is already being targeted by some drugs
under development [107]. The promising results obtained with inhibitors of BCR signaling (that is, the BTK
inhibitor ibrutinib and the PI3K-δ inhibitor GS-1101 [20]) suggest that future research should also focus on how these drugs affect CLL cells
with different driver lesions.

The emerging understanding of intra-tumoral genetic heterogeneity in CLL may also
eventually have a clinical impact. Studying clonal evolution in relation to therapy
could help us to refine our understanding of resistance mechanisms and repopulation
kinetics. For example, comparing the genomes of relapsed CLL with those of pre-treatment
CLL could be informative with respect to specific lesions or mutations that
are selected in vivo in the setting of therapeutic bottlenecks. Collecting multiple longitudinal samples
throughout the disease and treatment process could highlight the comparative kinetics
of different subpopulations, enhancing our understanding of the evolutionary process.
It will also enable us to gain an understanding of the impact of targeting early clonal
lesions compared with late aggressive subclonal drivers on therapeutic outcome. Finally,
the suggestion that therapy itself can accelerate clonal evolution could influence
the current paradigm of gene-specific discovery, by challenging us to conceive therapeutic
strategies to directly address and anticipate clonal evolution, which has been demonstrated
to affect clinical outcome [75].

Future directions for NGS-based studies will probably also include studying the entire
continuum of CLL, from MBL to Richter's transformation [39,61]. Studying MBL may be particularly informative regarding the nascent stages of CLL
and the critical genetic steps required for transformation to CLL. In addition, focusing
on distinct groups of patients, such as those with poor clinical outcome (rapid progression
and poor treatment response), would assist in defining the genetic elements that contribute
to disease heterogeneity. Some of these have already been identified, such as the
long-established role of mutations in TP53 and ATM, as well as the more recent identification of the poor prognostic significance of
SF3B1 and BIRC3 mutations. However, it is likely that other somatic events or specific mutation combinations
can affect clinical phenotype, and a comprehensive mapping of these elements will
improve prognostication. Pathway analysis, as portrayed in Figure 2, may also unravel how disruption of different parts of the cellular machinery can
translate into altered clinical outcome.

Moreover, these technologies are likely to be applied to the study of inherited predisposition
to CLL [108,109], as this disease has a high incidence of familial cases. This area of investigation
might provide important clues to the interaction between existing germline mutations
and acquired somatic mutagenesis. Finally, probing the epigenetic profile of CLL is
currently in its nascent stages and will likely lead to a better understanding of
genome-wide levels of epigenetic modifications, as well as how different populations
within the cancer-cell mass differ in their epigenetic profiles and how this affects
functional diversity. For example, these epigenetic differences might lead to variations
in proliferative capacity, pluripotent potential [110] or ability to resist therapy [111].

Ultimately, a comprehensive understanding of the genetic basis of CLL will assist
in stratifying patients and matching treatments with genetic lesions, with a goal
of developing targeted therapies to improve CLL management. The wealth of emerging
genetic data has great potential to provide new paths for improved treatment options
for this disease, and will require focused translational efforts to enable the application
of this knowledge to clinical care.

Competing interests

The authors declare that they have no competing interests.

Acknowledgements

DAL acknowledges support from the American Society of Hematology (Research Award for
Fellows-in-Training) and the American Cancer Society. CJW acknowledges support from
the Blavatnik Family Foundation, the Lymphoma Research Foundation, NHLBI (1R01HL103532-01;
1R01HL116452-01) and NCI (1R01CA155010-01A1), and is a recipient of a Leukemia Lymphoma
Translational Research Program Award and an AACR SU2C Innovative Research Grant.