The FASEB Journal express article 10.1096/fj.03-0101fje. Published online January 8, 2004.
Genes differentially expressed in thyroid carcinoma
identified by comparison of SAGE expression profiles
Erwin Pauws,* Geertruda J. M. Veenboer,* Jan W. A. Smit,‡ Jan J. M. de Vijlder,*
Hans Morreau,† and Carrie Ris-Stalpers*
*Laboratory of Pediatric Endocrinology, Academic Medical Center, Amsterdam, The
Netherlands; †Department of Pathology; and ‡Department of Endocrinology, Leiden University
Medical Center, Leiden, The Netherlands
Corresponding author: C. Ris-Stalpers, Laboratory of Pediatric Endocrinology, AMC, Room G2-
136, PO Box 22700, 1100 DE Amsterdam, The Netherlands. E-mail: c.ris@amc.uva.nl
ABSTRACT
To identify transcripts that distinguish malignant from benign thyroid disease, serial analysis of
gene expression (SAGE) profiles of papillary thyroid carcinoma and of normal thyroid are
compared. Of the 21,000 tags analyzed, 204 tags are differentially expressed with statistical
significance in the tumor. Thyroid tumor specificity of these transcripts is determined in silico
using the tissue preferential expression (TPE) algorithm. TPE values demonstrate that 42 tags of
the 204 are thyroid tumor specific. BC013035, a cDNA encoding a novel protein, is up-regulated
from 0 to 24 tags in the thyroid tumor SAGE library. In a tissue panel of 30 thyroid tumors and
12 controls, it has an expression pattern similar to thyroid peroxidase, indicating possible
involvement of BC013035 in thyroid differentiation. A tag coding for extracellular matrix
protein 1 (ECM1) is absent in the normal thyroid SAGE library and present 55 times in the
tumor. ECM1, a protein recently associated with angiogenesis and expressed in metastatic breast
carcinoma, is up-regulated in 50% of all thyroid carcinomas and absent in normal controls and
follicular adenoma. In conclusion, SAGE analysis and subsequent determination of TPE values
facilitates the rapid distinction of genes specifically expressed in cancer tissues.
Key words: tissue preferential expression ● serial analysis of gene expression ● extracellular
matrix protein 1
Despite the low incidence of thyroid carcinoma (0.5-10 cases per 100,000), thyroid
nodules are relatively frequent in the population (1, 2). In the case of a cold nodule, the
diagnostic aim is to exclude a thyroid carcinoma. Lack of powerful diagnostic markers
that distinguish between benign and malignant thyroid diseases causes many patients to undergo
surgery. By improving diagnosis, the number of thyroid surgeries that are performed for thyroid
nodules with a suspicion of thyroid carcinoma can be markedly reduced. Although the general
prognosis of thyroid carcinoma is favorable, subgroups are at risk for recurrent disease,
metastasis, or death. Because the identification of these groups is difficult, almost all patients will
undergo intensive initial therapy, which may not be necessary in all patients when suitable risk
factors are available (3). The most common thyroid cancer subtype is papillary thyroid
carcinoma (PTC), which accounts for ~60% of cases. Follicular (FTC) and anaplastic
(undifferentiated) carcinoma (ATC) are less frequent (4). Thyroid cancers metastasize to lymph
nodes (PTC) and lungs and bone (FTC, ATC; ref 1). The fact that thyroid epithelial cells can
dedifferentiate to malignant tumors with entirely different histological and clinical behavior
makes the molecular pathogenesis of thyroid carcinoma an event with relevance for general
processes in dedifferentiation. The identification of genetic variations in thyroid tumors has
increased the understanding of the molecular basis of thyroid carcinoma (5). Differentially
expressed genes can be used as targets for molecular-based diagnosis and therapy. In general,
thyroid-specific gene expression is down-regulated in thyroid carcinoma and an anaplastic
carcinoma has lost all thyroid-specific expression (6, 7). Loss of heterozygosity, gene
rearrangements, and point mutations are linked to the development of thyroid cancer.
Comparative genomic hybridization (CGH) demonstrated specific loss of chromosome 8 in ATC
(8). Other studies identified chromosomal regions 7q and 10q as specific cytogenetic events in
FTC (9, 10). Recently, FTC has been associated with a PPARγ and PAX8 rearrangement (11). PTC
tumors are often associated with genomic rearrangements of the ret proto-oncogene (12),
especially in case of external radiation as in the Chernobyl incident (13, 14). Activating point
mutations in the TSH-receptor gene and the ras and gsp oncogenes are implicated in several
studies (15, 16). Mutations in mitochondrial DNA are linked to thyroid tumor progression (17).
Several differentially expressed genes are described in thyroid carcinoma (for review see ref 18).
Galectin-3 was reported as a candidate marker suitable to distinguish benign from malignant
thyroid neoplasms (19, 20). However, recently other reports have raised important questions
about the accuracy of galectin-3 as a diagnostic marker, especially on the RNA level (21, 22).
To further elucidate the molecular pathogenesis behind thyroid cancer and to identify novel
markers for differentiated thyroid carcinoma, we used serial analysis of gene expression (SAGE),
a high-throughput technique suitable for this purpose. SAGE is a sequence-based approach to
identify a cell-specific gene expression profile (23). Analysis and comparison of SAGE
expression profiles can identify novel genes involved in the molecular pathology of disease (24).
In molecular oncology, SAGE studies have advanced the understanding of tumor-specific
changes in gene expression (25, 26). In this study, we constructed and analyzed a SAGE library
from a follicular variant of PTC with widespread but indolent metastatic behavior (27). PTC is
the most common thyroid carcinoma, and by analyzing this particular tumor sample we expect to
identify genetic factors associated with aggressive and/or metastatic behavior. In a previous
study, our laboratory analyzed the SAGE expression profile of a normal thyroid to identify novel
thyroid-specific genes (28). Comparison of both expression profiles identifies differentially
expressed genes in thyroid carcinoma. Subsequent in silico analysis using the TPE algorithm
(29), which was refined for this specific purpose, allows the selection of candidate tumor-
specific transcripts. Finally, expression of two candidate genes was studied by semiquantitative
RT-PCR (sqRT-PCR) on a panel of normal and tumor thyroid tissues.
METHODS
Tumor tissue and RNA isolations
Frozen tissue was collected from 30 patients who underwent thyroidectomy for suspicion of thyroid
carcinoma in the Leiden University Medical Center. Pathological classification of the samples
was as follows: 5 follicular adenomas (FA); 20 papillary carcinomas (PTC), of which 6
are follicular variants; and 5 follicular carcinomas (FTC). From eight PTC samples, tumor tissues
and adjacent normal thyroid tissue were collected after microdissection. Data on vascular
invasion, lymph node metastasis, and/or distant metastasis were recorded. For SAGE analysis,
one tumor (Thy_T), a follicular variant of PTC, was selected because of its aggressive metastatic
behavior and abundant tissue availability. Cytogenetic analysis of this tumor showed a
t(3;5)(q12;p15) chromosomal translocation and LOH of chromosome 22 (27). Normal thyroid
tissue was obtained from four individuals without thyroid pathology after resection at routine
autopsy. After homogenization of tissue samples, total RNA was extracted using TRIzol (Gibco-
BRL).
SAGE library construction and analysis
SAGE libraries were made using SAGE protocol 2.0 and as described previously (23, 28). SAGE
clones were sequenced with the Dyenamic Direct cycle sequencing kit (Perkin Elmer) using the
T7 priming site. Samples were run on an ABI377XL Automatic Sequencer (Perkin Elmer) and
analyzed with Sequence Analysis 3.0 software. Tag extraction was performed using the analysis
program USAGE 2.01 (30). Comparison of SAGE libraries was done using USAGE and its
intrinsic statistical package calculating P values of a comparative Z-test (31). Tag identification
and expression were done using NCBI/CGAP’s SAGEmap program (32).
Tissue preferential expression analysis
Tissue preferential expression (TPE) analysis was performed as described previously (29) with
some modifications. The following algorithm was used:
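$$\mathrm{TPE}_N = \sqrt{\bigl(\mathrm{Ratio}(\mathrm{tag}_N)\bigr)^2 + \bigl(\%\mathrm{Libraries}\bigr)^2};$$
for every tag with expression level $N_{\mathrm{REF}}$ in the reference library, the log-ratio with respect to the
expression level in each of the other libraries $N_a, \ldots, N_x$ is calculated:
$$\mathrm{Ratio}(\mathrm{tag}_N) = \sum_{a}^{x} \log\frac{0.001 + N_{\mathrm{REF}}(\mathrm{tag}_N)}{0.001 + N_a(\mathrm{tag}_N)}.$$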
The sum of all ratios is Ratio tagN. The percentage of libraries where a given tag is expressed is
taken as the second component in the calculation of the TPE value. The resulting TPE values are
normalized and range between 0-100. Generally expressed transcripts typically have TPE values
<25. Tissue-specific transcripts have TPE values ranging from 75-100. The expression of a
selection of SAGE tags was scored in 49 human SAGE libraries available at NCBI/CGAP’s
SAGE website at URL:(www.ncbi.nlm.nih.gov/SAGE). Libraries constructed from in vitro cell
lines were omitted. The expression in the reference library Thy_T (PTC) was compared with
respectively a group of 18 libraries from different normal tissues [Thyroid_N/Chen Normal
Pr/PR317 normal prostate/normal prostate/mammary epithelium/Duke 40N/Duke 48N/Br
N/normal lung/Duke leukocyte/Duke Kidney/NC1/NC2/Duke thalamus/Duke BB542 normal
cerebellum/BB542 white matter/normal pool(6th)/normal cerebellum] and a group of 31 libraries
from different tumor tissues (Thyroid_T/gastric cancer xenograft X101/gastric cancer-
G234/Chen Tumor Pr/PR317 prostate tumor/PrCA-1/LN-1/Panc 91-16113/Panc 96-
6252/OC14/OVT-6/OVT-7/OVT-8/95-347/95-259/95-260/95-348/DCIS/Duke 96-349/DCIS
2/Tu102/Tu98/Duke 1273/Duke H1020/Duke GBM H1110/pooled GBM/Ped GBM1062/Duke
H1126/H1126/Duke 757/Duke H1043).
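
The two components of the TPE calculation can be illustrated with a minimal Python sketch. Several details are assumptions, because the text does not specify them: the logarithm is taken as base 10, the function names are hypothetical, and the final scaling of TPE values to the 0-100 range (including any rescaling of the two components before they are combined) is omitted.

```python
import math

PSEUDOCOUNT = 0.001  # offset used in the formula; avoids log(0) and division by zero


def log_ratio_sum(n_ref, other_counts):
    """Sum of log ratios of the reference-library count over each other library's count.

    Assumption: base-10 logarithm (the text only says "log").
    """
    return sum(
        math.log10((PSEUDOCOUNT + n_ref) / (PSEUDOCOUNT + n_a))
        for n_a in other_counts
    )


def percent_libraries(other_counts):
    """Percentage of the comparison libraries in which the tag is present."""
    expressed = sum(1 for n_a in other_counts if n_a > 0)
    return 100.0 * expressed / len(other_counts)


def raw_tpe(n_ref, other_counts):
    """Unnormalized TPE: square root of the sum of the squared components."""
    return math.hypot(
        log_ratio_sum(n_ref, other_counts),
        percent_libraries(other_counts),
    )
```

In this sketch `n_ref` would be the tag count in Thy_T and `other_counts` the counts of the same tag in the 18 normal or the 31 tumor libraries, giving one raw value per comparison group.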

sqRT-PCR
First-strand RACE cDNA was synthesized from thyroid RNA samples using the cDNA synthesis
system (Gibco-BRL). Primers were designed on GenBank sequences using the primer design
program Primer3 at URL:(www.genome.wi.mit.edu/cgi-bin/primer/primer3.cgi). With the use of
the following sequences, oligonucleotides were synthesized by Isogen: extracellular matrix
protein 1 (ECM1) accession no. NM_004425, For (forward primer) 823-842, Rev (reverse
primer) 1094-1113, fragment size 547 bp; thyroid peroxidase (TPO) accession no. M17755, For
2161-2180, Rev 2589-2608, fragment size 450 bp; hypothetical protein BC013035 accession no.
NM_138436, For CGACTATCAGCAGCCACAAA, Rev TGCAAATGGCATAAACTCCA,
fragment size 386 bp; mitochondrial ATPase 6 (ATP6) accession no. X62996, For
CAGTGATTATAGGCTTTCGCTCTAA, Rev CAGGGCTATTGGTTGAATGAGTA, fragment
size 190 bp (33). PCR amplification was performed using standard conditions (2 mM MgCl2;
30 s at 94°C, 1 min at 55°C, and 1 min at 72°C per cycle). The number of cycles was 26 for
ECM1 and TPO, 28 for BC013035, and 25 for the control transcript ATP6. In all cases,
this was during the exponential phase of the reaction, well before the plateau phase was reached.
Ten microliters of every 25 µl PCR reaction were run on a 2% agarose gel and visualized using
ethidium bromide. Imaging of the gel was performed using the Eagle Eye II System (Stratagene).
Intensity of bands was measured using OneD-scan software from Scanalytics. Density values of
every band are corrected for background and normalized for PCR fragment size. Ratios of
respectively ECM1, TPO, and BC013035 over control transcript ATP6 are calculated.
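
The band quantification described above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and normalization for fragment size is assumed to mean division by the fragment length in base pairs, which the text does not state explicitly.

```python
def normalized_band(density, background, fragment_bp):
    """Background-corrected band density per base pair of PCR product.

    Assumption: size normalization is a simple division by fragment length.
    """
    return (density - background) / fragment_bp


def expression_ratio(target, control):
    """Ratio of a target transcript over the control transcript (ATP6 in the text).

    Each argument is a (density, background, fragment_bp) tuple.
    """
    return normalized_band(*target) / normalized_band(*control)
```

For example, `expression_ratio((1100, 100, 500), (600, 100, 250))` evaluates to 1.0, because both bands give the same corrected density per base pair; the numbers are invented for illustration.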
RESULTS
Generation and comparison of SAGE libraries
A SAGE library was generated from the selected PTC tissue (Thy_T). Clones were sequenced
resulting in 10,495 tags representing 5,913 unique transcripts. Together with the previously
constructed normal thyroid library (Thy_N; ref 28), a total of 21,489 tags representing 11,350
unique transcripts are available for analysis. To identify transcripts up- or down-regulated in the
thyroid tumor, both expression profiles are compared using USAGE software (30). SAGE
libraries are normalized to a total tag number of 50,000. With the use of a cut-off significance P
value of 0.05, 270 tags are differentially expressed, corresponding to 2.4% of the total number of
transcripts in the two libraries. In this analysis, the significance P value of 0.05 corresponds to a
differential expression of approximately fivefold.
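
The normalization and comparison steps can be illustrated with a short Python sketch. It assumes a pooled two-proportion Z-test with a two-sided P value, which may differ in detail from the test implemented in USAGE (31); the function names are hypothetical. The Thy_N library size used in the usage note (10,994 tags) is derived from the totals above as 21,489 minus 10,495.

```python
import math


def normalize(count, library_size, target=50_000):
    """Scale a raw tag count to a library of `target` total tags."""
    return count * target / library_size


def z_test(count_a, size_a, count_b, size_b):
    """Pooled two-proportion Z-test on raw tag counts; returns (z, two-sided P).

    Assumption: this is a generic form of the test, not the USAGE source code.
    """
    pooled = (count_a + count_b) / (size_a + size_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / size_a + 1 / size_b))
    if se == 0:  # tag absent from both libraries: no evidence of a difference
        return 0.0, 1.0
    z = (count_b / size_b - count_a / size_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value
```

With the ECM1 tag counts reported in the abstract (0 in Thy_N, 55 in Thy_T), `z_test(0, 10_994, 55, 10_495)` gives a positive Z score and a P value far below the 0.05 cut-off.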
Identification of differentially expressed SAGE tags
TAG-to-GENE identification is performed with USAGE software. Tags that cannot be annotated
due to their specific sequence (e.g., AAAAAAAAAA or those containing ALU repeats) and tags
corresponding to mitochondrial and ribosomal transcripts are excluded from subsequent analysis.
The remaining cohort consists of 204 transcripts, of which 50 are up-regulated and 154 are
down-regulated in the tumor. Of these tags, 60 cannot be linked to a known human transcript and
are denoted NoMatch tags from which 27 tags are up-regulated and 33 tags are down-regulated
in Thy_T. NoMatch tags do in most cases link to UniGene clusters consisting of ESTs or full-
length cDNAs with unknown function. Of the remaining 144 tags with a positive UniGene
match, 23 tags are up-regulated and 121 tags are down-regulated in Thy_T. Several thyroid-
specific transcripts [e.g., thyroglobulin (TG) and TPO] are down-regulated. Genes previously
associated with cancer can be identified (e.g., VEGF, CD9, xpC, ECM1) in the group of up-
regulated transcripts. Other up-regulated transcripts are not known to play a role in (thyroid) tumor biology.
TPE analysis
To investigate the expression profile of 204 differentially expressed transcripts in normal and
tumor tissues other than thyroid, in silico analysis using TPE was performed. For each of the 204
tags, two TPE values are calculated to score the specificity toward two groups of tissues. The
first group consists of 18 SAGE libraries from different normal tissues and analysis results in a
TPE value termed TPE_N. The second group consists of 31 SAGE libraries from different tumor
tissues. TPE analysis calculates a value TPE_T. Both values are plotted in a graph (Fig. 1). The
bottom-left area of the plot represents differentially expressed tags in our SAGE analysis but
with a low specificity for thyroid carcinoma when compared with either normal or tumor tissues.
These tags typically correspond to widely expressed housekeeping genes. In the top-right area of
the plot, tags with a differential expression specific for the Thy_T library when compared with
either tumor or normal tissues are located. These tags are potential thyroid tumor markers
because their corresponding genes are specifically over- or underexpressed in thyroid carcinoma
when compared with carcinoma from different origin. Tags representing TG and TPO are located
in this area. Table 1 lists 42 putative thyroid tumor markers from tags with at least one TPE
value ≥75. Positive linkage with a UniGene cluster can be made for 30 tags (groups Match and
NoMatch(+hit)) of which ~50% correspond to a UniGene cluster with a defined protein function.
Twelve tags cannot be positively linked to any UniGene cluster (group NoMatch(-hit)).
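
The selection rule used for Table 1 (at least one TPE value ≥75) can be written as a small Python sketch. The function name and the dictionary layout are hypothetical; the VEGF values are those reported in the Discussion, and the TG values are rounded from the statement that its TPE values are around 90.

```python
def is_candidate_marker(tpe_n, tpe_t, threshold=75):
    """True if a tag is specific relative to normal tissues, tumor tissues, or both."""
    return tpe_n >= threshold or tpe_t >= threshold


# (TPE_N, TPE_T) per tag; TG is approximate, VEGF is as reported in the Discussion
tags = {"TG": (90, 90), "VEGF": (72, 56)}
candidates = [name for name, (n, t) in tags.items() if is_candidate_marker(n, t)]
```

Here `candidates` contains only TG: both VEGF values fall below the threshold, in line with its expression in many tumors of different origin.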
Validation by RT-PCR
To validate the SAGE results by sqRT-PCR, six transcripts were selected at random and amplified
from the original tissues from which the SAGE libraries were constructed (Fig. 2). The
mitochondrial house-keeping gene ATPase 6 (33), which shows similar and relatively high
expression levels in both SAGE libraries (131 tags in Thy_N and 157 tags in Thy_T), is used as
a control transcript. The intensity of the amplified products corroborates the differential tag
counts from the SAGE libraries, taking into account that a tag count of zero in a library of 10,000
tags does not necessarily mean that there is zero expression. The higher level of sensitivity of the
RT-PCR experiment shows this as a weak band in lanes where the observed tag count was zero.
sqRT-PCR on thyroid tumor cDNA panel
The expression of three genes was studied in a panel of benign and malignant thyroid neoplasms:
ECM1 for its association with metastasized breast carcinoma, TPO for its previously established
relation with (de)differentiation of thyroid cells, and a novel transcript coding for a hypothetical
protein (BC013035). Figure 3 shows the sqRT-PCR of ECM1, TPO, and BC013035 in a panel of
30 thyroid tumors and 12 normal thyroid controls. Figure 3A shows an agarose gel with RT-PCR
products of ECM1, TPO, BC013035, and ATP6. PCR amplification products of ATP6 reflect the
amount of cDNA input. Normalized expression ratios are depicted graphically in Fig. 3B.
Normal thyroid tissue shows high expression of TPO, no ECM1 expression, and intermediate
BC013035 expression. Normal thyroid tissue surrounding PTC tumors (samples A-H) shows
slightly decreased expression of TPO, no ECM1 expression, and increased BC013035
expression. There is no ECM1 expression in follicular adenoma while the expression levels of
TPO and BC013035 are decreased compared with normal. Papillary and follicular carcinomas
show decreased or absent expression levels of TPO. BC013035 is expressed in 10 out of 25
carcinomas with levels comparable to normal thyroid control. ECM1 is expressed in 11 out of 25
carcinoma showing no preference for PTC, fPTC, or FTC. There is no correlation between
expression levels of ECM1, TPO, or BC013035, nor with the invasive characteristics of thyroid
carcinoma, such as vasoinvasiveness, lymph node metastasis, and/or distant metastasis.
DISCUSSION
The comparison of SAGE-generated gene expression profiles of a normal thyroid and a follicular
variant of PTC identified differentially expressed transcripts. After exclusion of tags
corresponding to repetitive sequences, mitochondrial or ribosomal transcripts, 204 tags are
differentially expressed, from a total of 11,350 different transcripts present in both libraries
(1.8%); 154 of these are statistically significantly down-regulated and 50 are up-regulated
(P<0.05) in the thyroid tumor. Similar numbers were reported in a SAGE study of lung cancer
(26).
A general problem of high throughput techniques is the extensive output of data. To expedite the
identification of SAGE tags of interest, we developed the in silico TPE (29) analysis that can be
easily tailored to fit a specific research question, in casu the identification of transcripts
differentially expressed in PTC. This “virtual Northern” uses an algorithm to calculate TPE
values for every tag, based on the number of SAGE libraries the tag is expressed in and the
relative level of expression in other libraries. TPE resembles techniques like Digital Differential
Display (34) but since the algorithm assigns a standardized value to each tag, it is easier to sort
tags based on their specificity making it possible to distinguish a subgroup of transcripts, tailored
to a specific query.
The TPE analysis applied in this study was graphically plotted to select thyroid tumor candidate
genes (Fig. 1). Contemplating functions of proteins corresponding to the TPE values shows that
most housekeeping genes widely expressed in all tissues have low TPE values (<25). Proteins
with a tissue-specific function from genes expressed in only one tissue have high TPE values
(>75). The best example in this study is that TG, a gene expressed solely in thyroid follicular
cells, has TPE values around 90 and is located in the most upper-right quadrant of Fig. 1.
Moreover, all three tags corresponding to TG transcripts because of alternative splicing in the
3′UTR (35) show very high TPE values. TPO, another thyroid-specific transcript, down-
regulated in thyroid tumors, also has TPE values around 90. The result of the TPE analysis
shows that the largest part of tags differentially expressed in PTC is expressed in a wide range of
tissues, normal as well as neoplastic. From 204 differentially expressed annotated tags, 42 tags
have TPE values over 75 and are considered candidate thyroid tumor markers.
The most abundant thyroid-specific transcripts, TG and TPO, are significantly down-regulated in
the tumor SAGE library. The expression of the low abundant TSH-R and NIS is lost in the tumor
library; due to the relatively low levels of expression of these transcripts, statistical significance
is not reached. It is generally accepted that the down-regulation of thyroid-specific transcripts in
thyroid carcinoma is indicative of the dedifferentiation process during the progression of the
tumor (7). This phenomenon was again shown in a study where PTC expression profiling using
microarrays was performed (36). This study also identifies two genes (CITED1 and SFTPB)
specifically overexpressed in PTC. Neither of these two genes was present in the SAGE profiles
in this study.

Among the 50 transcripts that are up-regulated in PTC, transcripts coding for proteins with many
different functions can be distinguished. VEGF is the only transcript in this group previously
implicated in thyroid tumor progression (37, 38). The mRNA encoding this protein is
overexpressed 11-fold in our tumor SAGE library. Recent reports suggested a poor prognosis for
VEGF-positive thyroid tumors (38, 39). VEGF, however, is not a thyroid tumor specific marker,
as it is seen in many tumors of different origin. This is supported by the TPE values of the tag
representing VEGF (TPE_N: 72 and TPE_T: 56), indicating that it is expressed in some normal
tissues and in several tumor tissues.
In the only other thyroid SAGE study by Takano et al. (40), thyroglobulin is down-regulated in
differentiated tumors and absent in anaplastic carcinoma. In our follicular variant of papillary
carcinoma, thyroglobulin is also absent. This result may indicate the high level of
dedifferentiation of the tumor, which was selected for its aggressive behavior. Osteonectin that
was identified as a marker for anaplastic carcinoma by Takano is not present in our SAGE
library.
sqRT-PCR on a panel of 30 thyroid tumors and 12 normal controls shows down-regulation of
TPO in all tumors indicating dedifferentiation of these tissues. Additionally, normal control
samples taken from healthy subjects show higher TPO expression than normal tumor-
surrounding tissues. The expression of hypothetical protein BC013035 follows a pattern
similar to that of TPO. It is expressed in all normal thyroid samples, and its expression is down-
regulated in most tumors, irrespective of the subtype. On the basis of the expression pattern of
BC013035 in thyroid tumors, it does not seem a good tumor marker. Since it is expressed in
normal thyroid tissue, it is more likely to play a role in general thyroid physiology.
ECM1 shows overexpression in 50% of PTC and 40% of FTC tumor samples. The lack of
ECM1 in normal controls and follicular adenomas and its inverse correlation with the expression
levels of TPO indicate that ECM1 expression correlates with tumor progression. It is not possible
to distinguish between PTC and FTC on the basis of ECM1 expression, but it is possible to
distinguish follicular adenoma from thyroid carcinoma.
The 1.9 kb ECM1 mRNA encodes a protein of 85 kDa that was initially isolated from an
osteogenic stromal cell line (41, 42). Although ECM1 is implicated in differentiation of
keratinocytes (43) and is expressed in surrounding connective tissues of developing bones (44),
the function of the protein is still unknown. Recently, it has been associated with angiogenesis
and it is expressed in breast carcinoma cells (45). In the TPE analysis, ECM1 expression was not
observed in three available breast tumor SAGE libraries. The fact that in the former study
expression of ECM1 was seen preferentially in breast tumors of the more malignant type could
explain this difference. The link that was suggested between ECM1 expression and tumor
progression through angiogenesis is in our case supported by the fact that the transcript for
VEGF is up-regulated in our papillary thyroid tumor SAGE library to the same extent as ECM1.
VEGF is a recognized angiogenic factor already described as a marker with prognostic value in
thyroid carcinoma (37–39), although there are contradictory reports (46).
In conclusion, from a large-scale in silico analysis of gene expression profiles of normal thyroid
and follicular variant of PTC, a cohort of 42 putative thyroid tumor markers is defined. Many
novel transcripts are among them. ECM1 is identified as a gene up-regulated in a significant