Copyright Romanish et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

The human neuronal apoptosis inhibitory protein (NAIP) gene is no longer principally considered a member of the Inhibitor of Apoptosis Protein (IAP) family, as its domain structure and functions in innate immunity also warrant inclusion in the Nod-Like Receptor (NLR) superfamily. NAIP is located in a region of copy number variation, with one full length and four partly deleted copies in the reference human genome. We demonstrate that several of the NAIP paralogues are expressed, and that novel transcripts arise from both internal and upstream transcription start sites. Remarkably, two internal start sites initiate within Alu short interspersed element (SINE) retrotransposons, and a third novel transcription start site exists within the final intron of the GUSBP1 gene, upstream of only two NAIP copies. One Alu functions alone as a promoter in transient assays, while the other likely combines with upstream L1 sequences to form a composite promoter. The novel transcripts encode shortened open reading frames and we show that corresponding proteins are translated in a number of cell lines and primary tissues, in some cases above the level of full length NAIP. Interestingly, some NAIP isoforms lack their caspase-sequestering motifs, suggesting that they have novel functions. Moreover, given that human and mouse NAIP have previously been shown to employ endogenous retroviral long terminal repeats as promoters, exaptation of Alu repeats as additional promoters provides a fascinating illustration of regulatory innovations adopted by a single gene.

Introduction

Transposable elements (TEs) are ubiquitous components of most sequenced genomes, but their function, if any, is poorly understood. Comprising ~50% of the human genome, the majority of TEs belong to the short interspersed element (SINE) (>10%), long interspersed element (LINE) (>20%), and endogenous retroviral/long terminal repeat (LTR) (~10%) families [1]. The SINEs encode no open reading frame (ORF) and have utilized LINE-encoded proteins [2] to amplify to >10^6 copies in the human and mouse genomes [1], [3]. On the other hand, only a limited number of LINEs and LTR elements are full-length, and many of these are rendered non-functional by point mutations and deletions [4]. Therefore, the majority of TEs no longer pose a significant burden as insertional mutagens, although many retain the regulatory signals necessary for transcription [5], [6].

The LTRs and LINEs naturally harbour RNA polymerase II (pol II) signals and numerous examples of promoter exaptation by host genes exist [5], [7], [8]. On the other hand, SINEs replicate via pol III [9], and thus are not expected to impose direct regulatory effects on protein-coding genes. Indeed, SINEs are over-represented within gene-rich regions, while the LTRs and LINEs are under-represented [6]. Recent scrutiny of the primate-specific Alu SINEs has provided various illuminating findings. They can be incorporated into mRNA as cassette exons [10], [11], and are often found in UTRs [8], [9], [12]. Furthermore, consensus binding motifs for many pol II transcription factors have recently been identified within Alus [13], [14], but their role as promoters and enhancers has not been extensively researched.

We have previously shown that the neuronal apoptosis inhibitory protein (NAIP) orthologues in human (NM_022892.1) and mouse (NM_008670.2; NM_021545.1; NM_010870.2; NM_010872.2) provide a remarkable example of LTR promoter exaptation – unrelated LTRs were independently acquired as gene promoters [15]. NAIP is a member of the inhibitor of apoptosis protein (IAP) family, and was cloned as a candidate gene for the neurodegenerative disorder Spinal Muscular Atrophy (SMA) [16]. Consistent with its role as a modifier of SMA severity, NAIP has been shown to inhibit programmed cell death by binding activated caspases [17], [18], [19]. Moreover, the IAPs have emerged as therapeutic and diagnostic targets for various cancers [20], [21], [22]. Furthermore, the effect of NAIP expression in other neurodegenerative diseases, such as Alzheimer's disease, Down syndrome, multiple sclerosis, and Parkinson's disease, has also been investigated [23], [24]. Recently, a potential role in innate immunity surfaced through the discovery that polymorphism of a particular Naip copy in mouse strains determined permissiveness of Legionella pneumophila replication in host macrophages [25]. Paradoxically, Naip-mediated L. pneumophila restriction is caspase 1-dependent and signaling through this pathway results in the rapid death of infected cells [26], [27], [28], a role consistent with its inclusion in the Nod-Like Receptor (NLR) superfamily of cytosolic pattern recognition sensors [29].

Here the flexibility associated with NAIP regulation in human is further demonstrated, by showing that 5′ truncated transcripts arise from two unique Alu SINEs. The resulting ORF is translated in a number of cell lines and primary tissues, and yields a protein possessing only the signature NLR domains. Since Alus are over-represented in gene-rich regions and present transcription factor binding motifs, their role in establishing transcriptional networks is of great interest, as previously suggested [13], [30]. These findings indicate, for the first time, that Alu insertions can serve directly as gene promoters and derive novel transcripts and protein isoforms. The existence of NAIP protein isoforms, as described here, should therefore be considered in future experiments addressing its IAP and/or NLR functions.

Results

Human NAIP is a multicopy gene

Copy number variation (CNV) exists in the region of human chromosome 5q13.2 encoding NAIP and other genes [31], [32], [33], as it does among inbred mouse strains [25]. In the reference human genome at least five copies are annotated [34] (Figure 1a), and while only one of these is full length, NAIPfull, the others are assumed to be pseudogenes since two are 5′- and two are 3′-deleted, NAIP1 & 2 and ΨNAIP1 & 2, respectively (Figure 1a, b). Exon content of the NAIP paralogues was verified using dot plots (Figure S1). While assessing their transcription using a variety of RT-PCR primer sets, we found that 3′ transcript levels of NAIP are greater than 5′ transcript levels in most tissues. In general, NAIP 5′ and 3′ transcripts showed the smallest differences in the macrophage-rich lung, spleen (Figure 1c), and blood (Figure S2). Expression of NAIP in these tissues most likely results from macrophage infiltration [35], the cell type mediating NAIP-dependent L. pneumophila immunity. The largest difference is observed in testis, where 3′ levels are >40-fold above 5′ levels. Interestingly, in liver 5′ levels of NAIP are the highest (Figure 1c), potentially arising from transcription of 3′ deleted isoforms, premature poly-adenylation, or a CNV-associated anomaly within the tissue sample screened. The abundance of 3′ transcripts raises the possibility that the 5′ deleted copies, NAIP1 and NAIP2, are expressed (Figure 1c, Figure S2), or that internal promoters of NAIPfull produce transcripts lacking the 5′ end, or both.

Novel human NAIP transcription start sites

The observation that levels of 5′ vs. 3′ transcription are not uniform across various human tissues prompted an analysis to determine where NAIP transcription was initiating. Previously, we showed that an upstream ERV-P LTR is a promoter of NAIPfull specifically in testis, but that ubiquitous expression derives from within an exon in the 5′ UTR [15]. Moreover, a previously published transcription start site [36] overlaps a MER21C LTR slightly upstream of the ERV-P, but could not be confirmed by 5′ RACE. However, an RT-PCR approach using tiled primers, similar to that of Xu et al. [36], indicated that an adjacent AluSx SINE was also included in these transcripts (Figure S3). We are unable to conclude whether this SINE is in fact a site of NAIP transcription initiation or an internal exon of an undescribed 5′ UTR.

Here we revised our previous 5′ RACE approach, which only assessed the transcription start sites (TSS) associated with expression of NAIPfull [15], and numerous novel TSS were discovered (Figure 2). Unexpectedly, we observed that two Alu SINEs localized 5′ of exon 10, an AluSg and an AluJb, are sites of NAIP transcriptional initiation, hereafter referred to as NAIPSg and NAIPJb (Figure 2a). These Alus are in the antisense orientation, full-length (~300 bp), and present in NAIP orthologues of New and Old World primates (data not shown). Since sequence identity hinders their unambiguous mapping, NAIPSg and NAIPJb 5′ RACE clones could arise from three of the five copies (NAIPfull, NAIP1, and NAIP2) in the reference human genome (Figure S4). Thus, either NAIP1 and/or NAIP2 are expressed from Alus, or these Alus may serve as promoters within NAIPfull, or both.

A number of NAIPSg clones were obtained that mapped to two distinct TSS localizing in the 3′ terminus of the Alu (Figure S4a). Interestingly, the AluSg A-rich tail is known to be hypermutable [37], [38]; however, the corresponding region of this particular element is identical to its consensus sequence. The upstream ~9 kb (relative to NAIPSg polarity) is a patchwork of LINE fragments and Alus, and likely contributes additional regulatory signals. All NAIPSg clones splice into the adjacent exon 8 (Figure 2a, Figure S4a), utilizing a splice donor site frequently employed by exonized antisense Alus [10], [11]. Several NAIPJb clones were also obtained; these map to two particular regions localized near the AluJb 5′ terminus (Figure S4b). The regulatory signals comprising the NAIPJb core promoter, therefore, are expected to lie within the body of this Alu. The NAIPJb clones, however, do not splice into the downstream exon 10; rather, transcription continues through the intervening ‘intron’. The validity of NAIPJb transcripts is verified by +/− RT controls (Figure S5). Interestingly, the splice donor sequence utilized by NAIPSg has undergone an AG→AT transversion mutation in NAIPJb (Figure S4b); its capacity for splicing has not been studied here. Additional TSS downstream of NAIPJb, in the intervening sequence adjacent to exon 10, are also observed (Figure S4b).

Another site of transcription initiation was identified within the final intron of the GUSBP1 gene (Figure 2a). Although sequence identity hinders unambiguous mapping of this transcript, the novel first exon splices into exon 4 of the adjacent NAIP1 and/or NAIP2. Consequently, expression of at least one other NAIP copy, in addition to NAIPfull, is demonstrated since a TSS within the final intron of the GUSBP1 gene is only adjacent to NAIP1 and NAIP2.

Variable contribution of Alu-associated NAIP transcripts in different tissues

To address the contribution of Alu-derived NAIP transcripts to total NAIP expression, qRT-PCR was performed. Although their transcription is detected in most tissues screened by RT-PCR (Figure S5), this approach indicates that NAIPJb is expressed at levels similar to or higher than NAIPfull in many of the tissues tested, and is therefore likely an important promoter (Figure 3). In contrast, NAIPSg does not contribute significantly to total NAIP expression in any tissue tested (Figure 3). Interestingly, scrutiny of 5′ RACE sequences revealed that NAIPSg undergoes RNA editing in its 5′ UTR (Figure S4a), a common observation among transcribed Alus [40], [41]. Comparison of edited vs. un-edited NAIPSg transcript levels indicated that the former is >10-fold more abundant than the latter (data not shown).

Most NAIP transcription in colon, spleen, lung, and prostate could be accounted for by the combined activity of all queried promoters, but the contribution of individual paralogues could not be assessed due to their high sequence identity. However, in kidney and testis all isoforms are not detected and it is likely that unaccounted 3′ transcription either initiates downstream of AluJb, as indicated above (Figure S4b), or from the NAIPGUSBP1 TSS. Contribution of NAIPGUSBP1-derived transcripts could not be assessed due to the complexity of alternative splicing in this 5′ UTR (Figure S5). As discussed previously, the 5′ levels of NAIP in liver are expressed 4-fold over 3′ levels, suggesting that all transcription in this tissue derives from NAIPfull. Since two independent liver RNA samples were screened, this rules out the possibility of patient-specific CNV, unless both samples derive from the same patient. Perhaps transcription in liver produces isoforms that constitutively omit one or both exons to which our 3′ qRT-PCR primer sets are designed. Alternatively, NAIPfull transcripts in this tissue could be aberrantly poly-adenylated. Regardless, neither NAIPSg nor NAIPJb are highly expressed in liver.

Full-length Alu-derived transcripts are broadly expressed

The fact that the AluJb functions as a pol II promoter is an intriguing finding, with genome-wide ramifications in establishment of transcriptional networks, as previously suggested [13], [30]. We next examined the potential for transcription of a novel NAIP ORF as a result of Alu promoter activity. Indeed, if all downstream exons are included in at least some Alu-derived NAIP transcripts, a 2,643 nucleotide ORF is preserved (Figure S6). Therefore, we sought to determine whether Alu-initiated transcripts continue to the 3′ terminus, by RT-PCR. Southern blotting was required since, by necessity, primers hybridized to Alus – the most plentiful elements in primate genomes [1]. Across all tissues screened, except liver, products corresponding to the expected size (~3 kb) were resolved for NAIPJb (Figure 4). Among various minor forms, one notable variant of ~2 kb is expressed at the same frequency as full-length NAIPJb. This ~2 kb variant, among numerous others including full-length, is also observed for NAIPSg transcripts in several tissues (data not shown). Potentially the smaller isoform could result from alternative splicing common to both NAIPJb and NAIPSg transcripts, between the site of reverse primer binding and probe hybridization. Alternatively, a single NAIP transcript possessing a second exonized Alu downstream of some or all of the probe-binding region could also explain this observation. The prominent ~3 and ~2 kb bands do not result from the simultaneous amplification of NAIPJb and NAIPSg due to primer cross-reactivity, since the respective transcripts and their unique 5′ UTRs are roughly equal in size. Nonetheless, existence of full-length Alu-derived transcripts, a potential 2,643 nucleotide ORF, and numerous in-frame ATGs in accordance with derived consensus sequences [42], [43] (Figure S6) suggest a potential for the synthesis of NAIP protein isoforms.

Novel human NAIP protein isoforms

Using the annotated copies of NAIP in the sequenced human genome as a reference [34], we scanned all possible full-length transcripts that could arise from the novel TSS reported above for ORFs and domain composition. Many potential ORFs were identified for each queried transcript, but only the longest examples were considered. Interestingly, all accepted examples represented N-terminal truncations of NAIPfull, indicating the existence of numerous potentially functional in-frame translation initiation codons (Figure 5a, Figure S6). NAIPfull was previously shown to comprise 1403 amino acids and yield a ~160 kDa protein encoding three N-terminal anti-apoptotic Baculoviral IAP Repeat (BIR) domains, followed by a central nucleotide binding domain (NBD) and C-terminal leucine-rich repeats (LRR) [16]. NAIPSg- and NAIPJb-mediated transcription of NAIP2 is predicted to generate an ORF 881 amino acids long, corresponding to a 110 kDa protein that excludes the BIRs (NAIPAlu). Due to the deletion of exons 12-14 in NAIP1, a C-terminal truncation of the LRRs is also predicted, in addition to a truncation of its N terminus (Figure 1b); this could produce a ~85 kDa NAIP protein isoform, but no such isoform was detected. Finally, transcription from the promoter within the final GUSBP1 intron can drive expression of both NAIP1 and NAIP2, and potentially gives rise to 100 kDa (NAIP1) and 130 kDa (NAIP2) proteins, respectively. Both putative protein isoforms, NAIP1 and NAIP2, possess one N-terminal BIR domain, followed by the central NBD, but only NAIP2 harbours C-terminal LRRs. Indeed, western blots on human PC3, HeLa, and NTera2D1 cell lysates indicate the presence of multiple bands corresponding to the above computer predictions (Figure 5b). To more accurately assess the potential for translation of the Alu-derived NAIP2 ORF, we generated a NAIP:hemagglutinin fusion protein (HANAIPAlu) and over-expressed it in the cell lines indicated above.
The recombinant protein HANAIPAlu is translated and migrates at 110 kDa with the putative endogenous isoform (NAIPAlu) in untransfected PC3 and HeLa cells (Figure 5b). It is clear the NAIP protein isoforms are differentially expressed in the queried cell lines, but all three cell lines endogenously produce the ~160 kDa NAIPfull and ~110 kDa NAIPAlu proteins, albeit to a different degree. In the PC3 and HeLa cell lines, where HANAIPAlu was overexpressed, an increase in band intensity is seen compared to NAIPAlu in untransfected cells. Overall, expression of the putative NAIPAlu protein is low relative to NAIPfull in all cell lines, however, the difference is not as exaggerated in NTera2D1 cells compared to PC3 or HeLa. Lastly, it appears that neither NTera2D1 nor HeLa cells express the putative ~130 kDa NAIP2 protein isoform.
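The ORF scan described above, in which only the longest ORF per candidate transcript is retained and in-frame initiation codons are noted, can be illustrated with a minimal Python sketch. This is not the software used in the study; the function names `longest_orf` and `in_frame_starts` and the toy sequence are hypothetical, and the sketch assumes a forward-strand scan for ATG-to-stop ORFs with the standard stop codons.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def _normalize(seq: str) -> str:
    """Upper-case the sequence and treat RNA (U) as DNA (T)."""
    return seq.upper().replace("U", "T")


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG-to-stop ORF on the forward strand.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    Returns None when no complete ORF is present.
    """
    seq = _normalize(seq)
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None  # position of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


def in_frame_starts(seq: str, orf: Tuple[int, int]) -> List[int]:
    """List positions of all in-frame ATG codons within an ORF (stop excluded)."""
    seq = _normalize(seq)
    start, end = orf
    return [i for i in range(start, end - 3, 3) if seq[i:i + 3] == "ATG"]


if __name__ == "__main__":
    toy = "CCATGAAAATGTTTTAGCC"  # hypothetical sequence, not NAIP
    orf = longest_orf(toy)
    print(orf, in_frame_starts(toy, orf))
```

Applied to a real transcript, the ORF length in nucleotides divided by three gives the codon count; for example, the 2,643 nucleotide ORF reported here corresponds to 881 codons.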

NAIP protein isoforms are broadly expressed in human tissues

The observation that NAIP proteins equivalent in size to all of the computer-predicted isoforms are expressed in the cell lines screened, prompted a similar investigation of primary human tissues (Figure 6). A variety of NAIP proteins were detected in most of the tissues examined, although NAIPfull is not broadly expressed. In fact, NAIPfull was only detected in heart, skeletal muscle, and at very low levels in testis. Similarly, the ~110 kDa protein, which is expected to represent the Alu-derived NAIP ORF, is also only detected in heart and skeletal muscle. Potential NAIP2 proteins at ~130 kDa are observed almost uniformly across the tissues tested, and could correspond to NAIPGUSBP1-initiated transcripts. The subtle variation of the putative NAIP2 proteins, such as in spleen and heart, could result either from alternative start codon selection (Figure S6) or alternative splicing of NAIP2 terminal exons. Importantly, all of the tissues screened here, other than testis, derive from one individual with unknown NAIP copy number and mRNA expression levels. Nonetheless, we demonstrate the expression of various human NAIP protein isoforms that correspond with calculated molecular weights of the ORFs generated by alternative promoter usage.

Discussion

Transposable elements were initially discovered as important factors in the regulation of gene expression in maize, and termed controlling units [44]. This view of TE usefulness was contrasted by the ‘junk DNA’ hypothesis [45]. In recent times their practicality has garnered increased attention, particularly as mobile regulatory modules [5], [9], [13], [30]. Strikingly, TEs are associated with many evolutionarily constrained regions in mammalian genomes [46], and many conserved non-coding elements are reported to function as transcriptional enhancers [47]. In general, it is difficult to ascertain the extent to which TEs donate their embedded regulatory signals to cellular genes, particularly because they can impose their effects over great distances. However, bioinformatics analyses of human and mouse genomes indicate a substantial impact of TEs on cellular gene regulation; as many as 25% of genes possess TEs in their UTRs [8], [48]. Therefore, their influence on increasing the diversity of mammalian transcriptomes is likely underappreciated.

The LTRs and LINEs, due to the natural presence of RNA pol II signals, are likely candidates to fulfill a regulatory role for cellular genes; dozens of known cases confirm their utility as regulatory modules [5], [7], [8]. In contrast, the pol III-dependent SINEs are concentrated in gene dense regions [1], [6], but have largely been neglected as modulators of cellular gene expression. Recent bioinformatics analyses, however, have revealed the presence of numerous RNA pol II transcription factor binding sites and hormone response elements within SINEs [13], [14], substantiating an earlier report [49]. Notably, the primate-specific Alus – divided into the old AluJ, intermediate AluS, and young AluY subfamilies – present consensus transcription factor binding sites distributed in an age-dependent manner [13]. Interestingly, among all gene-associated Alus on chromosomes 21 and 22, older elements tend to harbour estrogen response elements and AP-1 docking sites, while younger and/or polymorphic Alus are enriched for other features, including retinoic acid response elements. In addition, important roles in mRNA poly-adenylation have also been revealed for Alus and other TEs in a variety of organisms [50], [51]. Since Alus number >10^6 copies in the human genome, are enriched in gene-dense regions, and contain potential pol II transcriptional regulatory motifs, they could be considered the most important transcriptional regulators.

For the first time it is shown here that an Alu can function as a direct promoter for a human gene. More commonly, they and other SINEs are incorporated into mRNA UTRs and coding regions as cassette exons [5], [8], [9], [52], facilitated by the presence of numerous splice donor and acceptor sites in the sense and antisense orientations [10]. Examples of SINE exaptation as promoters, however, are limited and represented by a sense B1 [53] and an antisense B2 [54] element in mouse. In human, an isoform of the p75TNFR gene initiates transcription from an antisense MIR SINE, with the adjacent AluJo providing an alternative translation start site [55]. Furthermore, a bioinformatics analysis reports the existence of several unvalidated antisense Alu-associated TSS [8]. Here, broad transcription of NAIP isoforms from exapted antisense AluJb and AluSg elements is demonstrated in a number of tissues, but it is unknown whether these sequences would also be functional in the sense orientation. The Sg and Jb exaptations associated with NAIP transcription belong to older families that exhibit 10% and 15% divergence from their consensus sequences, respectively. Remarkably, NAIPJb-associated transcripts are more highly expressed than full-length isoforms in many tissues, but NAIPSg levels are at the limit of detection. We further demonstrate that the Alu-initiated NAIP transcripts extend to the 3′ terminus, and that the associated ORF, harbouring only NBD and LRRs, is translated in a variety of cell lines and primary human tissues. Our findings also suggest that the other predicted novel NAIP proteins are expressed, in addition to the BIR-less isoform directly assessed here. Notably, with the exception of testis, the tissue blot we screened derives from one adult individual, identified by the manufacturer as an accidental fatality.
An earlier analysis of pooled primary human tissue samples using a different antibody also revealed similar NAIP protein isoforms, which were speculated to arise by alternative splicing [35]. Nonetheless, the data presented here substantiate transcriptome analyses that reveal alternative promoter usage as an important source of alternative mRNAs and proteins [56], [57].

The NAIP gene first rose to prominence when it was cloned as a putative disease allele for the neurodegenerative disorder, Spinal Muscular Atrophy (SMA) [16], but is now understood to influence SMA severity, which is induced by the adjacent SMN gene [58]. Its identification did seed discovery of the Inhibitor of Apoptosis Protein (IAP) family in animals [19]. The IAPs sequester activated caspases, the agents of cell death, via their signature N-terminal BIR domains [20]. Interest in NAIP was renewed through the discovery that polymorphism of the murine Naip5 (Birc1e) copy solely determines permissiveness of Legionella pneumophila replication in host macrophages [25]. Human Legionella infections result in Legionnaires' disease, a severe type of pneumonia [59]. It was recently shown that human NAIP also blocks L. pneumophila replication in cell lines and primary cells, suggesting a common function [60]. NAIP-dependent sensing of cytosolic microbial patterns is LRR-dependent, and is currently known to respond to Legionella and Salmonella typhimurium flagellin [26]. These and other findings point to an important role in the innate immune response, and justify the inclusion of NAIP in the NLR superfamily [29]. Invariably, the NLRs possess a central NBD and C-terminal LRRs; collectively they survey the cytosol for pathogen associated molecular patterns and elicit the appropriate response [61].

While the potential functions of the novel NAIP protein isoforms are unknown, there are several possibilities. Firstly, NAIP proteins are known to homo-oligomerize via their NBD [17]; therefore, expression of BIR-truncated isoforms and their subsequent interaction with NAIPfull could be a mechanism whereby its anti-apoptotic properties are effectively dispersed among a greater number of cytosolic molecules. Alternatively, these could be dominant negatives and serve to regulate the amount of anti-apoptotic NAIP molecules active in a given cell. Finally, expression of NAIP protein isoforms could represent a new example of innovation within the innate immune system, whereby hetero-oligomerization of NLRs creates diversity among these cytosolic sensors, analogous to the Natural Killer inhibitory cell receptor repertoire [62]. Indeed, NBD-mediated heterotypic interactions of some NLRs, including NAIP, have been demonstrated [63]. Moreover, Naip was also shown to co-precipitate with its closest homologue, ICE protease activating factor (Ipaf) [27]. Together these proteins activate Interleukin converting enzyme (ICE or caspase 1), and initiate caspase 1-dependent cell death in response to cytosolic flagellin [26], [27], [28]. Although caspase 1 is required to cleave the inflammatory cytokines proIL-1β and proIL-18 into their active forms, their involvement in this process remains unresolved. Interestingly, and perhaps not coincidentally, the cellular processes affected by IL-1β – proliferation, differentiation, and apoptosis – are the same as those influenced by AP-1 transcriptional regulation [64].

Genes involved in immunity tend to permit regulatory variation [8], as do multicopy genes [52]. While it is known that alternative 5′/3′ ends create genetic variation that leads to proteome evolution [56], [57], [65], the effect of Alu elements is underappreciated. Here we show that transcription from Alus generates a novel NAIP ORF that is subsequently translated, clearly indicating the effect they have not only on gene regulation, and perhaps establishment of transcriptional networks [13], [30], but also on proteome evolution.

Methods

Ethics Statement

The blood sample was obtained with written informed consent according to a protocol approved by the University of British Columbia Research Ethics Board.

RNA and Reverse Transcription

With the exception of blood, all human RNA was purchased from Clontech (Mountain View); each sample consists of pooled material from multiple individuals. Blood was obtained from a healthy human adult with informed consent and the sample subsequently underwent erythrocyte reduction. RNA from remaining peripheral blood leukocytes (PBLs) was isolated using the QIAamp RNA Blood Mini Kit (Qiagen). Where necessary, RNA was isolated from candidate cell lines using TRIzol (Invitrogen) according to the manufacturer's recommendations. Prior to reverse transcription, RNA was quantified using a Qubit fluorometer (Invitrogen). All cDNA was synthesized using random hexamer-primed SuperScript III Reverse Transcriptase (Invitrogen), as directed by the manufacturer.

RT-PCR

All RT-PCR, except as indicated below for amplification of the NAIP ORF and generation of the expression vector, was performed with Platinum Taq DNA Polymerase (Invitrogen); the relevant primers are listed in Table S1 and were all used at 10 µM. Optimal primer annealing temperatures were deduced using the temperature gradient function of an iCycler (Bio-Rad) over 35 cycles. Subsequent experiments were carried out at the optimal Tm for each primer set in a GeneAmp PCR System 9600 (Applied Biosystems). Discrimination of 5′ vs 3′ NAIP transcript levels was carried out at 30 cycles. The full-length NAIP ORF deriving from the Alu SINEs was obtained by amplification with Phusion High Fidelity DNA Polymerase (Finnzymes). As expected, primers within Alu SINEs yielded a multitude of products, which were subsequently resolved by Southern blotting. Probe was generated with 32P-radiolabeled dCTP using the random primer labeling kit (Invitrogen) as directed. Pre-hybridization, hybridization, and washes of Zeta-probe GT membranes (Bio-Rad) were performed using ExpressHyb (Clontech) according to the manufacturer's specifications. Exposure of BioMax Film (Kodak) for one hour or less was sufficient to adequately differentiate true bands from background.

5′ Rapid Amplification of cDNA Ends

Using the FirstChoice RLM-RACE Kit (Ambion), the 5′ termini of human NAIP were deduced as before [15]. We revised our initial approach [15] by designing gene-specific reverse primers to a downstream exon, common to all predicted NAIP copies (primers listed in Table S1); previously, primers could only surmise expression of NAIPfull. Subtle variations in RT-PCR product size were observed across a range of Tms (55–60°C), since the full complement of NAIP start sites was being queried; therefore, all unique bands were purified using the QIAquick Gel Extraction Kit (Qiagen) and cloned into the pGEM-T vector (Promega) prior to sequencing (McGill University and Génome Québec Innovation Centre). Importantly, consistent amplification patterns were observed within a given Tm. We similarly tested mouse kidney RNA; although we identified novel intraexonic start sites for mNaip2, qRT-PCR only showed a slight increase (1.21) of 3′ over 5′ ends (data not shown).

Quantitative RT-PCR

The cDNA used for quantitative RT-PCR with Power SYBR Green PCR Master Mix (Applied Biosystems) in the ABI 7500 Real Time PCR System (Applied Biosystems) was prepared as above. Primers (10 µM; listed in Table S1) were determined by standard curve to amplify equally efficiently across a broad range of template dilutions. The comparative CT method was used to quantify targets; CT values were normalized to β-actin levels in each tissue and expressed relative to the indicated target in the indicated tissues. Experiments were conducted at least four times for each primer set, with cycling parameters as follows: 50°C, 2 min; 95°C, 10 min; [95°C, 15 s; 60°C, 1 min] × 40 cycles. For initial experiments, where primer efficiencies were being determined, dissociation curves and –RT controls were included, indicating the specificity of amplification and lack of DNA contamination in template preparations, respectively (data not shown). Alternative splicing variants posed a problem in primer design for the NAIPERV-P and NAIPSg targets. For NAIPERV-P we quantified only one of the variants and estimated that it accounted for ~40% of total LTR-derived transcripts, as before [15]. For NAIPSg, we designed primers spanning exon junctions of both isoforms and combined their proportions.
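As an illustration of the quantification described above, the following Python sketch shows the arithmetic commonly associated with the comparative CT method and with a standard-curve efficiency check. It is not the authors' analysis code: the function names and the example CT values are hypothetical, and the 2^(−ΔΔCT) form assumes that target and reference primers amplify with roughly equal, near-100% efficiency.

```python
def amplification_efficiency(slope: float) -> float:
    """Estimate PCR efficiency from a standard-curve slope.

    `slope` is the slope of CT plotted against log10(template dilution).
    A value of 1.0 corresponds to perfect doubling per cycle (slope of about -3.32).
    """
    return 10 ** (-1.0 / slope) - 1.0


def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_cal: float, ct_reference_cal: float) -> float:
    """Fold expression of a target relative to a calibrator (2^-ddCT).

    ct_target / ct_reference: CT values of the target and the reference gene
    (beta-actin in the text) in the sample of interest.
    ct_target_cal / ct_reference_cal: the same two values in the calibrator
    (the "indicated target in the indicated tissue" in the text).
    Assumes equal, near-100% amplification efficiencies (an assumption here).
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical CT values, for illustration only.
    print(relative_expression(24.0, 18.0, 26.0, 18.0))
    print(amplification_efficiency(-3.32))
```

With these hypothetical values, a sample ΔCT of 6 against a calibrator ΔCT of 8 gives a ΔΔCT of −2, that is, a 4-fold higher level in the sample.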

Generation of constructs

Placental genomic DNA was obtained from the laboratory of Dr. P. Medstrand (Lund University) and subsequently used to PCR amplify the NAIP promoter regions and open reading frame (ORF).

Promoter constructs

The testis-specific LTR (or NAIPERV-P), the ubiquitous NAIPfull, and the Alu-derived NAIPSg and NAIPJb promoters were amplified by PCR using Phusion High Fidelity DNA Polymerase (Finnzymes) in an iCycler (BioRad) over 35 cycles; the primers used are listed in Table S1. The respective products are approximately 500 bp and centered on the transcription start sites. All primers possessed BglII and HindII recognition sites to facilitate directional cloning into a modified pGL3B vector described elsewhere [15]. Sequencing (McGill University and Génome Québec Innovation Centre) verified the fidelity of the amplified fragments.

Expression vector

The preserved ORF deriving from NAIPSg and NAIPJb transcripts was amplified with Phusion High Fidelity DNA Polymerase (Finnzymes) from human testis cDNA (as described above) over 35 cycles; primer sequences are indicated in Table S1. The desired amplicon was isolated using the PureLink Quick Gel Extraction Kit (Invitrogen) and subsequently dATP-tailed with Taq DNA Polymerase (Invitrogen) to facilitate cloning into the pGEM-T vector (Promega). Sequencing confirmed not only that the ORF was cloned error-free, but also that NAIP2 is expressed in addition to NAIPfull, on account of a single representative nucleotide difference. XhoI and NcoI recognition sites incorporated into the primers were used to subclone the sequenced ORF into the CTV 211 hemagglutinin (HA) epitope-bearing mammalian expression vector, generously provided by Dr. R. Kay (Terry Fox Laboratory). All vectors were amplified in E. coli DH5α, purified using the Nucleobond AX (Clontech) maxi prep kit, and quantified using the Qubit fluorometer (Invitrogen).

Cell culture and transient transfection

HeLa, NTera2D1, LNCaP, and Jeg3 cells were cultured in DMEM (Stem Cell Technologies) and PC3 cells in RPMI 1640 (Stem Cell Technologies), and incubated at 37°C and 5% CO2. All media formulations were supplemented with 10% Fetal Bovine Serum (Invitrogen) and maintained in penicillin/streptomycin, except during transfection experiments. Prior to transfection of promoter constructs, cells were seeded at 10⁵ cells/well, or 2×10⁵ cells/well for NTera2D1, in a 24-well dish overnight. Lipofectamine 2000 (Invitrogen) was used to transfect the indicated cells with the indicated vectors according to the manufacturer's specifications. Approximately 6–8 hours post-transfection, cells were washed with PBS (Stem Cell Technologies) and fresh complete media was added to allow production of the reporter for an additional ~24 hours. The HANAIP expression vector was transiently transfected into HeLa, PC3, and NTera2D1 cells using Metafectene (Biontex) as recommended by the manufacturer.

Reporter gene assays

Prior to lysis, cells were washed with PBS, processed, then analyzed for firefly and Renilla luciferase activity using the Dual Luciferase Reporter Assay System (Promega) as indicated by the manufacturer. All values were standardized to the Renilla luciferase internal control to normalize for transfection efficiency, then expressed relative to the modified promoterless pGL3-Basic vector.
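The two-step normalization described above (firefly signal divided by the Renilla internal control, then expressed relative to the promoterless vector) can be sketched as follows. This is an illustrative outline only: averaging replicate wells with an arithmetic mean is an assumption, and the function names and readings are hypothetical.

```python
from statistics import mean


def normalized_activity(firefly, renilla):
    """Firefly signal corrected for transfection efficiency via Renilla."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def relative_promoter_activity(construct_wells, promoterless_wells):
    """Activity of a promoter construct relative to the promoterless vector.

    Each argument is a list of (firefly, renilla) readings from replicate
    wells. Replicates are averaged after Renilla normalization (an assumption;
    the text does not state how replicates were combined).
    """
    construct = mean(normalized_activity(f, r) for f, r in construct_wells)
    baseline = mean(normalized_activity(f, r) for f, r in promoterless_wells)
    return construct / baseline


if __name__ == "__main__":
    # Hypothetical luminometer readings, for illustration only.
    construct = [(1000.0, 100.0), (1200.0, 100.0)]
    promoterless = [(100.0, 100.0), (120.0, 100.0)]
    print(f"fold over promoterless vector: "
          f"{relative_promoter_activity(construct, promoterless):.1f}")
```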

Computational tools

Dot plots

Analysis of the underlying DNA sequence of 5q13.3 was performed to better understand the exons mapping to particular NAIP copies. DNA sequences were obtained from the UCSC Human Genome Browser March 2006 (hg18) assembly [34]. The genomic sequence of NAIPfull (chr5:70,298,269-70,360,000) was used to assess exon architecture of the remaining copies: NAIP1 (chr5:70,425,120-70,469,539); NAIP2 (chr5:69,424,009-69,495,811); and ψNAIP1 and 2 (chr5:69,780,634-69,828,298; 68,921,612-68,967,595). Indicated sequences were compared using the web-based jdotter (http://athena.bioc.uvic.ca/workebnch.php?tooljdotter&db=).

Sequence analysis

Sequenced clones were uploaded, managed, and analyzed in the SDSC Biology Workbench (http://workbench.sdsc.edu). Precise mapping of the clones to the human genome was completed using the BLAT tool in the UCSC Genome Browser [34].

ORF prediction

Sequences of interest were scanned for open reading frames using NCBI's ORF Finder, and subsequent analysis of encoded domains was completed with BLASTP.
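As a convenience for anyone retrieving the hg18 regions used for the dot plots, the short sketch below parses the browser-style coordinates and reports the span of each NAIP copy. It is not part of the original analysis; it assumes the coordinates are 1-based and inclusive (as displayed in the genome browser) and that the second pseudogene interval is also on chr5, and the helper names are hypothetical.

```python
import re

# Browser-style position, e.g. "chr5:70,298,269-70,360,000".
REGION_RE = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$")

# Regions as listed for the hg18 (March 2006) assembly in the text.
NAIP_REGIONS = {
    "NAIPfull": "chr5:70,298,269-70,360,000",
    "NAIP1": "chr5:70,425,120-70,469,539",
    "NAIP2": "chr5:69,424,009-69,495,811",
    "psiNAIP1": "chr5:69,780,634-69,828,298",
    "psiNAIP2": "chr5:68,921,612-68,967,595",  # chr5 assumed from context
}


def parse_region(text):
    """Return (chromosome, start, end) from a browser-style position string."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {text!r}")
    chrom, start, end = match.groups()
    start_pos = int(start.replace(",", ""))
    end_pos = int(end.replace(",", ""))
    if end_pos < start_pos:
        raise ValueError(f"end precedes start in {text!r}")
    return chrom, start_pos, end_pos


def region_length(text):
    """Span in bp, treating the coordinates as 1-based and inclusive."""
    _, start_pos, end_pos = parse_region(text)
    return end_pos - start_pos + 1


if __name__ == "__main__":
    for name, region in NAIP_REGIONS.items():
        print(f"{name}\t{region}\t{region_length(region):,} bp")
```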

Supporting Information

Figure S1

Homology of human NAIP copies. Dot plots were performed to better understand the exon architecture of each NAIP copy. The NAIPfull copy in the 2006 assembly of the human genome (70,298,269–70,360,000) was compared to the genomic sequence underlying the other NAIP copies (as indicated). The coordinates of tested sequences are shown.

Figure S2

Unequal levels of NAIP 5′ and 3′ transcription. Semi-quantitative RT-PCR was performed at a low cycle number across a panel of human tissues to determine the levels of NAIP 5′ and 3′ transcription. Red arrowheads indicate localization of the primers used in this experiment, and are shown relative to a diagram of NAIPfull, at bottom.

Figure S5

Broad transcription of novel NAIP isoforms. RT-PCR was performed to determine the breadth of expression of NAIP from the Alu and GUSBP1 3′ UTR-contained TSS, represented by bent arrows. Color-coded arrows indicate the primers used: expression from NAIPSg is indicated by blue arrows and box; expression from NAIPGUSBP1 is indicated by purple arrows and box; and expression from NAIPJb is indicated by orange arrows and box. No splicing is observed between the AluJb transcription start site and the adjacent downstream exon; +/− RT controls indicate low, or no, contamination of genomic DNA. Diagrams are not drawn to scale.

Table S1

Primers used in this report. A list of all primers used throughout this investigation is sectioned according to the general application for which they were designed. Associated with each primer is the sequence, the Tm at which it was utilized, as well as a note specifying its particular application.

Acknowledgments

We thank Drs. C. Eaves and P. Medstrand for human blood and placenta samples; Drs. R. Kay and C. Cohen for comments on the manuscript; and L. Gagnier and J. Ruschmann for technical assistance.

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: This work was supported by grant #10825 to DM and grant #86730 to YW from the Canadian Institutes of Health Research (http://www.cihr.ca), with core support provided by the BC Cancer Agency. MR is supported by a studentship from the Michael Smith Foundation for Health Research (http://www.msfhr.org). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.