Summary

The cellulose synthase (CesA) gene family encodes the catalytic subunits of a large protein complex responsible for the deposition of cellulose into plant cell walls. Early in vascular plant evolution, the gene family diverged into distinct members with conserved structures and functions (e.g. primary or secondary cell wall biosynthesis). Although the functions and expression domains of CesA genes have been extensively studied in plants, little is known about transcriptional regulation and promoter evolution in this gene family.

Here, comparative sequence analysis of orthologous CesA promoters from three angiosperm genera, Arabidopsis, Populus and Eucalyptus, was performed to identify putative cis-regulatory sequences. The promoter sequences of groups of Arabidopsis genes that are co-expressed with the primary or secondary cell wall-related CesA genes were also analyzed.

Introduction

The importance of carbon sequestration into cellulosic biomass as a means to mitigate carbon emissions has recently been the subject of much debate (Righelato & Spracklen, 2007). This debate is fuelled by the proposed use of cellulosic biofuels as a sustainable alternative to fossil fuels (Gray et al., 2006). Future demand for cellulosic biomass may be met in part by large-scale plantation of fast-growing forest tree species. However, the success of such an endeavor will depend on the accelerated domestication of forest trees aided by molecular breeding and genetic engineering (Halpin & Boerjan, 2003; Boerjan, 2005). Commercial genetic modification of trees is in its infancy compared with crop plants. Despite the sequencing of the first tree genome (Populus trichocarpa, Tuskan et al., 2006), few functional data are available for regulatory sequences in tree genomes. This information is crucial for the construction of predictably expressed transgene constructs, an objective that will be particularly important for large, long-lived perennials. Only a few examples have been published of well characterized forest tree promoter sequences (Vandermijnsbrugge et al., 1996; Chen et al., 2000; Lacombe et al., 2000; Wu et al., 2000; Lauvergeat et al., 2002; Baghdady et al., 2006). There is also an incomplete understanding of the transcriptional networks that underlie the woody transcriptome. To begin uncovering these networks, it is necessary to characterize the promoter sequences of many more wood formation genes and to associate cis-regulatory elements in these promoters with transcription factors. A thorough understanding of these associations may ultimately make it possible to construct synthetic promoters for modular and precise transgene expression in trees (Venter, 2007).

RNA polymerase II promoters generally consist of proximal (core) and distal promoter regions. Core promoter elements are conserved sequences to which the transcriptional machinery binds in order to initiate transcription (Smale & Kadonaga, 2003). Comparison of plant and animal promoters (Yamamoto et al., 2007) reveals that plant core promoters also contain key elements such as the TATA-box, Inr (Initiator element, Lo & Smale, 1996) and DPE (Downstream Promoter Element, Kutach & Kadonaga, 2000). Molina & Grotewold (2005) found that only 29% of Arabidopsis promoters contain a TATA-box and it is therefore likely that the majority of plant promoters rely on Inr, DPE or other elements for the initiation of transcription.

The distal promoter region contains cis-regulatory elements such as enhancers and repressors, which are short DNA sequences (c. 6–12 bp) to which transcription factors bind in a sequence-specific manner (Smale, 2001) to modulate gene expression. In eukaryotes, cis-regulatory elements are scattered throughout the distal promoter regions, which can span several thousand base-pairs upstream of the transcriptional start site. However, cis-elements can also occur in the proximal promoter region, untranslated regions (UTRs) and introns (Loke et al., 2005; Hughes, 2006). Overall, nucleotide diversity in promoter regions is higher than that seen in coding DNA (Simko et al., 2006), but dispersed areas of sequence conservation do occur in promoter regions (Lockton & Gaut, 2005) and these may represent clusters of regulatory elements.

Orthologous promoters share cis-regulatory elements even across considerable evolutionary distance due to the maintenance of regulatory networks (Lockton & Gaut, 2005). This is despite the fact that, overall, orthologous promoters from different genera share very little sequence similarity (Blanchette & Tompa, 2002). Putative cis-regulatory elements can therefore be identified by comparative analysis of orthologous promoters from different species or genera in an approach known as phylogenetic footprinting (Gumucio et al., 1992; McCue et al., 2002; Zhang & Gerstein, 2003). Software programs such as Footprinter (Blanchette & Tompa, 2003) and rVista (Loots et al., 2002) have enhanced the throughput and resolution of such analyses. In a recent study, Michael et al. (2008) demonstrated that plant circadian regulatory networks are functionally conserved in poplar, rice and Arabidopsis.

Comparative analysis of the promoters of co-expressed genes presents another alternative to identify functional cis-regulatory elements. Co-expressed genes form part of overlapping regulatory networks, and their promoters therefore contain conserved transcription factor binding sites (Harmer et al., 2000). Tompa et al. (2005) compared 13 algorithms designed for cis-element prediction in the promoters of co-expressed genes and found that a combination of complementary algorithms provided improved fidelity and accuracy of cis-element prediction. Meta-analysis of microarray data (Brown et al., 2005; Persson et al., 2005) can be used to identify clusters of co-expressed genes, the promoters of which can be used to identify shared cis-regulatory elements (Haberer et al., 2006). By combining phylogenetic footprinting with promoter analysis of co-expressed genes in Arabidopsis and Brassica, Haberer et al. (2006) recently identified a number of well-known (e.g. MYB and WRKY transcription factor binding sites) and novel plant cis-regulatory elements.

The presence of conserved cis-elements in promoter sequences from evolutionarily distant species may provide evidence for ancient regulatory networks. Arabidopsis, Populus and Eucalyptus represent three angiosperm lineages that have been separated for more than 50 Myr (since the late Cretaceous; Magallon & Sanderson, 2006). Despite their obvious morphological differences, woody and herbaceous plants share many structural genes, such as those encoding cell wall-synthesizing enzymes (Tuskan et al., 2006). For example, cellulose biosynthesis in plants is driven by a highly conserved, rosette-shaped, multi-subunit enzyme complex composed of cellulose synthase (CESA) proteins encoded by the CesA gene family (Delmer, 1999). Arabidopsis thaliana requires at least three CESA proteins for the biosynthesis of cellulose in primary cell walls (Desprez et al., 2007) and three different CESA proteins for secondary cell wall biosynthesis (Taylor et al., 2003). Gene expression and phylogenetic analyses of CesA genes from forest tree genera such as Populus (Suzuki et al., 2006) and Eucalyptus (Ranik & Myburg, 2006) suggest that this specialization of primary and secondary cell wall-associated CesA genes is shared between herbaceous and woody plant species. Moreover, it appears that the specialization of the plant CesA gene family into orthologous clades with conserved functions occurred before gymnosperm–angiosperm divergence (Roberts & Roberts, 2007).

Orthologous CESA proteins possess similar functions and exhibit high sequence similarity (Richmond & Somerville, 2000). Additionally, orthologous CesA genes from herbaceous and woody plants exhibit similar spatiotemporal expression patterns (Ranik & Myburg, 2006). Although these findings suggest that the transcriptional regulation of CesA orthologs is conserved in flowering plants, the molecular components of the underlying regulatory network remain to be characterized. Here, we report the isolation of upstream promoter regions of six Eucalyptus CesA genes and the results of a comparative analysis of these promoters with orthologous promoters from Arabidopsis and Populus. Promoter analysis based on the phylogenetic relationships and the two main gene expression profiles of the CesA genes allowed us to identify putative cis-elements related to cellulose synthesis in primary and secondary cell walls of Arabidopsis, Populus and Eucalyptus. We propose that these sequences may be part of a conserved regulatory network controlling cellulose biosynthesis in plants.

Materials and Methods

Plant material and nucleic acid isolation

Leaf samples used for DNA isolation were obtained from a Eucalyptus grandis W. Hill ex Maiden clone (TAG14) provided by Mondi Business Paper South Africa. The leaf samples were submerged in liquid N2 in the field, transported on dry ice and stored at –80°C until DNA isolation. Genomic DNA was isolated using a CTAB method (Doyle & Doyle, 1987). Total RNA was isolated from xylem and young leaves according to Chang et al. (1993).

Promoter isolation

Genome walking was performed as described by Siebert et al. (1995) using the Universal Genome Walker kit (Clontech, Palo Alto, CA, USA). The initial genome walking primers were designed to anneal within the first exons of the EgCesA genes described by Ranik & Myburg (2006). Genome walking PCRs were performed according to the manufacturer's instructions and the resulting products were cloned (InsT/A clone kit, MBI Fermentas, Hanover, MD, USA) and sequenced (Applied Biosystems, Foster City, CA, USA). In order to obtain at least 1.5 kb of promoter sequence for each gene, additional walks were performed when needed using primers designed on the newly sequenced 5′ regions. Genome walking product sequences were assembled using ContigExpress (Vector NTI, Invitrogen, Carlsbad, CA, USA). Finally, the full-length promoters were amplified end to end (primer sequences in Supplementary material, Table S1), cloned and sequenced to verify that the contigs spanned a continuous region of DNA. At least three independently generated clones were sequenced per promoter and used to compile a consensus sequence, which was employed in further analyses.

Orthologous Arabidopsis and Populus CesA promoters

Arabidopsis and Populus CesA genes orthologous to the EgCesA genes were previously identified by Ranik & Myburg (2006). The promoters of the Arabidopsis thaliana orthologs (AT5G44030.1, AT4G18780.1, AT5G17420.1, AT5G05170.1, AT4G32410.1, AT5G64740.1) were obtained from The Arabidopsis Information Resource (TAIR; http://www.arabidopsis.com). The Populus trichocarpa CesA orthologs were obtained from the JGI Populus Genome Browser (http://genome.jgi-psf.org/Poptr1_1/Poptr1_1.home.html). The Populus sequences were acquired by first downloading the poplar CesA cDNA sequences from NCBI (accession numbers AF072131.1, AY095297.1, AF527387.1, AY162181.1, AY055724.2 and AY196961.1), aligning them to the poplar genome and retrieving a 2 kb sequence upstream of the start codon for each gene. The general nomenclature of plant CesA genes is based on order of discovery rather than orthology. We have summarized the six EgCesA genes along with their corresponding Arabidopsis and Populus orthologs for ease of reference (Table 1).

In silico identification of transcriptional start sites

The transcriptional start site (TSS) prediction program, TSSP (Shahmuradov et al., 2005, available at http://www.softberry.com), is trained on plant promoters located in plantpromDB (Shahmuradov et al., 2003). TSSP was used with default settings to predict the TSSs of the Eucalyptus, Arabidopsis and Populus promoters. In cases where a TSS could not be predicted by TSSP, a second program, NNPP (Neural Network Promoter Prediction, Reese, 2001), was used to identify the TSS. NNPP searches for a broader description of the TSS using neural networks and is thus less restrictive in its prediction parameters.

Promoter datasets

Two CesA promoter datasets were compiled from the CesA promoters of Eucalyptus, Arabidopsis and Populus based on their expression patterns. The first group contained the orthologous promoter regions of the CesA genes predicted to be associated with primary cell wall formation and is referred to as the ‘primary CesA set’. The second group of CesA promoters was obtained from the CesA genes associated with secondary cell wall formation and is referred to as the ‘secondary CesA set’. Two additional datasets were constructed from promoters of Arabidopsis genes identified by Persson et al. (2005). These two sets each contained the promoters of 17 Arabidopsis genes, which were found to be co-expressed with the AtCesA genes associated with primary (‘primary CesA-related set’) or secondary (‘secondary CesA-related set’) cell wall formation (Table S3). The promoters of the respective AtCesA genes were included in these datasets. The promoters of all four datasets were truncated to include 1 kb of DNA sequence upstream of the predicted TSS.
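The truncation step can be illustrated with a minimal sketch. It is not the procedure used in this study: the function name `upstream_of_tss` and the 0-based coordinate convention are assumptions made for illustration, and the sequence is synthetic.

```python
def upstream_of_tss(sequence: str, tss_index: int, length: int = 1000) -> str:
    """Return up to `length` bases immediately upstream of the TSS.

    `tss_index` is the 0-based position of the +1 base in `sequence`
    (an assumed convention; prediction programs report their own coordinates).
    """
    if not 0 <= tss_index <= len(sequence):
        raise ValueError("TSS index lies outside the sequence")
    start = max(0, tss_index - length)  # shorter regions are returned whole
    return sequence[start:tss_index]


# Synthetic example: 1500 bases of upstream sequence followed by transcribed bases.
promoter = "A" * 1500 + "GATTACA"
region = upstream_of_tss(promoter, tss_index=1500)
print(len(region))
```

Promoters shorter than 1 kb (such as the 787 bp EgCesA7 fragment) would simply be returned at their full available length under this convention.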

Motif prediction

Three algorithms were used to predict motifs that are over-represented in the four datasets and shared by the CesA promoters of Eucalyptus, Arabidopsis and Populus. In addition, the analysis was structured to find motifs that are over-represented in primary or secondary cell wall-associated CesA promoters. MotifSampler (http://homes.esat.kuleuven.be/~thijs/Work/MotifSampler.html) is based on Gibbs sampling and allows for the identification of statistically over-represented motifs in a set of unaligned sequences (Thijs et al., 2001, 2002). We used default settings for the purposes of this study with the exception that two or three motifs were retrieved in each of ten iterative searches for two different lengths of motif (6 and 8 nt). The MotifSampler analysis was performed using the Arabidopsis background model provided by the software package. Weeder Web (Pavesi et al., 2004) is the web interface (http://159.149.109.16:8080/weederWeb/) for Weeder (Pavesi et al., 2001), an algorithm that discovers conserved motifs in a set of related regulatory DNA sequences. Default settings were used and both strands were searched using the ‘thorough scan’ option. All of the promoter sets were analyzed and the ten highest scoring motifs per set were recorded. These motifs and their reverse complements were compared with the motifs identified by MotifSampler in order to identify motifs predicted by both algorithms. POCO (http://ekhidna.biocenter.helsinki.fi/poxo/poco/, Kankainen & Holm, 2005) identifies motifs that are over-represented in one dataset compared with a background set, but under-represented in another dataset compared with the same background set (all Arabidopsis promoter sequences in this case). Default POCO settings were used to analyze the primary and secondary CesA sets as oppositely expressed clusters. The search was performed three times, with motifs not identified in all three searches being discarded. The resulting motifs were compared with those identified by MotifSampler and Weeder. Motifs that were not identified by at least two of the three programs, and motifs with a P-value (calculated by POCO) > 0.05, were discarded.
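The filtering rule described above (support from at least two programs, comparison on both strands, and a POCO P-value threshold) can be sketched as follows. This is a simplified illustration, not the authors' workflow: it assumes exact string matches between motifs, whereas similar but non-identical motifs were also grouped in this study, and the function names and example predictions are hypothetical.

```python
def reverse_complement(motif: str) -> str:
    """Reverse complement of a DNA motif written with A, C, G, T and N."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(motif.upper()))


def canonical(motif: str) -> str:
    """Strand-independent key: the smaller of a motif and its reverse complement."""
    motif = motif.upper()
    return min(motif, reverse_complement(motif))


def supported_motifs(predictions, poco_p_values, min_programs=2, alpha=0.05):
    """Keep motifs reported by at least `min_programs` programs.

    predictions: dict mapping program name -> iterable of motif strings.
    poco_p_values: dict mapping motif string -> P-value for POCO-scored motifs.
    Motifs whose POCO P-value exceeds `alpha` are discarded.
    """
    support = {}
    for program, motifs in predictions.items():
        for motif in motifs:
            support.setdefault(canonical(motif), set()).add(program)
    p_by_key = {canonical(m): p for m, p in poco_p_values.items()}
    kept = []
    for key, programs in sorted(support.items()):
        if len(programs) < min_programs:
            continue
        if p_by_key.get(key, 0.0) > alpha:  # motifs without a POCO score pass
            continue
        kept.append(key)
    return kept


# Hypothetical predictions, for illustration only.
predictions = {
    "MotifSampler": ["TGTCGG", "TTTTTT"],
    "Weeder": ["CCGACA", "GCATGC"],  # CCGACA is the reverse complement of TGTCGG
    "POCO": ["GCATGC", "TTTTTT"],
}
poco_p_values = {"GCATGC": 0.01, "TTTTTT": 0.20}
print(supported_motifs(predictions, poco_p_values))
```

In this toy input, TGTCGG and CCGACA count as one motif found by two programs, while TTTTTT is dropped because its hypothetical POCO P-value exceeds 0.05.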

Database-assisted motif annotation

A number of the motif sequences identified in the study had the same or very similar sequences and likely represented the same putative cis-elements. Taking into account degeneracy, overlapping motif sequences that differed in only one or two positions were grouped together. The motifs of each group were manually aligned and a consensus sequence was generated for each group. The consensus was constructed by evaluating each position in the motif alignment and assigning the most frequently occurring base to that position in the consensus. In cases where there were only two sequences and they did not agree, both bases were included in the consensus. The consensus motifs identified in one or more datasets were named CesA-related promoter elements (CRPE) and numbered sequentially. The consensus motifs were used in homology searches of the PLACE database (http://www.dna.affrc.go.jp/htdocs/PLACE/signalup.html; Higo et al., 1999) for similarity to previously described plant cis-regulatory elements.
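The consensus rule can be expressed as a short sketch. It assumes the motifs in a group have already been aligned to equal length without gaps, and it extends the two-sequence rule to any tie between bases, which is an assumption not stated in the text. The function name and example motifs are hypothetical.

```python
from collections import Counter


def consensus(aligned_motifs):
    """Build a consensus from equal-length, pre-aligned motifs.

    Each position takes the most frequent base. If several bases tie for
    most frequent (as when two sequences disagree), all tied bases are
    written in the (A/C) style used in Tables 2 and 3.
    """
    lengths = {len(m) for m in aligned_motifs}
    if len(lengths) != 1:
        raise ValueError("motifs must be aligned to the same length")
    out = []
    for column in zip(*(m.upper() for m in aligned_motifs)):
        counts = Counter(column)
        top = max(counts.values())
        best = sorted(base for base, n in counts.items() if n == top)
        out.append(best[0] if len(best) == 1 else "(" + "/".join(best) + ")")
    return "".join(out)


# Hypothetical aligned motifs, for illustration only.
print(consensus(["ATGTCGG", "CTGTCGG"]))          # two sequences disagree at position 1
print(consensus(["GTCGGT", "GTCTGT", "GTCGGT"]))  # majority base wins at position 4
```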

In vitro characterization of transcription start sites

The predicted TSSs were confirmed by cloning of the 5′ termini of the EgCesA transcripts using 5′ rapid amplification of cDNA ends (5′ RACE, RLM-RACE kit, Ambion, Austin, TX, USA). Additionally, primer extension was performed to confirm the TSSs of EgCesA1, 2 and 3. Ten micrograms of total RNA extracted from xylem was annealed to 20 ng of Cy5.5-labelled primer (Inqaba Biotec, Pretoria, South Africa), specific for the 5′ UTR region of each gene upstream of the translational start site. The RNA-primer complex was incubated for 1 h at 42°C in the presence of 1 µl ImpromII reverse transcriptase (Promega, Madison, WI, USA) according to the manufacturer's instructions. The single-stranded reverse transcription products were resolved on an 8% denaturing polyacrylamide gel using a Li-Cor DNA Analyzer (Model 4200S, Li-Cor Biosciences, Lincoln, NE, USA).

Construction of promoter::reporter vectors

Full-length EgCesA promoters were cloned into the pCR8/GW/TOPO entry vector (Invitrogen) and were subsequently transferred upstream of β-glucuronidase (GUS) (Jefferson et al., 1987) in the binary vector pMDC162 (Curtis & Grossniklaus, 2003) using LR Clonase (Invitrogen) according to the manufacturer's instructions. Sense and anti-sense orientations of promoters were cloned ahead of the reporter gene. The entire expression cassettes consisting of the promoter and reporter genes were sequenced from plasmid DNA.

Transformation of Arabidopsis thaliana

Binary vectors were introduced into Agrobacterium tumefaciens strain LBA4404. Arabidopsis thaliana ecotype Col-0 was transformed by the floral dip method (Clough & Bent, 1998). Selection of transgenic seedlings was performed using hygromycin B (20 mg l−1) according to Nakazawa & Matsui (2003). To confirm the integration of the T-DNA into the Arabidopsis genome, PCR was performed on genomic DNA using GUS-specific primers. T1 plants confirmed to carry the transgenic constructs were advanced to the T2 and T3 generations with hygromycin selection.

Histochemical localization of GUS in Arabidopsis tissues

Fresh stem segments from confirmed T2 and T3 transgenic plants, as well as wild-type Col-0 Arabidopsis, were harvested and vacuum-infiltrated for 10 min in staining buffer followed by overnight incubation at 37°C (Jefferson et al., 1987). Chlorophyll was removed from tissue sections using 100% ethanol before free-hand sectioning. Sections were mounted in 50% glycerol and examined by light microscopy.

Quantitative GUS assays

β-Glucuronidase assays were performed on tissue extracts as previously described by Jefferson et al. (1987). GUS activity was calculated as pmol of 4-MU min−1 mg−1 of tissue for extracts from stem (secondary cell wall-enriched) and young rosette leaves (enriched in primary cell walls).
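The unit conversion behind the reported activity values amounts to a single division, sketched here with hypothetical numbers. The function name and the example reading are illustrative assumptions and do not come from the assays in this study.

```python
def gus_activity(pmol_4mu: float, minutes: float, mg_tissue: float) -> float:
    """Specific GUS activity in pmol 4-MU per minute per mg of tissue."""
    if minutes <= 0 or mg_tissue <= 0:
        raise ValueError("time and tissue mass must be positive")
    return pmol_4mu / (minutes * mg_tissue)


# Hypothetical reading: 1200 pmol 4-MU released in 60 min from 10 mg of tissue.
print(gus_activity(1200.0, 60.0, 10.0))
```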

Results

Isolation of Eucalyptus grandis CesA promoter regions

We isolated the upstream regions of five of the six E. grandis CesA (EgCesA) genes that were described previously (Ranik & Myburg, 2006), as well as the promoter region of a seventh CesA gene, EgCesA7, which was recently isolated (Bradfield et al., unpublished). End-to-end amplification of the promoter regions produced fragments ranging from 787 bp for EgCesA7 to 2 kb for the EgCesA1 promoter (Fig. 1). Two fragments, EgCesA5A (1569 bp) and EgCesA5B (1373 bp), were observed when the EgCesA5 promoter was amplified from genomic DNA. The EgCesA5A and B fragments were highly similar (95%) except for a 196 bp indel (Fig. 1b).

In silico and in vitro analysis of the core CesA promoter elements

The TSSP and NNPP programs were able to predict the core promoter elements of the Eucalyptus, Arabidopsis and Populus CesA genes (Fig. 1). Only two of the 19 CesA promoters (EgCesA1 and EgCesA3, Fig. 1d,f) did not have predicted TATA boxes. Instead they both possessed an initiator element. The TSSP software was not able to predict a transcriptional start site (TSS) for three of the six poplar promoters (PtrCesA2, 4 and 5). For these promoters, the NNPP software package was able to predict TSSs. The overall organization of the core promoter regions was well conserved among some of the orthologous genes (Fig. 1a,f). EgCesA3 and its putative orthologs (Fig. 1f) all had very short predicted 5′ UTRs, while EgCesA4, PtrCesA5 and AtCesA3 (Fig. 1a) all contained an intron in their predicted 5′ UTRs. By contrast, the predicted 5′ UTR of PtrCesA3 was much longer than those of its putative orthologs, EgCesA2 and AtCesA4 (Fig. 1e).

To verify the in silico TSS predictions, primer extension analysis was conducted for the EgCesA genes. Primer extension products for EgCesA1 and 3 suggested 5′ UTRs of 230 and 115 bp, respectively (Fig. 2), which were similar to the in silico predicted 5′ UTRs (Fig. 1). However, the 620 bp EgCesA2 primer extension product was not consistent with the in silico predicted size of 154 bp, but rather with the much longer 5′ UTR predicted for its ortholog, PtrCesA3. This discrepancy may suggest that the EgCesA2 promoter possesses an unusual core promoter sequence, or perhaps contains multiple transcription start sites. We did not obtain definite primer extension products for the primary cell wall-associated CesA genes, possibly due to their comparatively low expression levels (Ranik & Myburg, 2006).

Figure 2. Determination of the transcriptional start sites of EgCesA1, 2 and 3 by primer extension. Sections of Li-Cor PAGE (7%) gel images showing the major primer extension products derived from the EgCesA1, 2 and 3 transcripts (arrows). Fragment sizes of the IRD700-labeled size standard are indicated on the left of each gel section. M, molecular weight standard; P, primer extension reaction lanes.

In silico identification of putative cis-regulatory motif sequences

Based on the outputs of MotifSampler, Weeder and POCO, a total of 380 motif sequences were found to be over-represented in the two CesA promoter datasets. These included many redundant or overlapping sequences, which we systematically removed or grouped to obtain a nonredundant set of 80 putative motifs (Tables S4–S7). Furthermore, we only considered motifs that were detected by at least two of the three programs. This resulted in a final set of 32 motifs, of which 20 were over-represented in the primary CesA set, eight in the secondary CesA set and four in both sets (Table 2). We used the same approach to detect over-represented motifs in the primary and secondary CesA-related sets (promoters of genes co-expressed with the CesA genes in Arabidopsis). A total of 21 motifs were over-represented in the primary CesA-related set, 22 in the secondary CesA-related set and one in both sets (Table 3).

Table 2. Over-represented motifs identified by POCO, MotifSampler and Weeder in the promoters of the primary and secondary cell wall-associated CesA genes of Arabidopsis, Populus and Eucalyptus

| Motif (a) | Consensus sequence (b) | Most similar PLACE cis-element (c) | Putative function and PLACE identity (d) |
| --- | --- | --- | --- |
| Primary CesA set | | | |
| CRPE1 | G(A/T)CGGTG(A/G)AGCTGTTG(G/T) | NO HIT | – |
| CRPE2 | GA(C/G)GGCAGG | NO HIT | – |
| CRPE3 | NTGTCGGTG | NO HIT | – |
| CRPE4 | (A/C)TGTCGG | CCTACGTGGCGG | Abscisic acid responsive element (ABRE2HVA1) |
| CRPE5 | GNCA(C/G)TGA | CGTCAATGAA | Jasmonic acid responsive element (JASE1ATOPR1) |
| CRPE6 | N(A/C)TTCTGTC | TTGAACGGCAAGTTTCACGCTGTCACT | Iron deficiency responsive element (IDE2HVIDS2) |
| CRPE7 | GGGGC(A/G)NGNN | GGATTCAAGGGGCATGTATCTTGAATCC | Ethylene response element (EIN3ATERF1) |
| CRPE8 | GGNGGTGG | AGTTGAATGGGGGTGCA | Anthocyanin regulatory element (ARELIKEGHPGD) |
| CRPE9 | CNCNNCNC | NO HIT | – |
| CRPE10 | CCNC(A/C)CCC | CCACCAACCCCC | Vascular-specific expression (ACIIPVPAL2) |
| CRPE11 | GACNGT(C/G)NGTGGGGC | ATAATGGGCCACACTGTGGGGCAT | Stem enhancer element 1 (SE1PVGRP18) |
| CRPE12 | ATN(A/T)ATTA | CATTAATTAG | Phosphate response domain (GMHDLGMVSPB) |
| CRPE13 | GC(A/T)NGC | TCCATGCATGCAC | Abscisic acid responsive element (RYREPEAT4) |
| CRPE14 | NG(A/G)CNGTG | NO HIT | – |
| CRPE15 | G(C/T)GCTC | CCAGTGTGCCCCTGG | Phloem-specific gene expression (RNFG2OS) |
| CRPE16 | GAGCG(A/C) | CATGGGCGCGG | Light repression element (RE1ASPHYA3) |
| CRPE17 | GTC(G/T)GT | NO HIT | – |
| CRPE18 | ANNGA(C/T)AG | NO HIT | – |
| CRPE19 | ACAGNCNG | NO HIT | – |
| CRPE20 | TTTTTT | AATATTTTTATT | AT-rich element (AT1BOX) |
| CRPE22 | C(C/T)C(C/G)NCCC | NO HIT | – |
| CRPE23 | TNNCN(G/T)NC | NO HIT | – |
| CRPE24 | GG(C/G)(A/T)(C/G)G(A/G)(G/C) | NO HIT | – |
| Secondary CesA set | | | |
| CRPE1 | G(A/T)CGGTG(A/G)AGCTGTTG(G/T) | NO HIT | – |
| CRPE2 | GA(C/G)GGCAGG | NO HIT | – |
| CRPE6 | N(A/C)TTCTGTC | TTGAACGGCAAGTTTCACGCTGTCACT | Iron deficiency responsive element (IDE2HVIDS2) |
| CRPE7 | GGGGC(A/G)NGNN | GGATTCAAGGGGCATGTATCTTGAATCC | Ethylene response element (EIN3ATERF1) |
| CRPE25 | (A/G)C(C/T)(C/G)TGCCC | CCAGTGTGCCCCTGG | Phloem-specific expression (RNFG2OS) |
| CRPE26 | TCCTGC(C/T)G | NO HIT | – |
| CRPE27 | (C/G)CTGAAGG | CTGAAGAAGAA | Systemic resistance regulation (TL1ATSAR) |
| CRPE28 | NNGCATGC | ATCAAGCATGCTTCTTGC | Iron deficiency responsive element (IDE1HVIDS2) |
| CRPE30 | NNNT(C/G)AAG | NO HIT | – |
| CRPE31 | GNGNAGNG | NO HIT | – |
| CRPE32 | (A/G)N(C/G)(C/T)T(A/G)(C/G)C | NO HIT | – |

(a) The number following CRPE (CesA-related promoter element) represents the order of motif annotation.
(b) Consensus of the motifs detected by the three software programs (listed in Tables S4–S7).
(c) Cis-element in the PLACE database which most closely resembles the motif identified in this study.
(d) PLACE cis-regulatory element putative function and the identity of the element as represented on the PLACE database.

Table 3. Over-represented motifs identified by POCO, MotifSampler and Weeder in the promoters of genes that are co-expressed with the primary and secondary cell wall-associated CesA genes in Arabidopsis

| Motif (a) | Consensus sequence (b) | Most similar PLACE cis-element (c) | Putative function and PLACE identity (d) |
| --- | --- | --- | --- |
| Primary CesA-related set | | | |
| CRPE1 | G(A/T)CGGTG(A/G)AGCTGTTG(G/T) | NO HIT | – |
| CRPE2 | GA(C/G)GGCAGG | NO HIT | – |
| CRPE3 | NTGTCGGTG | NO HIT | – |
| CRPE4 | (A/C)TGTCGG | CCTACGTGGCGG | Abscisic acid responsive element (ABRE2HVA1) |
| CRPE5 | GNCA(C/G)TGA | CGTCAATGAA | Jasmonic acid responsive element (JASE1ATOPR1) |
| CRPE33 | GTCG(G/T)T | TCCACGTCGA | Seed-specific expression (O2F1BE2S1) |
| CRPE34 | (A/C)(A/G)G(C/T)GG | NO HIT | – |
| CRPE35 | (G/T)TGTCG | CGTGTCGTCCATGCAT | Abscisic acid responsive element (CGTGTSPHZMC1) |
| CRPE36 | (G/T)TGTCG(G/T)(C/T) | NO HIT | – |
| CRPE37 | (A/C)TCAAATC | TTGGTTTTGATCAAAACCAA | Light-activated specific expression (PIIATGAPB) |
| CRPE38 | CG(C/G)GT(C/T) | NO HIT | – |
| CRPE39 | TAATTA | CATTAATTAG | Phosphate response domain (GMHDLGMVSPB) |
| CRPE40 | G(A/T)(C/G)AGTGA | TGGTAGGTGAGAT | Stress responsive element (SRENTTTO1) |
| CRPE41 | AAG(A/T)A(A/G)AC | TGAGGAGACTTGTGAGGT | Auxin-responsive regions (ALF2NTPARB) |
| CRPE42 | GATT(C/G)C | NO HIT | – |
| CRPE43 | TNTATTNA | TTTATTTACCAAACGGTAACATC | Upstream activating sequence (23BPUASNSCYCB1) |
| CRPE44 | C(A/G)GTG(A/G) | CTGTGATTAAATAT | Related GT element (BOX2PVCHS15) |
| CRPE45 | GGTT(A/T)A | GTGTGGTTAATATG | Related GT element (RBCSBOX2PS) |
| CRPE46 | TA(A/C)TTA | AGTTAGTTAAAAGA | Related GT element (SITE3SORPS1) |
| CRPE47 | (C/T)CAC(C/T)GNC | CCACTGACGTAAGGGATGACGCACAATCC | Activation sequence 1 (AS1CAMV) |
| CRPE48 | TG(C/G)(C/T)GG(C/T)G | AGTTGAATGGGGGTGCA | Anthocyanin regulatory element (ARELIKEGHPGDFR) |
| CRPE49 | AGT(C/G)AC | ACAGTTACTA | Putative auxin responsive element (D1GMAUX28) |
| Secondary CesA-related set | | | |
| CRPE1 | G(A/T)CGGTG(A/G)AGCTGTTG(G/T) | NO HIT | – |
| CRPE50 | GG(T/A)A(A/C)NGC | NO HIT | – |
| CRPE51 | (C/T)CCA(A/T)A(A/T)C | TCCATAGCCATGCAWRCTGMAGAATGTC | Legumin tissue-specific element (LEGUMINBOXLEGA) |
| CRPE52 | C(A/C)AA(C/G)ACA | CACTAACACAAAGTAA | Sclareol box1 (SB1NPABC1) |
| CRPE53 | TNTA(C/T)TTA | TTTATTTACCAAACGGTAACATC | Upstream activating sequence (23BPUASNSCYCB1) |
| CRPE54 | GG(A/G)AAC | TCATGGTAACAATT | Compilation related GT elements (SITE1SORPS1) |
| CRPE55 | TAACNT | TTTATTTACCAAACGGTAACATC | Upstream activating sequence (23BPUASNSCYCB1) |
| CRPE56 | CCTT(A/G)C | CTTACCTTTCATGGATTA | APETALA3 gene promoter element (CARG2ATAP3) |
| CRPE57 | TC(C/T)T(C/T)(A/T)(C/T)C | NO HIT | – |
| CRPE58 | GTNAN(G/T)NA | GGTCANNNAGTC | Elicitor-responsive element (ELRENTCHN50) |
| CRPE59 | TTNCCT | CTTACCTTTCATGGATTA | APETALA3 gene promoter element (CARG2ATAP3) |
| CRPE60 | TNNCTTGG | NO HIT | – |
| CRPE61 | TNAGTG | TGTGTGGTTAATATG | Light-responsive element (LREBOX2PSRBCS3) |
| CRPE62 | GGTGA(A/G) | TGGTAGGTGAGAT | Stress responsive element (SRENTTTO1) |
| CRPE63 | TAGTT(C/T)CA | NO HIT | – |
| CRPE64 | T(G/T)AGCT(A/T)A | TGAGCTAAGCACATACGTCA | Wounding response element (20NTNTNOS) |
| CRPE65 | AAGTNNTT | NO HIT | – |
| CRPE66 | T(G/T)GGGA | AGTTGAATGGGGGTGCA | Putative MYB binding site (ARELIKEGHPGDFR2) |
| CRPE67 | A(A/G)GTTA | GAAATAGCAAATGTTAAAAATA | Positive photoregulatory element (PE1ASPHYA3) |
| CRPE68 | GAGTG(A/T) | GTGTGGTTAATATG | Related GT element (RBCSBOX2PS) |
| CRPE69 | TTGCTT(G/T)G | TATTTGCTTAA | Auxin response element (D3GMAUX28) |
| CRPE70 | TTG(G/T)GG | NO HIT | – |
| CRPE71 | AGGNAA | TTNCC(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)GGNAA | MADS domain (AGAMOUSATCONSENSUS) |

(a) The number following CRPE (CesA-related promoter element) represents the order of motif annotation.
(b) Consensus of the motifs detected by the three software programs (listed in Tables S4–S7).
(c) Cis-element in the PLACE database which most closely resembles the motif identified in this study.
(d) PLACE cis-regulatory element putative function and the identity of the element as represented on the PLACE database.

Motif annotation

Comparison of the motif sequences to the PLACE database allowed the putative annotation of those motifs that were similar to previously described elements (Tables 2, 3). Only one over-represented motif (CRPE1) was identified in all four datasets (Tables 2, 3). CRPE4 and 5 were over-represented in the primary CesA set and the primary CesA-related set (Tables 2, 3). These motifs showed similarity to an abscisic acid response element (Kao et al., 1996) and a jasmonic acid response element (He & Gan, 2001), respectively. No motifs were shared by the secondary CesA set and the secondary CesA-related set. Two elements, CRPE6 and 7, were over-represented in the primary and secondary CesA datasets (Table 2), but not in the CesA-related sets (Table 3). CRPE6 showed similarity to an iron deficiency element II (Kobayashi et al., 2003), while CRPE7 showed similarity to an ethylene response element (Solano et al., 1998).

Several of the motifs that were over-represented in the secondary CesA set showed similarity to tissue-specific or temporally regulated elements (Table 2). CRPE25 showed similarity to a phloem-specific element (Yin et al., 1997). CRPE28 showed similarity to the vascular-specific iron deficiency element I (Kobayashi et al., 2003) and CRPE27 showed similarity to a motif involved in disease resistance (Wang et al., 2005). A number of motifs showed little or no similarity to previously identified elements and may therefore represent novel elements that have yet to be characterized (Tables 2, 3).

Spatial distribution and abundance of the motifs in the CesA promoter regions

The spatial distribution and abundance of promoter elements can affect gene expression. A number of the putative cis-regulatory elements differed in abundance between the primary and secondary CesA sets (Fig. 3). For example, CRPE2 (Fig. 3a, turquoise) and CRPE7 (Fig. 3a, orange) were over-represented in the primary and secondary CesA sets, but both had more occurrences in the secondary CesA set. CRPE7 often occurred closely paired with CRPE2 in the primary and secondary CesA sets. CRPE3 (Fig. 3b, red) and CRPE11 (Fig. 3b, green) were over-represented in the primary CesA promoters, but not in the secondary CesA promoters. By contrast, CRPE25 and CRPE27 were highly abundant in the secondary CesA set (Fig. 3c). These motifs may be involved in tissue-specific regulation of CesA genes.

Figure 3. Schematic representation of the occurrences of eight over-represented motifs mapped to the CesA promoters of Arabidopsis, Populus and Eucalyptus. The promoters are represented by the horizontal black lines (+1 to –1000 bp). The orthologous promoter regions are grouped as in Fig. 1. Motif occurrences mapped include the following: (a) CRPE7 (orange) and CRPE2 (turquoise); (b) CRPE11 (green) and CRPE3 (red); (c) CRPE25 (blue) and CRPE27 (yellow); and (d) CRPE6 (magenta) and CRPE28 (purple).

CRPE28 and CRPE6 showed similarity (Table 2) to iron deficiency elements IDE1 and IDE2, respectively (Kobayashi et al., 2003). These elements have been found to work in conjunction to confer a specific expression pattern. CRPE6 was over-represented in both CesA datasets, while CRPE28 was only over-represented in the secondary CesA set (Fig. 3d). Furthermore, CRPE6 and CRPE28 occurred in pairs at a number of positions in the secondary CesA promoters. In particular, the –150 to –400 region of the secondary CesA set contained occurrences of the two motifs within 100 bp of each other in six of the nine promoters (Fig. 3d). The two motifs were not detected in either of the co-expressed datasets in Arabidopsis, possibly suggesting that the observed occurrence pattern of these motifs is specific to the CesA promoters.
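The pairing criterion described above (both motifs inside the –150 to –400 region and within 100 bp of each other) can be expressed as a short Python sketch. The promoter names and motif coordinates below are hypothetical and are not read from Fig. 3. The function name and the data layout are likewise assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# promoter name -> motif name -> positions in bp relative to +1 (negative = upstream)
Occurrences = Dict[str, Dict[str, List[int]]]


def promoters_with_pair(
    occ: Occurrences,
    motif_a: str,
    motif_b: str,
    window: Tuple[int, int] = (-400, -150),
    max_gap: int = 100,
) -> List[str]:
    """Return promoters in which motif_a and motif_b both fall inside the
    window and at least one pair of occurrences lies within max_gap bp."""
    lo, hi = window
    hits = []
    for promoter, motifs in occ.items():
        a_pos = [p for p in motifs.get(motif_a, []) if lo <= p <= hi]
        b_pos = [p for p in motifs.get(motif_b, []) if lo <= p <= hi]
        if any(abs(a - b) <= max_gap for a in a_pos for b in b_pos):
            hits.append(promoter)
    return hits


# Hypothetical coordinates, for illustration only.
example: Occurrences = {
    "promoter_1": {"CRPE6": [-380, -90], "CRPE28": [-310]},  # 70 bp apart, in window
    "promoter_2": {"CRPE6": [-200], "CRPE28": [-390]},       # 190 bp apart
    "promoter_3": {"CRPE6": [-600], "CRPE28": [-550]},       # outside the window
}
paired = promoters_with_pair(example, "CRPE6", "CRPE28")
print(f"{len(paired)} of {len(example)} promoters contain a pair: {paired}")
```

Applied to real motif coordinates, the fraction returned by such a function would correspond to a statement of the form "six of the nine promoters".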

Expression of GUS under the control of the EgCesA1 and EgCesA3 promoters

Histochemical staining of leaves from T1 and T2 Arabidopsis plants harbouring EgCesA1 and EgCesA3 promoter::GUS constructs (Figs 4, 5) revealed that the reporter gene was strongly expressed in veins but not in other leaf tissues (Fig. 5c). GUS was expressed primarily in the central vein and specifically in xylem vessel elements (Fig. 4). GUS expression was very low in leaf tissues lacking secondary cell walls (Fig. 5c,e). In stems, the EgCesA1 and EgCesA3 promoters were active in cells laying down secondary cell walls (e.g. maturing xylem vessels (Fig. 5a,b)), but not in cell types undergoing primary cell wall deposition, such as parenchyma, epidermis and young inflorescence tissues. In roots, EgCesA3-driven GUS expression was restricted to the central vascular cylinder (Fig. 5d), but no expression was observed in very young roots (Fig. 5e). We were unable to establish the expression pattern of the EgCesA2 promoter, because we failed to obtain transformed plants which contained the promoter::GUS construct, despite repeated attempts at floral dip.

Expression of GUS under the control of the EgCesA4 and 5 promoters

In contrast to the strong tissue-specific expression patterns observed for the EgCesA1 and 3 promoters, transgenic plants harboring EgCesA4 and 5 promoter::GUS constructs exhibited much more ubiquitous, but still strong, GUS expression. In leaves, GUS expression was observed in both the veins and the nonvein tissues (Fig. 6a,b,d). The same was true of roots, where GUS activity was observed in all tissues (Fig. 6e). Seedlings expressing GUS under the control of the EgCesA4 and 5 promoters exhibited uniform staining (Fig. 6f). As the EgCesA7 promoter was only isolated very recently, we have included it in the in silico analysis, but have not yet been able to confirm its expression in vivo.

Quantitative analysis of GUS expression in T2 Arabidopsis lines carrying sense and anti-sense EgCesA promoters allowed the comparison of EgCesA promoter activity to that driven by the endogenous AtCesA8 promoter (positive control) and the double 35S cauliflower mosaic virus promoter (Fig. 7). Overall, GUS activity was much stronger in stems (vs leaves) of plants carrying sense promoters of secondary cell wall-related CesA genes (EgCesA1, 3 and AtCesA8). By contrast, the primary cell wall-related CesA promoters (EgCesA4 and 5) resulted in stronger GUS activity in leaf tissues (vs stem), but the expression was at least fivefold weaker than the stem expression of the secondary cell wall-associated CesA promoters. Nevertheless, the leaf expression of the primary cell wall-associated CesA promoters was equivalent to the expression of the strong constitutive double 35S promoter in the same tissues (Fig. 7). The anti-sense versions of both the primary and secondary cell wall-associated promoters were expressed at much lower (20- to 100-fold) and variable levels, which could be due to positional effects. The GUS activity levels conferred by the orthologous EgCesA1 and AtCesA8 promoters were similar in stem tissues, but the AtCesA8 promoter conferred two- to threefold higher activity levels in leaf tissues (Fig. 7).

Figure 7. Quantitative assays of GUS activity in transgenic Arabidopsis plants containing various promoter::GUS constructs. GUS activity is indicated on the y-axis. The identity of Arabidopsis lines is indicated on the x-axis. Line numbers followed by S indicate GUS activity in stem tissue, while L denotes GUS activity in leaf tissues of the same line. Lines with sense or anti-sense versions of the promoters are indicated by gray and white bars, respectively, below the x-axis. 35S and 35L indicate the expression of the control 35S promoter::GUS construct in stem and leaf tissues, respectively, while ColS and ColL represent GUS expression in stems and leaves, respectively, of wild-type Columbia plants.

Discussion

The expression patterns of the Eucalyptus EgCesA promoters in Arabidopsis mimic the native expression patterns of their Arabidopsis and Populus orthologs

Owing to a high degree of sequence conservation, coding sequences from a variety of plant species are functional when expressed in model plants such as Arabidopsis and Populus (Boerjan, 2005). By contrast, promoter sequences of orthologous genes display very little overall DNA sequence similarity in plants (Barta et al., 2005). Nevertheless, they are often capable of driving heterologous expression of reporter genes in a manner similar to the observed expression patterns in the source species (Baghdady et al., 2006). This is due to the presence of shared sequence elements and of transcription factors that bind to these elements. The cis-regulatory elements themselves often vary in sequence, with only a core motif being shared among diverged species. We aimed to determine whether the upstream regions of the CesA genes from a Eucalyptus tree would act as promoters in a herbaceous model plant and drive transgene expression in a manner consistent with their native expression patterns.

The promoters of two secondary cell wall-related Eucalyptus CesA genes (EgCesA1 and 3) were shown to drive strong GUS expression in Arabidopsis tissues undergoing secondary cell wall deposition, but not in cells producing primary cell walls (such as very young shoot and root tissues, Fig. 5e). This was consistent with the previously observed mRNA concentrations of the genes in Eucalyptus tissues (Ranik & Myburg, 2006) and with the native expression patterns of the orthologous Arabidopsis genes, AtCesA8 and AtCesA7 (Taylor et al., 1999, 2000). Wu et al. (2000) showed that the Populus tremuloides PtCesA1 promoter (orthologous to EgCesA1) drove strong, xylem-specific GUS expression in transgenic tobacco stems. Quantitative analysis of GUS activity in stem and leaf extracts of plants transformed with the EgCesA1Prom::GUS and EgCesA3Prom::GUS constructs confirmed that the Eucalyptus promoters confer very strong transcriptional activity in Arabidopsis stems. GUS activity in stems approached and even surpassed (e.g. EgCesA3Prom::GUS, Fig. 7) levels of expression of the endogenous AtCesA8 promoter. The strength of the EgCesA promoters was further demonstrated by the fact that they were able to drive GUS expression at levels greater (c. four- to sixfold in stem tissues) than seen in the control CaMV 35S promoter lines (35S, Fig. 7).

The upstream regions of two primary cell wall-related EgCesA genes (EgCesA4 and 5) also acted as promoters in Arabidopsis, driving the expression of GUS in a more ubiquitous manner. This was consistent with the expression of the genes in Eucalyptus, where EgCesA4 and 5 transcripts occurred in all tissues, but were most abundant where rapid primary cell wall deposition was taking place (Ranik & Myburg, 2006). Microarray expression analysis of the Arabidopsis orthologs (AtCesA1 and 3) showed that these genes are expressed ubiquitously in growing Arabidopsis organs and at constant levels throughout the life cycle (Hamann et al., 2004). The authors also showed that these two genes are expressed at levels equal to or higher than the secondary cell wall-associated AtCesA4, 7 and 8 in all organs. Conversely, we found that, in transgenic Arabidopsis plants, the secondary cell wall-associated EgCesA1 and 3 promoters were consistently more active (in stems and leaves) than the primary cell wall-related EgCesA4 and 5 promoters. This most likely reflects the vast difference in amount of cellulose production in secondary cell walls between trees and herbaceous plants such as Arabidopsis, the inflorescence stems of which contain a much lower proportion of secondary cell walls than tree stems.

Even though Arabidopsis does not usually deposit significant quantities of secondary xylem (Lev-Yadun, 1994), the secondary cell wall-related EgCesA promoters and the AtCesA8 promoter were highly active in xylogenic tissues of Arabidopsis. These findings imply that a basic set of tissue-specific cis-elements and associated transcription factors is conserved between these two highly diverged species. These promoter elements and proteins may therefore be part of an ancient, conserved network regulating gene expression during primary and secondary cell wall formation in higher plants.

The intron and exon boundaries within CesA genes are highly conserved among different plant genera, such as Arabidopsis, Populus and Oryza (Richmond & Somerville, 2000). Similarly, the proximal upstream regions of some of the Eucalyptus, Populus and Arabidopsis CesA genes exhibited remarkable structural conservation. For example, the 5′ UTRs of EgCesA4 and its orthologs, AtCesA3 and PtrCesA5, all contained a conserved intron (Fig. 1a). The presence of an intron in the 5′ UTR of a rice β-tubulin gene was associated with high levels of gene expression (Morello et al., 2002). The shared intron identified in EgCesA4 and its orthologs may also play a role in gene expression and should be tested further. EgCesA3 and its orthologs, AtCesA7 and PtrCesA2 (Fig. 1f), which are all highly expressed in secondary cell wall-forming tissues, all contained the shortest 5′ UTRs of the CesA genes in each species. It has been suggested in previous studies that short 5′ UTRs are associated with higher gene expression levels (Molina & Grotewold, 2005).

Shared and unique putative cis-regulatory elements in CesA promoters

It is important to note that our approach to detecting putative cis-regulatory motifs would yield motifs that are over-represented in the specific datasets that we constructed, relative to a background model. Some of these motifs may not be functional cis-regulatory elements. However, the value of our approach was that novel, conserved cis-regulatory elements would be detected that would normally not be identified by database searches. Conversely, our study was not designed to detect low abundance, functional motifs, which remain a target for future studies.
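To make the notion of over-representation relative to a background model concrete, the sketch below scores a motif with a simple binomial tail probability. This is an illustrative stand-in and not the scoring scheme of the software programs used in this study. The counts and the background frequency are hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    n: number of promoters in the dataset
    k: number of those promoters containing the motif at least once
    p: assumed fraction of background promoters containing the motif
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: a motif found in 7 of 9 promoters, with an
# assumed background frequency of 0.2 (not a value from this study).
p_value = binomial_tail(k=7, n=9, p=0.2)
print(f"P(at least 7 of 9 promoters by chance) = {p_value:.2e}")
```

A small tail probability only indicates that the motif is more frequent than the background model predicts; as noted above, it does not show that the motif is a functional cis-regulatory element.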

Three putative cis-elements (CRPE3, 4 and 5; Table 2) were shared among CesA and CesA co-expressed genes expressed in primary tissues, which may indicate a conserved role for some of these motifs in primary cell wall formation. Contrary to expectation, we did not detect motifs that were shared between the secondary CesA-related set (from co-expressed genes in Arabidopsis) and the secondary CesA set (from CesA genes in Arabidopsis, Populus and Eucalyptus). This may be due to the fact that, while Eucalyptus and Populus trees both deposit large amounts of cellulose during normal development, this is not the case in Arabidopsis. Arabidopsis plants must be induced to form substantial amounts of secondary xylem and the induction places the plants under stress (Lev-Yadun, 1994; Ko et al., 2004). Some of the transcription factors involved in the regulation of the secondary cell wall-related Arabidopsis CesA genes may therefore be modulated by a stress response. Several of the motifs in the secondary CesA-related set were similar to stress responsive elements (e.g. CRPE58, 62 and 66; Table 3). The lack of shared putative cis-elements between the secondary CesA and secondary CesA co-expressed promoter sets may also simply reflect the complexity of the transcriptional network leading to secondary cell wall production, or the existence of species-specific motifs that were not detected by the software programs used in this study.

Database-assisted analysis of the detected motifs enabled the annotation of some of the putative elements. Cis-regulatory elements are generally between 6 and 12 bp in length and even near-perfect matches exhibit low significance values in similarity searches. Some of the putative cis-regulatory sequences identified in this study showed significant similarity to previously identified vascular- (Kawaoka et al., 2000) and stem-specific (Hatton et al., 1995) elements (Tables 2, 3). While putative functions could be assigned to some of the motifs, many showed no similarity or only limited similarity to previously described elements in the PLACE database (Higo et al., 1999). These motifs are possibly novel elements that have yet to be functionally characterized.

The putative cis-regulatory elements are nonrandomly distributed in the CesA promoters

Mapping of the motif occurrences revealed a number of unique patterns within the orthologous CesA promoter sets. For example, CRPE11 (Fig. 3b, green) appeared to be present in clusters of two to five occurrences in several of the promoters in the primary CesA set. CRPE28 and CRPE6 also appeared to have a position-dependent association. In the region between 150 and 400 bp upstream (Fig. 3d) they were often found in pairs approx. 100 bp apart in the secondary CesA dataset. These two motifs showed similarity to two iron deficiency elements (IDE1 and IDE2, respectively, Table 2). IDE1 and IDE2 were identified in the promoters of S-adenosyl methionine synthetase (SAMS) genes which were found to play a central role in the iron deficiency response pathway (Kobayashi et al., 2003). SAMS is also a key enzyme in lignin biosynthesis and is expressed at high levels during secondary cell wall formation. These elements may therefore play a role in the secondary cell wall-associated expression of cellulose synthase genes.

In conclusion, we showed that CesA promoters from Eucalyptus trees are functional in Arabidopsis plants, despite the low amounts of sequence similarity between orthologous CesA promoters. This functional conservation is made possible by cis-elements that have been retained during evolution along with the corresponding transcription factors. It is evident that the shared cis-elements have diverged considerably at the sequence level, whilst retaining their regulatory functionality. This remarkable plasticity of the shared regulatory networks in higher plants will be beneficial to plant biotechnology applications, including the direct use of heterologous promoters to drive transgene expression, as well as the construction of synthetic promoters with applicability in a wide range of species. This study provides a valuable foundation for the functional characterization of CesA-related cis-regulatory sequences and ultimate elucidation of the transcriptional networks leading to cellulose production in primary and secondary cell walls of plants.

Acknowledgements

The authors would like to thank Joanne Bradfield for her assistance with the isolation of the EgCesA2 and EgCesA7 promoter regions. This work was supported with funding provided by Mondi South Africa, through the Wood and Fibre Molecular Genetics Programme, the Technology and Human Resources for Industry Programme (THRIP) and the National Research Foundation of South Africa (NRF).

Supporting Information

Table S1 Forward and reverse primers used for the end-to-end amplification of the six EgCesA promoter regions from E. grandis genomic DNA

Table S2 NCBI accession numbers and names of the Eucalyptus, Populus and Arabidopsis cellulose synthase genes whose promoters were included in the study

Table S3 TAIR locus identifiers of the Arabidopsis genes whose promoters were used to construct the two CesA-related datasets used in the motif analysis

Table S4 Motifs identified in the primary CesA set by at least two of the three software programs (Weeder, POCO and MotifSampler)

Table S5 Motifs identified in the secondary CesA set by at least two of the three software programs (Weeder, POCO and MotifSampler)

Table S6 Motifs identified in the primary CesA-related set by at least two of the three software programs (Weeder, POCO and MotifSampler)

Table S7 Motifs identified in the secondary CesA-related set by at least two of the three software programs (Weeder, POCO and MotifSampler)

Please note: Blackwell Publishing are not responsible for the content or functionality of any supplementary materials supplied by the authors. Any queries (other than missing material) should be directed to the journal at New Phytologist Central Office.