Monthly Archives: April 2014

The Contributors

Rachel Glover, FERA.

The material

In order to identify sequences potentially originating from the
mitochondrial genome of H. pseudoalbidus, we downloaded the 248
fully sequenced ascomycete mitochondrial genomes from
GenBank and used these sequences as a BLAST database to screen the
genomic contigs for potential mitochondrial origin.
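As a rough illustration of the screening step, the sketch below picks out contigs with at least one strong hit from BLAST tabular output. It assumes the search was saved in the standard 12-column tabular format (-outfmt 6) and uses an arbitrary e-value cutoff of 1e-10; the post does not state the output format or what threshold was treated as significant, and the contig and reference names are placeholders.

```python
def mitochondrial_candidates(blast_lines, max_evalue=1e-10):
    """Return the set of query contig IDs with at least one hit at or below max_evalue.

    Expects BLAST tabular lines (outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    The cutoff is an assumption, not a value taken from the post.
    """
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        contig_id = fields[0]
        evalue = float(fields[10])
        if evalue <= max_evalue:
            hits.add(contig_id)
    return hits


# Placeholder records, not real results.
example = [
    "contig_1\tmito_ref_A\t92.5\t800\t60\t0\t1\t800\t100\t899\t1e-50\t900",
    "contig_2\tmito_ref_B\t78.0\t60\t13\t0\t5\t64\t10\t69\t0.5\t40",
]
print(sorted(mitochondrial_candidates(example)))
```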

The result

Fifty-seven contigs
were identified with significant similarity to ascomycete mitochondrial
sequences. Further examination of these 57 contigs showed that many
were identical but in reverse complement, or differed only by extending
a few hundred base pairs further. These contigs were collapsed to form a
dataset of 45 contigs ranging in length from 109 to 14,731 bp, with GC
content ranging from 9.2 to 45.9% (Figure 1). Most of the contigs >5 kb
fall into a GC content range of 30-40%, typical of AT-rich mitochondrial
sequences. The AT-rich repeat islands discussed above may therefore be
mitochondrial in origin, because the mitochondrial genome will be more
prevalent in the sequence dataset, which would explain the increased
abundance of those sequences.
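The collapsing and GC-content steps can be sketched as follows. This is a minimal illustration, not the procedure actually used: it only merges contigs that are exactly identical on either strand, and does not handle contigs that extend one another by a few hundred base pairs. The contig names and sequences are made up.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_percent(seq):
    """Return GC content as a percentage of unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def collapse_exact_duplicates(contigs):
    """Keep the first copy of each contig that is identical on either strand.

    contigs: dict mapping contig name -> sequence.
    """
    seen = set()
    kept = {}
    for name, seq in contigs.items():
        seq = seq.upper()
        # Strand-independent key: the smaller of the sequence and its reverse complement.
        key = min(seq, reverse_complement(seq))
        if key not in seen:
            seen.add(key)
            kept[name] = seq
    return kept


toy_contigs = {"a": "AATTGC", "b": "GCAATT", "c": "ATATAT"}
collapsed = collapse_exact_duplicates(toy_contigs)
for name, seq in collapsed.items():
    print(name, len(seq), round(gc_percent(seq), 1))
```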

The total length of the 45 mitochondrial contigs is
156,026 bp with no significant overlap. If this preliminary estimate is accurate, H. pseudoalbidus would have the largest
mitochondrial genome sequenced from the ascomycetes so far (see Figure 2), although we expect the size to reduce with further work.

Interpretation

A number of factors have prevented the construction of a finished
mitochondrial genome at this time. Firstly, the potential mitochondrial
contigs were identified based upon similarity-based searches against
current ascomycete mitochondrial genomes. The similarity-based approach
to finding mitochondrial sequences within a nuclear genome sequencing
project may have misidentified some of these contigs as mitochondrial
when in fact they are nuclear integrations of portions of the true
mitochondrial genome (NUMTs). This is likely to have artificially
inflated our estimate of the size of the H. pseudoalbidus mitochondrial
genome. Annotation of the potential mitochondrial contigs is in progress
and there are early indications of a very large number of introns
(intronic ORFs) present in the mitochondrial genome of H. pseudoalbidus.
The second complicating factor in attempting to assemble the
mitochondrial genome at this time is the large number of AT repeats
present in the sequences we have identified as being mitochondrial in
origin. The repeats are likely to be collapsed and appear to be at the
ends of the contigs we have identified, preventing further assembly
without additional sequencing.

The material

Background information

In filamentous plant pathogens such as the late blight oomycete pathogen Phytophthora infestans, a repeat-driven expansion has created repeat and transposable element (TE) rich, gene-sparse regions that are distinct from the gene-dense conserved regions, an arrangement known as a two-speed genome architecture. The distances from a gene to its closest coding-gene neighbours (designated flanking intergenic regions, FIRs) can be used to determine whether a gene resides in a gene-dense or gene-sparse environment. Given that genes associated with pathogenicity tend to have long FIRs in pathogen genomes, genome architecture could be used to identify new candidate pathogenicity genes.
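To make the FIR idea concrete, here is a minimal sketch of how 5' and 3' FIR lengths might be computed from gene coordinates. It assumes 1-based inclusive coordinates (as in GFF), treats overlapping neighbours as a distance of 0, and reports None where a gene has no neighbour on that side of the contig; these conventions and the toy gene records are assumptions, not details from the post. It also ignores nested genes.

```python
from collections import defaultdict


def flanking_intergenic_regions(genes):
    """Compute (5' FIR, 3' FIR) lengths for each gene.

    genes: iterable of (gene_id, contig, start, end, strand) with 1-based
    inclusive coordinates. Returns {gene_id: (fir5, fir3)}, using None when
    there is no neighbouring gene on that side of the contig.
    """
    by_contig = defaultdict(list)
    for gene in genes:
        by_contig[gene[1]].append(gene)

    firs = {}
    for contig_genes in by_contig.values():
        contig_genes.sort(key=lambda g: g[2])  # order genes by start position
        for i, (gene_id, _, start, end, strand) in enumerate(contig_genes):
            left = right = None
            if i > 0:
                left = max(0, start - contig_genes[i - 1][3] - 1)
            if i < len(contig_genes) - 1:
                right = max(0, contig_genes[i + 1][2] - end - 1)
            # On the minus strand the 5' end is at the higher coordinate.
            firs[gene_id] = (left, right) if strand == "+" else (right, left)
    return firs


toy_genes = [
    ("g1", "c1", 100, 200, "+"),
    ("g2", "c1", 301, 400, "-"),
    ("g3", "c1", 1401, 1500, "+"),
]
fir_table = flanking_intergenic_regions(toy_genes)
print(fir_table)
```

Genes with long values on both sides would sit in gene-sparse regions; plotting 5' against 3' FIR lengths for all genes gives the kind of distribution used to judge genome architecture.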

The analysis

To investigate whether a similar organisation occurs in the genome of H. pseudoalbidus, we first identified candidate effector genes in the gene annotations (http://oadb.tsl.ac.uk/?m=20130910). To determine whether genes encoding secreted proteins lie in gene-sparse or gene-dense regions of the genome, we used RNA-seq data to extend the de novo gene calls wherever they overlapped transcripts, producing the file extended_genes.gff. The RNA-seq reads from KW1 were aligned against the KW1 assembly using BWA. For each gene model in the TGAC gene predictions that was within 100 nt of another gene, we extracted reads on the same strand that fell within 1000 nt upstream of the start or 1000 nt downstream of the end. Starting from the start and end of the gene, we then followed read overlaps as far as possible, until reads no longer overlapped. The most distal read then defined the new gene start or end.
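The read-walking extension can be sketched roughly as below. This is an illustration under stated assumptions rather than the script that produced extended_genes.gff: reads are given as (start, end) intervals already filtered to the gene's strand, only reads lying entirely within the 1000 nt window are considered, and two intervals "overlap" if they share at least one base. The test for a neighbouring gene within 100 nt is left out.

```python
def extend_gene(gene_start, gene_end, reads, window=1000):
    """Extend a gene model by walking along overlapping reads.

    reads: iterable of (start, end) intervals on the gene's strand,
    1-based inclusive. Only reads lying entirely within `window` nt of the
    gene are used (an assumed reading of the method). Returns the extended
    (start, end).
    """
    lo, hi = gene_start - window, gene_end + window
    candidates = sorted(r for r in reads if r[0] >= lo and r[1] <= hi)

    new_start, new_end = gene_start, gene_end
    changed = True
    while changed:  # repeat until no read pushes either boundary further out
        changed = False
        for r_start, r_end in candidates:
            overlaps = r_start <= new_end and r_end >= new_start
            if not overlaps:
                continue
            if r_start < new_start:
                new_start, changed = r_start, True
            if r_end > new_end:
                new_end, changed = r_end, True
    return new_start, new_end


toy_reads = [(950, 1050), (900, 960), (1990, 2100), (2090, 2120), (2150, 2200)]
extended = extend_gene(1000, 2000, toy_reads)
print(extended)
```

In the toy data the read at 2150-2200 is not reached, because no read bridges the gap after position 2120, so the walk stops there.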

The FIR distribution for genes in the H. pseudoalbidus genome can be seen below and is indicative of a single-speed genome, with genes encoding secreted proteins dispersed across both gene-sparse and gene-dense regions of the genome.

Contributors

Christine Sambles and David Studholme. University of Exeter, Devon.

Introduction

In order to identify fungal protein-coding genes associated with Fraxinus:Hymenoscyphus in planta interactions, we took an orthologue clustering approach. Identifying fungal transcripts that are present in four samples taken from infected ash and removing transcripts that are also present in the KW1 isolate could reveal some infection-related transcripts from H. pseudoalbidus. Additionally, F. excelsior transcripts present in the infected material and absent from F. excelsior with no signs of infection could identify transcripts involved in the plant's response to infection by H. pseudoalbidus.

There was a core set of 3,118 protein clusters from detectable transcripts. A set of 113 protein clusters was identified only in H. pseudoalbidus samples that were from infected F. excelsior (AT1, AT2, Holt & Upton) and 33 were identified only in KW1, a H. pseudoalbidus isolate. These will be referred to as the ‘in planta’ and ‘ex planta’ groups respectively.
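The grouping can be expressed as simple set comparisons on the samples represented in each cluster. The sketch below assumes that "core" means a cluster has members from all five samples and that "in planta" means members from one or more infected samples and none from KW1; the post does not spell out either definition, and the cluster IDs are hypothetical.

```python
INFECTED = frozenset({"AT1", "AT2", "Holt", "Upton"})
ISOLATE = "KW1"


def classify_clusters(cluster_samples):
    """Assign each cluster to core, in_planta, ex_planta or other.

    cluster_samples: dict mapping cluster ID -> iterable of sample names
    that contribute at least one protein to the cluster.
    """
    groups = {"core": set(), "in_planta": set(), "ex_planta": set(), "other": set()}
    for cluster_id, samples in cluster_samples.items():
        samples = set(samples)
        if samples == INFECTED | {ISOLATE}:
            groups["core"].add(cluster_id)       # present in every sample
        elif samples and samples <= INFECTED:
            groups["in_planta"].add(cluster_id)  # infected material only
        elif samples == {ISOLATE}:
            groups["ex_planta"].add(cluster_id)  # isolate only
        else:
            groups["other"].add(cluster_id)
    return groups


# Hypothetical cluster IDs, for illustration only.
toy_clusters = {
    "cluster_A": ["AT1", "AT2", "Holt", "Upton", "KW1"],
    "cluster_B": ["AT1", "AT2", "Holt", "Upton"],
    "cluster_C": ["KW1"],
    "cluster_D": ["AT2", "Upton", "KW1"],
}
toy_groups = classify_clusters(toy_clusters)
print({name: sorted(ids) for name, ids in toy_groups.items()})
```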

The 113 protein clusters found only in H. pseudoalbidus infected F. excelsior (in planta) contained a total of 565 transcripts (459 excluding isoforms). We annotated the transcript sequences based on results of BLASTX searches. Additionally the GO, EC, KEGG, PFAM and CAZy (Carbohydrate-Active enzymes) databases were used to annotate the full set of 565 transcripts.

GO analysis revealed a reduction in growth-related proteins and an increase in cell differentiation and proliferation proteins in infected material (Fig 2).

Figure 2: Gene Ontology (GO) analysis of the pan-proteome (KW1, AT1, AT2, Upton, Holt) compared to in planta proteins. The in planta proteins were translated from Helotiales-binned transcripts (MEGAN) and were identified only in H. pseudoalbidus samples that were from infected F. excelsior (AT1, AT2, Holt & Upton). The pan-proteome proteins were also translated from Helotiales-binned transcripts (MEGAN) and include the isolate, KW1.

PFAM and CAZy analysis of the 565 transcripts of the pan-proteome resulted in 88 PFAM domains/families and the following CAZy families:

Protein of unknown function, a putative transmembrane protein from bacteria. It is likely to be conserved between Mycobacterium species (Pfam: DUF2029, PF09594) & PAP2 superfamily (Pfam: PAP2_3, PF14378)

BLASTX hits from the in planta transcripts included a putative CFEM domain-containing protein (Marssonina brunnea) and a Galactose mutarotase-like protein (Glarea lozoyensis). The Galactose mutarotase-like protein is of interest as it is also similar to rhamnogalacturonate lyase found in Aspergillus spp., which is known to degrade plant cell walls by cleaving the pectin backbone (de Vries and Visser 2001). Some CFEM-containing proteins are proposed to have important roles in fungal pathogenesis (Kulkarni, Kelkar et al. 2003).

Comparisons of Pfam domain content among samples

PFAM domains and families in the ‘pan-proteome’ of KW1, AT1, AT2, Holt & Upton were identified using the hmmpfam wrapper script, Pfam scan. These were compared to the PFAM annotation of the ‘in planta’ group to identify over-representation of specific domains within this group. The domains and families for which >80% of the ‘pan-proteome’ annotations were present in the ‘in planta’ group are shown in Table 1.
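The 80% criterion amounts to a per-domain ratio of counts. A minimal sketch, using the two count pairs reported later in this post (Porphobil_deam 6/6 and CFEM 5/14) as example input; the function names are illustrative only.

```python
def in_planta_fraction(in_planta_counts, pan_counts):
    """Fraction of each domain's pan-proteome annotations found in planta."""
    return {
        domain: in_planta_counts.get(domain, 0) / total
        for domain, total in pan_counts.items()
        if total > 0
    }


def over_represented(in_planta_counts, pan_counts, threshold=0.8):
    """Domains whose in planta fraction is strictly above the threshold."""
    fractions = in_planta_fraction(in_planta_counts, pan_counts)
    return sorted(d for d, f in fractions.items() if f > threshold)


# Counts quoted in the text: in planta / pan-proteome.
pan = {"Porphobil_deam": 6, "CFEM": 14}
in_planta = {"Porphobil_deam": 6, "CFEM": 5}

fractions = in_planta_fraction(in_planta, pan)
for domain, frac in fractions.items():
    print(f"{domain}: {100 * frac:.2f}%")
print(over_represented(in_planta, pan))
```

With these counts only Porphobil_deam passes the threshold, while CFEM falls well below it.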

Table 1: Pfam domains and families for which >80% of ‘pan-proteome’ annotations were present in the ‘in planta’ group (http://pfam.sanger.ac.uk/).

| Domain/Family | Name | Pfam accession |
| --- | --- | --- |
| ATP12 | ATP12 chaperone protein | PF07542 |
| BOP1NT | BOP1NT (NUC169) domain | PF08145 |
| iPGM_N | BPG-independent PGAM N-terminus | PF06415 |
| CDC37_M | Cdc37 Hsp90 binding domain | PF08565 |
| CDC37_N | Cdc37 N terminal kinase binding domain | PF03234 |
| CDC37_C | Cdc37 C terminal domain | PF08564 |
| Chalcone | Chalcone-flavanone isomerase | PF02431 |
| Copper-bind | Copper binding proteins plastocyanin/azurin family | PF00127 |
| Sdh5 | Flavinator of succinate dehydrogenase | PF03937 |
| HD_3 | HD domain | PF13023 |
| Hpt | Hpt domain | PF01627 |
| Metalloenzyme | Metalloenzyme superfamily | PF01676 |
| CENP-I | Mis6 | PF07778 |
| Myosin_tail_1 | Myosin tail | PF01576 |
| TRM | N2 N2-dimethylguanosine tRNA methyltransferase | PF02005 |
| Es2 | Nuclear protein Es2 | PF09751 |
| Tom37 | Outer mitochondrial membrane transport complex protein | PF10568 |
| PAP2_3 | PAP2 superfamily | PF14378 |
| PMC2NT | PMC2NT (NUC016) domain | PF08066 |
| Porphobil_deam | Porphobilinogen deaminase dipyromethane cofactor binding domain | PF01379 |
| Porphobil_deam(C) | Porphobilinogen deaminase C-terminal domain | PF03900 |
| DUF2012 | Protein of unknown function | PF09430 |
| DUF775 | Protein of unknown function | PF05603 |
| Prp31_C | Prp31 C terminal domain | PF09785 |
| Ribosomal_L32p | Ribosomal L32p protein family | PF01783 |

Several of the Pfam hits struck us as interesting; these are described below. The pairs of numbers in brackets are the number found within the in planta group / number found in entire ‘pan-proteome’:

Porphobil_deam and Porphobil_deam(C) (6/6) were found in two AT1 isoforms, AT2, two Holt isoforms and Upton. There were no peptides with this domain in the Helotiales-binned KW1 proteome. Heme-biosynthetic porphobilinogen deaminase protects Aspergillus nidulans from nitrosative stress. In A. nidulans, a novel NO-tolerant (nitric oxide-tolerant) protein, PBG-D (the heme biosynthesis enzyme porphobilinogen deaminase), modulates the reduction of environmental NO and nitrite by flavohemoglobin (FHB, encoded by fhbA and fhbB) and nitrite reductase (NiR, encoded by niiA) (Zhou, Narukami et al. 2012). NO is part of the plant hypersensitive response, a localized programmed cell death that confines the pathogen to the site of attempted infection (Mur, Carver et al. 2006).

Although below the threshold of 80%, 35.71% (5/14) of the CFEM domains identified in the ‘pan-proteome’ of KW1, AT1, AT2, Holt & Upton were present in the ‘in planta’ group and none were present in the ‘ex planta’ group. The CFEM domains were distributed across 4 clusters, only one of which is not present in KW1:

ClusterID: Clustered protein present in:

HELO2454: AT1, AT2, HOLT, UPTON

HELO4337: AT1, AT2, HOLT, UPTON, KW1

HELO5213: AT1, HOLT, UPTON, KW1

HELO5952: AT2, UPTON, KW1

Fig 2: Phylogenetic tree of H. pseudoalbidus sequences from four OrthoMCL clusters where at least one sequence in the cluster contains a CFEM domain (Pfam:PF05730). The names of full-length proteins are shown in black; in grey are names of shorter length proteins from incomplete transcript assembly that lack a CFEM domain but that cluster with CFEM domain sequences due to sequence similarity and inferred orthology. Orthologue clustering was performed on all translated transcripts binned to the Helotiales using MEGAN from the one H. pseudoalbidus isolate (KW1) and all four H. pseudoalbidus samples that were from infected F. excelsior (AT1, AT2, Holt & Upton).

The 33 clusters (representing 72 peptides) in the ex planta group which were only identified in the isolate KW1 were annotated with PFAM as previously described. This resulted in identification of 17 Pfam domains/families (Table 2).

Table 2: Pfam domains/families identified in the ex planta group

| Domain/Family | Name | Pfam accession |
| --- | --- | --- |
| COX1 | Cytochrome C and Quinol oxidase polypeptide I | PF00115 |
| DASH_Spc34 | DASH complex subunit Spc34 | PF08657 |
| Pentapeptide_4 | Pentapeptide repeats | PF13599 |
| Vac7 | Vacuolar segregation subunit 7 P | PF12751 |
| DHQ_synthase | 3-dehydroquinate synthase | PF01761 |
| LtrA | Bacterial low temperature requirement A protein | PF06772 |
| FSH1 | Serine hydrolase | PF03959 |
| Tyrosinase | Common central domain of tyrosinase | PF00264 |
| Glyco_hydro_47 | Glycosyl hydrolase family 47 | PF01532 |
| DUF202 | Domain of unknown function | PF02656 |
| SET | SET domain | PF00856 |
| Abhydrolase_1 | alpha/beta hydrolase fold | PF00561 |
| adh_short_C2 | Enoyl-(Acyl carrier protein) reductase | PF13561 |
| Glyco_hydro_3 | Glycosyl hydrolase family 3 N terminal domain | PF00933 |
| ADH_zinc_N | Zinc-binding dehydrogenase | PF00107 |
| AAA | ATPase family associated with various cellular activities | PF00004 |
| adh_short | short chain dehydrogenase | PF00106 |

The low number of peptides that were identified in KW1 but in none of the H. pseudoalbidus-infected ash samples limits the ability to perform any comparative analysis.

Conclusions

Proteins putatively involved in plant-pathogen interactions have been identified from groups of translated transcripts that were found exclusively in planta and not identified in isolate KW1. They included a copper binding protein within the plastocyanin/azurin family, porphobilinogen deaminase, a CFEM domain-containing protein and a Galactose mutarotase-like protein.