5
Gene Prediction
When on board HMS Beagle, as naturalist, I was much struck with certain facts in the distribution of the inhabitants of South America, and in the geological relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of species - that mystery of mysteries, as it has been called by one of our greatest philosophers.
Identify the words
26th Feb 2014

6
Functional Annotation
When on board HMS Beagle, as naturalist, I was much struck with certain facts in the distribution of the inhabitants of South America, and in the geological relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of species - that mystery of mysteries, as it has been called by one of our greatest philosophers.
nat·u·ral·ist [nach-er-uh-list, nach-ruh-] noun 1. a person who studies or is an expert in natural history, especially a zoologist or botanist. 2. an adherent of naturalism in literature or art. Origin: 1580–90; natural + -ist
Origin of Species, The noun (On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life) a treatise (1859) by Charles Darwin setting forth his theory of evolution.
Identify the function (i.e., meaning) of each word
DATABASES PROFILES

7
Comparative Genomics
When on board HMS Beagle, as naturalist, I was much struck with certain facts in the distribution of the inhabitants of South America, and in the geological relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of species - that mystery of mysteries, as it has been called by one of our greatest philosophers.
When on board RMS Titanic, as painter, I was much struck with certain facts in the distribution of the inhabitants of United Kingdom, and in the socioeconomical relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of capitalism - that mystery of mysteries, as it has been called by one of our greatest philosophers.

8
THE GRAVITY OF THE ANNOTATION PROCESS
Not just Newtonian

10
Function? What is it?
To a cell biologist, function might refer to the network of interactions in which the protein participates or to its localization to a certain cellular compartment.
To a biochemist, function refers to the metabolic process in which a protein is involved or to the reaction catalyzed by an enzyme.

13
Domain/Motif
Domain: A discrete structural unit that is assumed to fold independently of the rest of the protein and to have its own function. ~20-100 aa
Motif: A short, conserved region, frequently the most conserved region of a domain. Motifs are critical for the domain to function.
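Because a motif is a short conserved pattern, scanning a protein for one can be sketched with a regular expression. The example below is a minimal sketch, not a method from these slides: it assumes the PROSITE-style P-loop (Walker A) pattern [AG]-x(4)-G-K-[ST] as the motif, and the function name `find_motif` and the test sequence are made up for illustration.

```python
import re

# Assumption: the P-loop (Walker A) pattern [AG]-x(4)-G-K-[ST],
# rewritten from PROSITE-style syntax into a Python regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_motif(sequence, pattern=P_LOOP):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence.upper())]


# Illustrative sequence only; it is not taken from the slides.
toy_protein = "MTEYKLVVVGAGGVGKSALTIQ"
print(find_motif(toy_protein))  # [(10, 'GAGGVGKS')]
```

A regex only reports exact pattern matches; profile-based methods are needed when the conservation is weaker than a fixed pattern can express.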

15
How Does a Gene Perform Its Function? Operon
Operon: Several genes with related functions that are regulated together, because one piece of mRNA codes for several related proteins.
Polycistronic mRNA, i.e. mRNA coding for more than one polypeptide, is characteristic of prokaryotes and rare in eukaryotes.

29
Criteria for selecting methods
1. Currently being maintained
2. Applicable to prokaryotic sequences
3. Can be installed locally (supporting batch jobs if GUI-based) OR can be included in a pipeline, i.e., has a command-line interface

30
Gene naming
You need clear logic and supporting evidence for assigning names to the predicted proteins.
A generally accepted scheme is as follows:
– High confidence match – function and annotation can be transferred
– Multiple high confidence matches – assign a less specific name, e.g. ABC transporter
– Low confidence match – assign function as putative
– Match to a hypothetical protein – conserved hypothetical protein
– No match in the database – hypothetical protein
How high is high? Depends on your data.
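The naming scheme above can be sketched as a small decision function. This is a hedged illustration only: the slide leaves "how high is high" open, so the percent-identity cutoffs `HIGH_IDENTITY` and `LOW_IDENTITY` are placeholder assumptions, as are the function names and the choice to derive the "less specific name" from the words shared at the start of the competing descriptions.

```python
# Assumption: confidence is judged by percent identity, with placeholder cutoffs.
HIGH_IDENTITY = 90.0
LOW_IDENTITY = 30.0


def common_word_prefix(names):
    """Return the leading words shared by every name (may be empty)."""
    prefix = []
    for words in zip(*(name.split() for name in names)):
        if len(set(words)) != 1:
            break
        prefix.append(words[0])
    return " ".join(prefix)


def assign_name(hits):
    """hits: list of (description, percent_identity) database matches."""
    usable = [(d, pid) for d, pid in hits if pid >= LOW_IDENTITY]
    if not usable:
        return "hypothetical protein"  # no match in the database
    named = [(d, pid) for d, pid in usable if "hypothetical" not in d.lower()]
    if not named:
        return "conserved hypothetical protein"  # only hypothetical matches
    best_name = max(named, key=lambda hit: hit[1])[0]
    high = sorted({d for d, pid in named if pid >= HIGH_IDENTITY})
    if len(high) == 1:
        return high[0]  # single high confidence match: transfer the name
    if len(high) > 1:
        # Several high confidence matches: fall back to a less specific name.
        return common_word_prefix(high) or "putative " + best_name
    return "putative " + best_name  # low confidence match


print(assign_name([("ABC transporter permease", 95.0),
                   ("ABC transporter ATP-binding protein", 93.0)]))
```

In practice the cutoffs would be tuned to the data set, and other evidence (coverage, e-value, domain hits) would usually feed into the same decision.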

34
“Perutz et al. showed in 1960 that myoglobin and hemoglobin, the first two protein structures to be solved at atomic resolution using X-ray crystallography, have similar structures even though their sequences differ.”

35
Pros and Cons: There are no free lunches!
Homology: useful, but different from “same” function – it simply implies common ancestry
Punta, M., & Ofran, Y. (2008). The rough guide to in silico function prediction, or how to use sequence and structure information to predict protein function. PLoS Computational Biology, 4(10), e1000160.

36
Pros and Cons: There are no free lunches!
Punta, M., & Ofran, Y. (2008). The rough guide to in silico function prediction, or how to use sequence and structure information to predict protein function. PLoS Computational Biology, 4(10), e1000160.

37
Pros and Cons: There are no free lunches!
Again:
– The quality of a prediction is only as good as the quality of annotation of the database
– A eukaryotic function predictor cannot be used for prokaryotes, and vice versa
– Building pan-genomes is a good strategy for finding more confident matches

41
Comparative Genomics
Biological questions of general interest:
– Are there rearrangements?
– Is the region (or regions) of interest syntenic across species?
– Are there gene gain/loss events leading to a specific trait?
– Which organisms are most similar? Which are most distant?
– What factors confer virulence to the genome?
– In our case: capsule switching? What, why and how?
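The gene gain/loss question above can be approached, at its simplest, as set arithmetic on gene presence/absence. The sketch below assumes each genome has already been reduced to a set of gene family identifiers; the function name, strain names and gene names are hypothetical and only serve the illustration.

```python
def compare_gene_content(genomes):
    """genomes: dict mapping genome name -> set of gene family identifiers.

    Returns (core, pan, unique): genes shared by all genomes, genes seen in
    any genome, and for each genome the genes found in no other genome.
    """
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets) if gene_sets else set()
    pan = set().union(*gene_sets)
    unique = {}
    for name, genes in genomes.items():
        others = set().union(*(g for other, g in genomes.items() if other != name))
        unique[name] = genes - others
    return core, pan, unique


# Hypothetical toy input, not real annotation data.
toy = {
    "strain_A": {"gyrA", "recA", "cpsB"},
    "strain_B": {"gyrA", "recA", "cpsC"},
    "strain_C": {"gyrA", "recA", "cpsB", "tetM"},
}
core, pan, unique = compare_gene_content(toy)
print(sorted(core))        # ['gyrA', 'recA']
print(unique["strain_C"])  # {'tetM'}
```

Genes unique to the genomes that share a trait are candidates for follow-up, not proof of causation; the result also depends heavily on how gene families were defined.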

45
Comparative Genomics
You are going to hear more about your specific goals next week.
Remember: The focus here is not on the tools but on (1) identification of the biological question, (2) your approach to answering the question and (3) your results with interpretation.

46
Databases
As before:
– There are a number of sequence databases available
– You need to decide what subset of a database you should take into consideration
– For example: what organism/serogroup/sequence type should your database be focused on?
If we are also looking for virulence factors – VFDB
If we are interested in pathways – KEGG, Pathway Tools

48
Phylogenetic Analysis
There are a number of ways you can compare organisms/genomes:
– 16S rRNA tree
– MLST based methods
– ANI based methods
All three can be visualized as a tree to assess the relatedness between the organisms.
ANI has been shown to correlate well with DDH by Konstantinidis et al.
More traditional
Konstantinidis, K. T., Ramette, A., & Tiedje, J. M. (2006). The bacterial species definition in the genomic era. Philosophical Transactions of the Royal Society B: Biological Sciences, 361(1475), 1929-1940.
Goris, J., Konstantinidis, K. T., Klappenbach, J. A., Coenye, T., Vandamme, P., & Tiedje, J. M. (2007). DNA–DNA hybridization values and their relationship to whole-genome sequence similarities. International Journal of Systematic and Evolutionary Microbiology, 57(1), 81-91.
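To make the ANI idea concrete, the following is a minimal sketch of averaging per-fragment identities between two genomes. It assumes the query genome has already been cut into fragments and aligned to the reference by some external tool, and that each fragment's best hit is given as (percent identity, aligned fraction). The function name and the default cutoffs are assumptions for illustration, not a reproduction of any published pipeline.

```python
def average_nucleotide_identity(fragment_hits, min_identity=30.0,
                                min_aligned_fraction=0.7):
    """fragment_hits: list of (percent_identity, aligned_fraction) tuples,
    one per query fragment's best hit against the reference genome.

    Returns the mean identity of the retained fragments, or None if no
    fragment passes the (assumed) cutoffs.
    """
    kept = [pid for pid, frac in fragment_hits
            if pid >= min_identity and frac >= min_aligned_fraction]
    if not kept:
        return None
    return sum(kept) / len(kept)


# Hypothetical hits: the last two fail the coverage and identity cutoffs.
hits = [(98.0, 0.95), (96.0, 0.90), (99.0, 0.50), (25.0, 0.80)]
ani = average_nucleotide_identity(hits)
print(ani)          # 97.0
print(100.0 - ani)  # 3.0, usable as a pairwise distance for tree building
```

Computing this value for every genome pair gives a distance matrix (100 minus ANI) that a standard clustering or tree-building method can turn into the tree mentioned above.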

49
Different phenotype, same evolutionary lineages
Phenotypic concordance need not imply the same ancestral lineage.
At times, species have been observed to gain a certain set of mutations, in the same or different gene(s), that leads to the same phenotype.
Acquiring antibiotic resistance is one such example.
Such cases have to be investigated case by case, with underlying causes ranging from SNPs, gene gain/loss and indels to plasmid uptake.