1
Searching for functional regions (coding or non-coding) in mammalian genomes
  Organization of the human genome
  Human genome project: present status
  Human sequence data in GenBank/EMBL
  Prediction of functional elements by computer analysis of genomic sequences
  State of the art
  Success and pitfalls of different approaches
  Prediction of function by homology
  Orthology/paralogy

14
Expressed Sequence Tags (ESTs)
  Inventory of all mRNAs expressed by an organism, in different tissues, developmental stages, pathologies, ...
  Single-pass sequences: high error rate (>1%), partial mRNA sequences
  Usually derived from poly-dT-primed cDNA -> poor coverage of 5' regions of long mRNAs
  60-80% of human genes are represented in public EST databases, but only 25-50% of the total coding part of the genome

  Number of ESTs (Sep. 2000)
    Homo sapiens            2,461,893
    Mus musculus (mouse)    1,661,949
    Rattus sp. (rat)          188,736
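
The two EST limitations above (sequencing errors and poor 5' coverage) can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the 1% error rate is the lower bound quoted on the slide, while the read length, mRNA length, and cDNA length are hypothetical values chosen for illustration, and the function names are not from any library.

```python
def expected_errors(read_length, error_rate=0.01):
    """Expected number of erroneous bases in one single-pass read."""
    return read_length * error_rate


def uncovered_5prime(mrna_length, cdna_length):
    """Bases at the 5' end of an mRNA that a poly-dT-primed cDNA never reaches.

    Assumes (toy model) that the cDNA starts at the poly-A tail and
    extends cdna_length bases toward the 5' end.
    """
    return max(0, mrna_length - cdna_length)


# Hypothetical 400-base read at the slide's 1% error rate.
errors = expected_errors(400)

# Hypothetical long mRNA (5000 bases) vs. short mRNA (1000 bases),
# both copied into a 1500-base cDNA.
missing_long = uncovered_5prime(5000, 1500)
missing_short = uncovered_5prime(1000, 1500)

print(errors, missing_long, missing_short)
```

Under these assumptions the short mRNA is fully covered while most of the long mRNA's 5' region is missing, which is the bias the slide describes.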

31
Phylogenetic footprinting
  Advantages
    Works for all kinds of functional elements (transcribed or not, coding or not) as long as the information is in the primary sequence
    Does not require any a priori knowledge of the functional elements
  Limits
    Absence of evolutionary conservation does not mean absence of function
    No efficient method to detect unknown conserved secondary structure in RNA
    Function, but what function?
    Depends on the sequencing status of other genomes
      Human, mouse, fugu, C. elegans, Drosophila, yeast, A. thaliana
    Number of sequences to compare: > 200 Myrs of evolution
      Mammals/birds: 310 Myrs
      Human + mouse + bovine: 240 Myrs
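
The core idea of phylogenetic footprinting, finding stretches of primary sequence that stay conserved between species, can be sketched as a sliding-window identity scan over a pairwise alignment. This is a minimal illustration, not the method used in the course: the function name, window size, identity threshold, and the two toy sequences are all assumptions, and real analyses work on much longer alignments of several species.

```python
def conserved_windows(aln_a, aln_b, window=10, min_identity=0.8):
    """Return (start, identity) for each window whose identity >= min_identity.

    aln_a and aln_b are two rows of a pairwise alignment (same length,
    '-' marks a gap). Gap columns never count as matches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(len(aln_a) - window + 1):
        pairs = zip(aln_a[start:start + window], aln_b[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        identity = matches / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


# Toy alignment: the first 10 columns are identical, the rest diverged.
seq_1 = "ACGTACGTAC" + "TTTTTTTTTT"
seq_2 = "ACGTACGTAC" + "GGGGGGGGGG"

hits = conserved_windows(seq_1, seq_2)
print([start for start, _ in hits])
```

Only windows overlapping the identical block are reported. As the slide notes, a region missing from this list is not necessarily non-functional, and a reported region says nothing about what its function is.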