Chromosome countdown: 22...21...20...

In the December 20/27 issue of Nature, Deloukas et al. of the Wellcome Trust Sanger Institute, UK, report the complete sequence (over 99.99% accuracy and over 99% coverage) of chromosome 20, the first metacentric human chromosome to be sequenced (Nature 2001, 414:865-871). The authors state that their analysis of the genomic sequence benefited from the recent growth of cDNA and EST datasets and from comparison with the (incomplete) genomic sequences of mouse and puffer fish. The euchromatic DNA of chromosome 20 is covered by six contigs (59.5 megabases), with gaps totalling less than 320 kb. The sequence contains 727 genes and 168 pseudogenes; 335 of the genes encode 'known' proteins. Most of these genes have a complete open reading frame and both 5' and 3' untranslated regions. The G+C content is 44.1%, and around 67% of genes are associated with 5' CpG islands. The gene density of 12.18 genes per Mb is intermediate between the figures obtained for chromosomes 21 and 22, and it allowed the authors to estimate a total of 31,500 genes for the human genome. They found evidence of alternative splicing for 29% of the annotated genes. Three quarters of the predicted proteins contain known protein domains (that is, they match InterPro entries when analysed with InterProScan). Gene clusters include genes encoding immunoglobulin and Ig-like domains, cystatins, WAP-type domains and the semenogelins. The authors mapped over 26,000 single-nucleotide polymorphisms (SNPs) on chromosome 20. Chromosome 20 contains genes linked to a number of diseases, including Creutzfeldt-Jakob disease, severe combined immunodeficiency, and breast and prostate cancer. Comparison with the draft sequence of chromosome 20 published earlier this year reveals a number of differences, emphasising the importance of the 'finishing' process.