Abstract

Not I linking clones contain sequences flanking Not I recognition sites and were previously shown to be tightly associated with CpG islands and genes. To directly assess the value of Not I clones in genome research, high-density grids of 50 000 Not I linking clones originating from six representative Not I linking libraries were constructed. Altogether, these libraries contained nearly 100 times the total number of Not I sites in the human genome. A total of 3437 sequences flanking Not I sites were generated. Analysis of the 3265 unique sequences demonstrated that 51% of the clones displayed significant protein similarity to SWISSPROT and TREMBL database proteins, based on MSPcrunch filtering with stringent parameters. Of the 3265 sequences, 1868 (57.2%) were new, i.e. not present in the EMBL and EST databases (similarity ≤90%). Among these new sequences, 795 (24.3%) showed similarity to known proteins and 712 (21.8%) displayed >75% identity at the nucleotide level to sequences from the EMBL or EST databases. The remaining 361 (11.1%) sequences were completely new, i.e. <75% identical. The work also showed a tight, specific association of Not I sites with the first exon and suggests that so-called 3' ESTs can actually be generated from the 5'-ends of genes that contain Not I sites in their first exon.
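
The reported breakdown of new sequences can be checked with simple arithmetic. The short sketch below uses only the counts stated in the abstract and assumes that every percentage is taken relative to the 3265 unique sequences. The variable and function names are illustrative and are not part of the original analysis.

```python
# Counts as reported in the abstract.
unique_total = 3265

# Subcategories of the "new" sequences (labels paraphrase the abstract).
categories = {
    "similar to known proteins": 795,
    ">75% nucleotide identity to EMBL/EST sequences": 712,
    "completely new (<75% identical)": 361,
}

# The three subcategories should add up to the reported number of new sequences.
new_total = sum(categories.values())


def percent_of_unique(count, total=unique_total):
    """Share of the unique sequences, rounded to one decimal place.

    Assumption: all percentages in the abstract use the 3265 unique
    sequences as the denominator.
    """
    return round(100 * count / total, 1)


print(f"new sequences: {new_total} ({percent_of_unique(new_total)}%)")
for label, count in categories.items():
    print(f"{label}: {count} ({percent_of_unique(count)}%)")
```

Under that assumption, the three subcategories sum to 1868 and the rounded shares agree with the figures quoted above (57.2%, 24.3%, 21.8% and 11.1%).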