Optimal homology probes

The normal function of prion protein is not currently understood. By comparing the sequence of prion protein to the existing databases of known sequences, homologous proteins of known function in other species might be found. In organisms such as yeast, the genome is entirely sequenced and there are very rapid methods for identifying protein function.

These homologous proteins might have a (slightly) different biological role in their host yet still have binding sites for the same or similar small ligands. That is all that is needed to construct therapeutic substrate analogues that could covalently and irreversibly inactivate the rogue form of prion protein.

Homology searches with the entire prion amino acid sequence, however, return no matches other than prion proteins themselves. This is partly because the prion gene is single-copy, with no 'close' members of its superfamily. In addition, homology search engines such as BLASTp are poor at inserting gaps -- and prion sequences already show numerous small gaps among the mammals.

The octapeptide repeat has a variable expansion length. This region also amplifies slightly different palindromes in older lines. Its high glycine content further misleads BLASTp into returning high-ranking spurious matches to unrelated proteins that contain poly-glycine stretches. The repeat-generating region is a better candidate for searches than the repeats themselves.
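One way to see which part of a probe would attract poly-glycine hits is to scan it with a sliding window and flag glycine-rich stretches before running the search. The sketch below is only an illustration: the function name, the window of 8 residues and the 50% threshold are assumptions chosen for the example, not values taken from any search engine.

```python
def glycine_rich_windows(seq, window=8, threshold=0.5):
    """Return merged half-open (start, end) spans, 0-based, in which at
    least one window of `window` residues has a glycine fraction at or
    above `threshold`. Window size and threshold are illustrative."""
    spans = []
    for i in range(len(seq) - window + 1):
        frac = seq[i:i + window].count("G") / window
        if frac >= threshold:
            if spans and i <= spans[-1][1]:
                # Overlaps or touches the previous span: extend it.
                spans[-1] = (spans[-1][0], i + window)
            else:
                spans.append((i, i + window))
    return spans


if __name__ == "__main__":
    # Toy sequence, not a real prion sequence: a glycine run between
    # two glycine-free flanks.
    toy = "AAAAAAAAAA" + "GGGGGGGG" + "AAAAAAAAAA"
    for start, end in glycine_rich_windows(toy):
        print(start, end, toy[start:end])
```

Flagged spans are candidates for leaving out of a probe; the spans are deliberately generous, since any window that is half glycine is reported in full.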

It also makes no sense to include the N-terminal signal peptide, which is found in many unrelated proteins and is quite variable in length and sequence, since no canonical sequence is required. (Note, however, that the basic residues C-terminal to the cleavage site are strongly conserved.)

For the same reasons, the GPI sequence should not be used in deep homology probes: it weakens the signal used by the homology search engine. There are separate compilations of all known GPI proteins that can be accessed by a Medline search and examined individually.
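The exclusions discussed so far (signal peptide, repeat region, GPI sequence) amount to cutting known spans out of the full sequence and keeping what remains as candidate probe segments. A minimal sketch of that bookkeeping follows; the function name is hypothetical, and the coordinates in the usage example are placeholders on a toy string, not real prion protein positions.

```python
def excise_regions(seq, excluded):
    """Remove the half-open, 0-based (start, end) spans in `excluded`
    from `seq` and return the remaining pieces as (offset, segment)
    pairs, so each piece can be mapped back to the full sequence."""
    segments = []
    pos = 0
    for start, end in sorted(excluded):
        if start > pos:
            segments.append((pos, seq[pos:start]))
        # max() keeps overlapping exclusions from moving pos backwards.
        pos = max(pos, end)
    if pos < len(seq):
        segments.append((pos, seq[pos:]))
    return segments


if __name__ == "__main__":
    toy = "ABCDEFGHIJ"  # stand-in for a protein sequence
    # Placeholder spans standing in for signal peptide, repeats and GPI signal.
    excluded = [(0, 2), (4, 6), (8, 10)]
    for offset, segment in excise_regions(toy, excluded):
        print(offset, segment)
```

Keeping the offset with each segment makes it easy to report any later hit in the numbering of the original protein.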

Since we are looking for distant homology (to nematode, fruit fly, zebrafish, yeast, or bacteria) or weak paralogous homology within mammals, the search should avoid regions that are already hyper-variable within the placental mammals, such as the post-helix H3 stretch. Special emphasis should be given to invariant (or quasi-invariant) residues, especially those that are conserved in marsupial and chicken. Where the usual placental residue disagrees with chicken and marsupial, but a common placental substitution does agree with them, that variant should be used in the probe. Residues that are not quite invariant may exhibit only conservative substitutions (e.g., valine to leucine) -- information that yields partial agreement in comparisons.
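The residue-selection rule in this paragraph can be written down column by column over an alignment. The following is a sketch under stated assumptions: the sequences are already aligned to equal length, "common placental substitution" is interpreted as a residue seen in at least `min_count` placental sequences (an arbitrary cutoff introduced here), and all names are hypothetical.

```python
from collections import Counter


def probe_residue(placental, marsupial, chicken, min_count=2):
    """Pick the probe residue for one alignment column.

    Default to the most frequent placental residue. If marsupial and
    chicken agree with each other but not with that residue, and their
    residue occurs in at least `min_count` placental sequences, use the
    marsupial/chicken residue instead."""
    counts = Counter(placental)
    majority = counts.most_common(1)[0][0]
    if marsupial == chicken and marsupial != majority and counts[marsupial] >= min_count:
        return marsupial
    return majority


def build_probe(placental_rows, marsupial_row, chicken_row, min_count=2):
    """Apply probe_residue to every column of pre-aligned sequences."""
    return "".join(
        probe_residue([row[i] for row in placental_rows],
                      marsupial_row[i], chicken_row[i], min_count)
        for i in range(len(marsupial_row))
    )


if __name__ == "__main__":
    # Toy three-column alignment, not real prion data.
    placental = ["AVK", "AVK", "AVK", "ALK", "ALR"]
    print(build_probe(placental, marsupial_row="ALK", chicken_row="ALK"))
```

In the toy alignment the middle column is mostly V in placentals, but L is seen twice and matches both marsupial and chicken, so L is the residue chosen. Conservative substitutions are not scored here; a fuller treatment would weight them with a substitution matrix.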

Proteins are commonly composed of domains. In some cases, domains are seemingly assembled from disparate sources by recombination and transposition; here, however, the prion protein is encoded by a single exon. The search should still reflect the domains identified in the NMR structure, as well as apparent exposed hinges inferred from hydrolytic cleavage by proteases. Anchors such as the cysteines and substituted asparagines are also of value.

In short, a series of probes that each span single or adjacent well-conserved structural features offers the best prospects for finding homologies. Candidates can then be scrutinized further by more subtle testing. On this basis, the best probes for prion protein are: