Finding Motifs

One of the most common things we do in bioinformatics is to look for motifs,
short segments of DNA or protein that are of particular interest. They
may be regulatory elements of DNA or short stretches of protein that are known to be
conserved across many species. (The PROSITE web site at http://www.expasy.ch/prosite/ has extensive information about
protein motifs.)

The motifs you look for in biological sequences are usually not one specific
sequence. They may have several variants—for example, positions in which it doesn't
matter which base or residue is present. They may have variant lengths as well. They
can often be represented as regular expressions, which you'll see more of in the
discussion following Example 5-3, in Chapter 9, and elsewhere in the book.
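
To make the idea of a motif with variants concrete, here is a minimal sketch of how such patterns might be written as Perl regular expressions. The protein fragment, both patterns, and the helper name find_motif are invented for illustration; they are not taken from the book's examples or from any database.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical protein fragment, made up for this illustration.
my $protein = 'MKVLAANGTWSPQRNDSAA';

# A motif with variant positions: N, then any residue except P,
# then either S or T.
my $variant_motif = qr/N[^P][ST]/;

# A motif with variable length: R, then one to three residues of
# any kind, then S.
my $length_motif = qr/R.{1,3}S/;

# Return a list of [start_position, matched_text] pairs for every
# non-overlapping match of $regex in $sequence (positions count from 1).
sub find_motif {
    my ($sequence, $regex) = @_;
    my @hits;
    while ($sequence =~ /$regex/g) {
        # $-[0] and $+[0] hold the start and end offsets of the last match
        my $matched = substr($sequence, $-[0], $+[0] - $-[0]);
        push @hits, [ $-[0] + 1, $matched ];
    }
    return @hits;
}

foreach my $hit (find_motif($protein, $variant_motif)) {
    print "Variant motif: $hit->[1] at position $hit->[0]\n";
}
foreach my $hit (find_motif($protein, $length_motif)) {
    print "Variable-length motif: $hit->[1] at position $hit->[0]\n";
}
```

A character class such as [ST] accepts any one of the listed residues, [^P] accepts anything except P, and a quantifier such as {1,3} lets a stretch of the motif vary in length.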

Perl has a handy set of features for finding things in strings. This, as much as
anything, has made it a popular language for bioinformatics. Example 5-3
introduces this string-searching capability; it does something genuinely useful,
and similar programs are used all the time in biology research. It does the
following:

Reads in protein sequence data from a file

Puts all the sequence data into one string for easy searching

Looks for motifs the user types in at the keyboard

Example 5-3. Searching for motifs

#!/usr/bin/perl -w
# Searching for motifs

# Ask the user for the filename of the file containing
# the protein sequence data, and collect it from the keyboard
print "Please type the filename of the protein sequence ...
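
Because the listing above is cut off, the following is a separate, minimal sketch of the three steps described earlier: read the sequence file, put the data into one string, and search for motifs typed at the keyboard. It is not the book's Example 5-3. The subroutine names read_sequence, has_motif, and main, the prompts, and the blank-line-to-quit convention are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1 and 2: read the file and return its contents as a single
# string with all whitespace (including newlines) removed.
sub read_sequence {
    my ($filename) = @_;
    open(my $fh, '<', $filename)
        or die "Cannot open file \"$filename\": $!\n";
    my @lines = <$fh>;
    close $fh;
    my $sequence = join('', @lines);
    $sequence =~ s/\s//g;
    return $sequence;
}

# Step 3: report whether the motif, treated as a Perl regular
# expression, occurs anywhere in the sequence.
sub has_motif {
    my ($sequence, $motif) = @_;
    return $sequence =~ /$motif/ ? 1 : 0;
}

# Interactive loop (hypothetical prompts): ask for a filename, then
# keep asking for motifs until the user enters a blank line.
sub main {
    print "Protein sequence file: ";
    my $filename = <STDIN>;
    return unless defined $filename;
    chomp $filename;
    my $protein = read_sequence($filename);

    while (1) {
        print "Motif to search for (blank line to quit): ";
        my $motif = <STDIN>;
        last unless defined $motif;
        chomp $motif;
        last if $motif =~ /^\s*$/;
        print has_motif($protein, $motif)
            ? "Found it.\n"
            : "Could not find it.\n";
    }
    return;
}

# Run the interactive part only when executed as a script.
main() unless caller;
```

Because the typed motif is interpolated directly into the match, the user can enter regular-expression syntax such as A[DS]V. One consequence of this design is that a malformed pattern stops the program with an error; wrapping the match in eval is one way to guard against that.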
