Map maps a DNA sequence
and displays both strands of
the mapped sequence with restriction
enzyme
cut points above the sequence
and protein translations below.
Map can also create a
peptide map of an
amino acid sequence.

Map displays a sequence that
is being assembled or analyzed
intensively. Map asks you
to select the
enzymes whose restriction sites should
be marked individually by typing
their names. If you
do not
answer this question, Map selects
a representative isoschizomer from all
of the commercially available
enzymes. You can choose
to have your sequence translated
in any or all of
the six possible translation
frames. You can also
choose to have only the
open reading frames translated.

After running Map, you may
create a new sequence file
with the protein sequence from
any frame of
DNA translation by using the
ExtractPeptide program with the Map
output file.

Here is a session using
Map to display a portion
of gamma.seq, along with a
restriction map and
six-frame protein translation:

% map
(Linear) MAP of what sequence ? gamma.seq
Begin (* 1 *) ? 2161
End (* 11375 *) ? 2600
Select the enzymes: Type nothing or "*" to get all enzymes. Type "?"
for help on which enzymes are available and how to select them.

Map accepts a single nucleotide
or protein sequence as input.
The function of Map
depends on whether
your input sequence(s) are protein
or nucleotide. Programs determine
the type of a sequence
by the
presence of either Type: N
or Type: P on the
last line of the text
heading just above the sequence.
If
your sequence(s) are not the
correct type, see Appendix VI
for information on how to
change or set
the type of a sequence.

MapSort, PlasmidMap, and MapPlot display
restriction maps in other formats.
ExtractPeptide
extracts the protein sequence from
any translation frame in the
Map output file and puts
it into a new
sequence file. FindPatterns searches
for short patterns like enzyme
recognition sites in one or
more
sequences. PeptideMap creates a
peptide map of an amino
acid sequence. You can
use either Map or
PeptideMap with protein sequence input
and obtain identical results.

Map does not treat your
sequence as circular unless you
use -CIRcular.

The enzymes you name must
be in the enzyme data
file or you get an
error message. You can
have
your system manager change the
public enzyme data file to
contain the enzymes most useful
to your
group, or you can maintain
a private copy for your
own use. (See the
LOCAL DATA FILES topic below
for more information.)

This program normally requires that
a sequence pattern be a
subset of the enzyme recognition
site. If
the recognition pattern in the
enzyme data file were GCRGC,
then the pattern GCAGC in
your
sequence would be found, since
A is within the set
of bases defined by R
(see Appendix III). If the
pattern in the enzyme data
file were GCAGC, then a
GCRGC in your sequence would
not be
recognized. If your sequence
is very ambiguous, as it
might be if it were
a backtranslated sequence,
then it may be better
to use -ALL to do
an overlap search. The
overlap search would consider an
R in
your sequence to match an
A in the recognition site.

With -PERFect, the program looks
for a perfect symbol match
between your sequence and the
recognition pattern -- GCRGC in
the recognition pattern would only
match a GCRGC in the
sequence.
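
The three matching modes described above (the normal subset search, the -ALL overlap search, and the -PERFect search) can be sketched in a few lines of Python. This is only an illustration, not Map's actual code: the names IUPAC, symbol_matches, and find_sites are hypothetical, and the table assumes the standard IUPAC ambiguity codes, with X treated like N.

```python
# Assumption: standard IUPAC nucleotide ambiguity codes; X is treated like N.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT", "X": "ACGT",
}


def symbol_matches(seq_sym, site_sym, mode="subset"):
    """Compare one sequence symbol with one recognition-site symbol."""
    s, p = seq_sym.upper(), site_sym.upper()  # searches are case insensitive
    if mode == "perfect":
        # Symbols must be identical; ambiguity codes are not expanded.
        return s == p
    seq_set, site_set = set(IUPAC[s]), set(IUPAC[p])
    if mode == "subset":
        # Every base the sequence symbol allows must be allowed by the site.
        return seq_set <= site_set
    if mode == "overlap":
        # At least one base is allowed by both symbols.
        return bool(seq_set & site_set)
    raise ValueError("unknown mode: " + mode)


def find_sites(sequence, site, mode="subset"):
    """Return 1-based start positions where site matches sequence."""
    n = len(site)
    return [
        i + 1
        for i in range(len(sequence) - n + 1)
        if all(symbol_matches(s, p, mode)
               for s, p in zip(sequence[i:i + n], site))
    ]
```

With these definitions, find_sites("TTGCAGCTT", "GCRGC") finds the site at position 3, while find_sites("TTGCRGCTT", "GCAGC") finds nothing unless you pass mode="overlap".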

All searches are case insensitive
(upper- or lowercase) for the
letters in either the sequence
or the
enzyme recognition site.

As in almost all sequence
displays, the 5'->3' direction of
the top strand is from
left to right.
Map aligns each enzyme's name
so that the name ends
over the 3' end of
the fragment that
continues to the left.
If you use -BOTtom, Map
aligns the name to end
over the 5'-most
nucleotide of the reverse strand
fragment that continues to the
left.

Collisions

If more than one enzyme
cuts at the same position,
Map sorts the set of
enzymes that cut at the
position alphabetically and stacks them
up so that each enzyme
name ends over the same
position. If enzymes that
cut to the left are
in the way of the
display, Map puts the names
further up and uses a
line of '|' characters to connect
the name to the cut
position.

Potential Sites

When you search for potential
restriction sites with either -MISmatch
or -SILent, Map
differentiates the real sites from
the potential sites by capitalizing
the enzyme's name at the
real
sites.

The program presents you with
an enzyme selection prompt that
lets you enter enzymes individually
or
collectively. To get help
with selecting enzymes, type a
? at the enzyme
prompt. Here is what
you
see:

Select enzymes:
Type "*" to select all enzymes.
Type "**" to select all enzymes including isoschizomers.
Type individual names like "AluI" to select specific enzymes.
Type "?" to see this message and all available enzymes.
Type "??" to see the available enzymes AND their recognition sites.
Type "?A*" to see what enzymes start with "A."
Type "A*" to select all enzymes starting with "A."
Type parts of names like "Al*" to select all enzymes starting with "AL."
Type "~A*" to unselect all selected enzymes starting with "A."
Type "/*" to see what enzymes you have selected so far.
Type "#" to select no enzymes at all.
Press <Return> after each selection.
Press <Return> and nothing else to end your selections.
Spaces are allowed; upper and lower case are equivalent.

We maintain our enzyme files
with a semicolon (;) character
in front of all but
one member of a family
of isoschizomers. (Isoschizomers are
restriction endonucleases with the same
recognition site.) The
isoschizomers beginning with a semicolon
are normally not displayed by
our mapping programs unless
you specifically select them by
name or type "**" instead
of "*" at the enzyme
prompt.

The translation menu allows several
responses. You can name
the frames of interest individually
with
a response like abcf.
You can use t or
s to mean the three
forward or all six possible
translation
frames. You can make
all of the characters in
your response uppercase to get
three-letter instead of
one-letter amino acid symbols in
the translation. You can
add o to your response
to get translation
only between potential start codons
and stop codons (o by
itself gives open reading frame
translation of
all six translation frames).

You can use an expression
like -MENu=abcf to choose translation
frames a, b, c, and
f from the
command line.
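
As a rough illustration of how a translation menu response could be interpreted, the Python sketch below follows the rules stated above. The function name parse_menu and its return value are hypothetical, and the assumption that a, b, and c are the three forward frames is inferred from the description of t rather than stated in this manual.

```python
def parse_menu(response):
    """Interpret a translation menu response such as "abcf", "t", or "O".

    Returns (frames, orf_only, three_letter).
    Assumption: frames a, b, c are the forward frames and d, e, f the
    reverse frames.
    """
    if not response:
        raise ValueError("empty response")
    # An all-uppercase response asks for three-letter amino acid symbols.
    three_letter = response.isupper()
    frames = []
    orf_only = False
    for ch in response.lower():
        if ch == "t":
            picked = "abc"        # three forward frames
        elif ch == "s":
            picked = "abcdef"     # all six frames
        elif ch == "o":
            orf_only = True       # open reading frames only
            continue
        elif ch == "n":
            continue              # no translation requested
        elif ch in "abcdef":
            picked = ch
        else:
            raise ValueError("unknown menu letter: " + ch)
        for frame in picked:
            if frame not in frames:
                frames.append(frame)
    if orf_only and not frames:
        # "o" by itself means ORF translation of all six frames.
        frames = list("abcdef")
    return frames, orf_only, three_letter
```

For example, parse_menu("abcf") selects frames a, b, c, and f with one-letter symbols, and parse_menu("o") selects all six frames restricted to open reading frames.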

You can select translation for
open reading frames only.
All of the frames are
treated as open at the
5'
end of each strand; these
pseudo-open reading frames run to
the first stop codon in
that frame (see the
discussion of translation tables in
Appendix VII). Thereafter, reading is
turned on at each potential
start codon and runs to
the next stop codon.
You can suppress the display
of short open reading frames
with -OPEn=20, for example, which
would restrict the display to
frames coding for at least
20 amino
acids.

Open reading frames are determined
from the beginning and ending
of the sequence in the
file--not
from just the range you
have chosen. The potential
start codons and stop codons
are defined in the
data file translate.txt.
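
The open reading frame rule described above can be sketched for a single forward frame as follows. This is an illustrative sketch, not Map's implementation: the name open_reading_frames is hypothetical, and the start and stop codon sets are assumed to be those of the standard genetic code rather than read from translate.txt.

```python
# Assumptions: standard-code start and stop codons (Map reads these from
# translate.txt); the input is the whole sequence in the file, not a subrange.
START_CODONS = {"ATG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(seq, frame=0, min_codons=0):
    """Return (start, end) spans, 0-based and half-open, of translated
    regions in one forward frame. The stop codon is not part of a span."""
    seq = seq.upper()
    spans = []
    reading = True        # every frame is treated as open at the 5' end
    orf_start = frame
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if reading:
            if codon in STOP_CODONS:
                spans.append((orf_start, i))
                reading = False
        elif codon in START_CODONS:
            # Reading is turned on again at each potential start codon.
            reading = True
            orf_start = i
    if reading:
        # The frame is still open at the 3' end of the sequence.
        end = frame + 3 * ((len(seq) - frame) // 3)
        spans.append((orf_start, end))
    # Drop empty spans and spans shorter than the requested minimum.
    return [(s, e) for s, e in spans
            if e > s and (e - s) // 3 >= min_codons]
```

Calling open_reading_frames(seq, frame=0, min_codons=20) corresponds roughly to using -OPEn=20 for the first forward frame.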

To assist scientists doing site-directed
mutagenesis, this program searches for
places in your sequence
where a restriction enzyme recognition
site occurs with one or
more mismatches. Use -MISmatch=1
to
identify positions where recognition could
occur with one or fewer
mismatches.
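
A mismatch search of this kind can be sketched as a sliding-window comparison. The function name find_with_mismatches is hypothetical, and the sketch assumes an unambiguous site and sequence so that a mismatch is a simple character difference.

```python
def find_with_mismatches(sequence, site, max_mismatches=1):
    """Return (position, mismatches) pairs for every window that differs
    from site in at most max_mismatches symbols. Positions are 1-based.

    Assumption: both strings contain only unambiguous bases.
    """
    seq, site = sequence.upper(), site.upper()
    n = len(site)
    hits = []
    for i in range(len(seq) - n + 1):
        mismatches = sum(1 for s, p in zip(seq[i:i + n], site) if s != p)
        if mismatches <= max_mismatches:
            hits.append((i + 1, mismatches))
    return hits
```

A hit with zero mismatches is a real site; hits with one or more mismatches are the potential sites that Map distinguishes in its display.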

Use -SILent to find the
places in your sequence where
a restriction site could be
introduced without
changing the translation. Read
more about using -SILent under
the PARAMETER REFERENCE
topic below.

FindPatterns, Map, MapSort, MapPlot, and
Motifs all let you search
with ambiguous expressions that
match many different sequences.
The expressions can include any
legal GCG sequence character (see
Appendix III). The expressions can
also include several non-sequence characters,
which are used to
specify OR matching, NOT matching,
begin and end constraints, and
repeat counts. For instance,
the
expression TAATA(N){20,30}ATG means TAATA, followed
by 20 to 30 of
any base, followed by ATG.
Following is an explanation of
the syntax for pattern specification.

Implied Sets and Repeat Counts

Parentheses () enclose one or
more symbols that can be
repeated some number of times.
Braces
{} enclose numbers that tell
how many times the symbols
within the preceding parentheses must
be found.

Sometimes, you can leave out
part of an expression.
If braces appear without preceding
parentheses, the numbers in the
braces define the number of
repeats for the immediately
preceding symbol. One or
both of the numbers within
the braces may be missing.
For instance,
both the pattern GATG{2,}A and
the pattern GATG{2}A mean GAT,
followed by G repeated from
2 to 350,000 times, followed
by A; the pattern GATG{}A
means GAT, followed by G
repeated from
0 to 350,000 times, followed
by A; the pattern GAT(TG){,2}A
means GAT, followed by TG
repeated from 0 to 2
times, followed by A; the
pattern GAT(TG){2,2}A means GAT, followed
by
TG repeated exactly 2 times,
followed by A. (If
the pattern in the parentheses
is an OR
expression (see below), it cannot
be repeated more than 2,000
times.)

OR Matching

If you are searching nucleic
acids, the ambiguity symbols defined
in Appendix III let you define
any combination of G, A,
T, or C. If
you are searching proteins, you
can specify any of several
symbol choices by enclosing the
different choices in parentheses and
separating the choices with
commas. For instance, RGF(Q,A)S
means RGF followed by either
Q or A followed by
S. The
length of each choice need
not be the same, and
there can be up to
31 different choices within
each set of parentheses.
The pattern GAT(TG,T,G){1,4}A means GAT
followed by any
combination of TG, T, or
G from 1 to 4
times followed by A.
The sequence GATTGGA matches
this pattern. There can
be several parentheses in a
pattern, but parentheses cannot be
nested.

NOT Matching

The pattern GC~CAT means GC,
followed by any symbol except
C, followed by AT.
The pattern
GC~(A,T)CC means GC, followed by
any symbol except A or
T, followed by CC.

Begin and End Constraints

The pattern <GACCAT can only
be found if it occurs
at the beginning of the
sequence range
being searched. Likewise, the
pattern GACCAT> would only be
found if it occurs at
the end of
the sequence range.
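
One way to experiment with this pattern syntax outside the package is to translate it into a regular expression. The Python sketch below does this under several assumptions: the names gcg_to_regex, _expand, and _symbol are hypothetical, ambiguity symbols follow the standard IUPAC codes, NOT choices are single symbols, and the repeat limits of 350,000 and 2,000 are not enforced.

```python
import re

# Assumption: standard IUPAC nucleotide ambiguity codes; X is treated like N.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT", "X": "ACGT",
}


def _expand(symbol, nucleic):
    """Bases a symbol stands for (the symbol itself for proteins)."""
    return IUPAC.get(symbol, symbol) if nucleic else symbol


def _symbol(symbol, nucleic):
    chars = _expand(symbol, nucleic)
    return re.escape(chars) if len(chars) == 1 else "[" + chars + "]"


def gcg_to_regex(pattern, nucleic=True):
    """Translate a pattern such as TAATA(N){20,30}ATG into a Python regex."""
    pat = pattern.upper().replace(" ", "")
    out = []
    i = 0
    while i < len(pat):
        ch = pat[i]
        if ch == "<":                      # begin constraint
            out.append("^")
            i += 1
        elif ch == ">":                    # end constraint
            out.append("$")
            i += 1
        elif ch == "~":                    # NOT matching
            if pat[i + 1] == "(":
                close = pat.index(")", i)
                symbols = pat[i + 2:close].replace(",", "")
                i = close + 1
            else:
                symbols = pat[i + 1]
                i += 2
            excluded = "".join(_expand(s, nucleic) for s in symbols)
            out.append("[^" + excluded + "]")
        elif ch == "(":                    # repeat group or OR choices
            close = pat.index(")", i)
            choices = pat[i + 1:close].split(",")
            alternatives = "|".join(
                "".join(_symbol(s, nucleic) for s in choice)
                for choice in choices
            )
            out.append("(?:" + alternatives + ")")
            i = close + 1
        elif ch == "{":                    # repeat count
            close = pat.index("}", i)
            low, _, high = pat[i + 1:close].partition(",")
            # As described above, {2} and {2,} both mean "2 or more",
            # and a missing lower bound means 0.
            out.append("{" + (low or "0") + "," + high + "}")
            i = close + 1
        else:
            out.append(_symbol(ch, nucleic))
            i += 1
    return "".join(out)
```

For example, re.fullmatch(gcg_to_regex("GAT(TG,T,G){1,4}A"), "GATTGGA") succeeds, in agreement with the example above. For protein patterns, pass nucleic=False so that letters such as R are taken literally instead of being expanded as nucleotide ambiguity codes.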

All parameters for this program
may be added to the
command line. Use -CHEck
to view the summary
below and to specify parameters
before the program executes.
In the summary below, the
capitalized
letters in the parameter names
are the letters that you
must type in order to
use the parameter.
Square brackets ([ and ])
enclose parameter values that are
optional. For more information,
see "Using
Program Parameters" in Chapter 3,
Using Programs in the User's
Guide.

We are grateful to Frank
Manion for suggestions and for
code used in the revision
of Map for version
9.0. The vertical enzyme
output format of Map was
designed by John Schroeder and
Frederick
Blattner (NAR 10; 69-84 (1982),
Figure 1). Map was
written for the first release
of the Wisconsin
Package(TM) by Paul Haeberli and
John Devereux.

The files described below supply
auxiliary data to this program.
The program automatically reads
them
from a public data directory
unless you either 1) have
a data file with exactly
the same name in your
current working directory; or 2)
name a file on the
command line with an expression
like
-DATa1=myfile.dat. For more information
see Chapter 4, Using Data
Files in the User's Guide.
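
The lookup order described above can be pictured with the following Python sketch. It is only an illustration: resolve_data_file and its arguments are hypothetical names, the location of the public data directory depends on your installation, and the sketch assumes that a file named on the command line takes precedence over a local copy, which this manual does not state explicitly.

```python
from pathlib import Path


def resolve_data_file(name, public_dir, named_on_command_line=None, cwd=None):
    """Pick the data file a program would read.

    name                   default file name, for example "enzyme.dat"
    public_dir             public data directory (site specific, assumed)
    named_on_command_line  file given with an expression like -DATa1=myfile.dat
    cwd                    working directory (defaults to the current one)
    """
    # Assumption: a file named on the command line wins over everything else.
    if named_on_command_line:
        return Path(named_on_command_line)
    # Next, a file with exactly the same name in the working directory.
    local = Path(cwd or ".") / name
    if local.is_file():
        return local
    # Otherwise, fall back to the public data directory.
    return Path(public_dir) / name
```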

This program reads the public
or local version of enzyme.dat
to get the enzyme names,
recognition sites,
cut positions, and overhangs.
You can use mapping programs
to search for any sequence
pattern by
adding the pattern to the
enzyme data file. If
you use the command-line parameter
-APPend, this
program appends the enzyme data
file to the output file.
(See Appendix VII for more
information about
enzyme data files.)

If Map finds Type: P on
the dividing line in the
sequence file, it reads proteolytic
cleavage data in the
local data file proenzyme.dat.

The translation of codons to
amino acids, the identification of
potential start codons and stop
codons, and
the mappings of one-letter to
three-letter amino acid codes are
all defined in a translation
table in the file
translate.txt. If the standard
genetic code does not apply
to your sequence, you can
provide a modified
version of this file in
your working directory or name
an alternative file on the
command line with an
expression like -TRANSlate=mycode.txt. Translation
tables are discussed in more
detail in
Appendix VII. If you use
the command-line parameter -APPend,
this program appends the enzyme
data file to the output
file. If you have
provided your own translation scheme,
that file is also appended.

You can set the parameters
listed below from the command
line. For more information,
see "Using
Program Parameters" in Chapter 3,
Using Programs in the User's
Guide.

-ENZymes=*[,...]

specifies the restriction enzymes whose
recognition sites you want to
search. If you search
for
several different enzymes, separate their
names with commas. -ENZymes=*
selects all
enzymes, -ENZymes=** selects all enzymes,
including isoschizomers, and -ENZymes=Al*
selects all enzymes whose names
start with Al.

-MENu=t

specifies which nucleotide reading frames
are translated into protein sequences
in the output
file. Specify t for
three forward frames, s for
all six frames, o for
open frames only, or n
for no
protein translation. You can
also specify one of the
letters a through f for
any one of the six
possible reading frames.

-TRANSlate=filename.txt

Usually, translation is based on
the translation table in a
default or local data file
called
translate.txt. This parameter allows
you to use a translation
table in a different file.
(See
Appendix VII for information about translation
tables.)

-RSF=map.rsf

writes an RSF (rich sequence
format) file containing the input
sequences annotated with
features generated from the results
of Map. This RSF
file is suitable for input
to other
Wisconsin Package programs that support
RSF files. In particular,
you can use SeqLab to
view
this features annotation graphically.
If you don't specify a
file name with this parameter,
then
the program creates one using
map for the file basename
and .rsf for the extension.
For more
information on RSF files, see
"Using Rich Sequence Format (RSF)
Files" in Chapter 2 of
the
User's Guide. Or, see
"Rich Sequence Format (RSF) Files"
in Appendix C of the
SeqLab Guide.

-OPEn=20

restricts the display of translations
to open reading frames (ORFs).
If you supply a
number like
20 with this parameter, an
ORF is displayed only
if it codes for at
least 20 amino acids.

-CIRcular

tells Map to treat your
sequence as circular. If
a possible recognition site starts
at the end and
continues into the beginning of
the sequence, the site is
marked at the point where
a circular
molecule would be cut.
For instance, if your sequence
ends in GAA and starts
with TTC, Map
shows an EcoRI cut two
bases before the end of
the sequence. The sequence
is only circularized
at the ends found in
the file, so if you
want a subrange to be
treated as circular you have
to
create a file in which
the subrange is the entire
sequence (see the Assemble program).
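
A common way to search a circular sequence is to append the first few bases to the end before searching, so that sites spanning the junction are found. The Python sketch below illustrates this with exact matching only; find_circular is a hypothetical name, and cut_offset (the number of site bases to the left of the top-strand cut, 1 for EcoRI, which cuts G^AATTC) is an assumed way of describing the cut position.

```python
def find_circular(sequence, site, cut_offset=0):
    """Find exact matches of site in a circular sequence.

    Returns (start, cut_after) pairs with 1-based coordinates; cut_after is
    the base after which the top strand is cut, wrapped around the origin.
    """
    seq, site = sequence.upper(), site.upper()
    # Extend the sequence so a site can run from the end into the beginning.
    extended = seq + seq[:len(site) - 1]
    hits = []
    for i in range(len(seq)):
        if extended.startswith(site, i):
            cut_after = (i + cut_offset - 1) % len(seq) + 1
            hits.append((i + 1, cut_after))
    return hits
```

For a 12-base sequence that starts with TTC and ends in GAA, such as TTCAAAAAAGAA, find_circular(seq, "GAATTC", cut_offset=1) reports a site starting at base 10 with the cut after base 10, that is, two bases before the end of the sequence.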

-LINear

is the opposite of -CIRcular.
If you have defined
a command that runs Map
with -CIRcular
as the default, use the
-LINear parameter to make Map
treat your sequence as linear.

-PAGe=60

Printed output from this program
may cross from one page
to another in an annoying
way. Use
this parameter to add form
feeds to the output file
in order to try to
keep clusters of related
information together. You can
set the number of lines
per page by supplying a
number after
-PAGe.

-WIDth=100

allows you to choose the
number of bases shown on
each line of output.
The standard is 60,
which can be shown on
a terminal screen nicely, but
100 sequence symbols per line
is very
convenient for estimating the size
of fragments between cuts.

-THReeletter

sets the translation to show
three-letter amino acid codes instead
of the one-letter codes.
Normally you can set the
translation to show three-letter amino
acid codes by capitalizing your
response to the protein translation
program prompt. However, when
you choose protein
translation from the command line,
you must add -THReeletter to
get three-letter amino acid
codes.

-MISmatch=1

causes the program to recognize
sites that are like the
recognition site but with one
or fewer
mismatches. If too many
mismatches are allowed, the results
may not be meaningful.
The
output from most mapping programs
distinguishes between sites with no
mismatches and sites
with mismatches.

-SILent

shows the places where restriction
sites can be introduced (by
site-directed mutagenesis) without
changing the peptide translation of
the sequence. The -SILent
parameter assumes that the
range you have chosen defines
a coding region and reading
frame precisely. Sites may
be found
that have any number of
bases changed as long as
the changes do not alter
the translation. The
reading frame is implied by
the beginning coordinate you specify.
The output from most
mapping programs distinguishes between real
sites and sites with one
or more mismatches. The
data file translate.txt defines the
genetic code.
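
The idea behind a silent-site search can be sketched as follows: for each window of the coding region, try every combination of synonymous codons and check whether any of them contains the recognition site. This is an illustrative sketch, not Map's algorithm. The names silent_sites, CODON_TO_AA, and SYNONYMS are hypothetical, the standard genetic code is assumed instead of translate.txt, and the site and sequence are assumed to be unambiguous.

```python
from itertools import product

# Assumption: standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def silent_sites(coding_seq, site):
    """Return (position, changes) pairs where site can be created without
    changing the translation. Positions are 1-based; changes is the smallest
    number of bases that must be altered (0 means the site already exists).

    The reading frame starts at the first base of coding_seq.
    """
    seq, site = coding_seq.upper(), site.upper()
    usable = len(seq) - len(seq) % 3       # ignore a trailing partial codon
    hits = []
    for start in range(usable - len(site) + 1):
        first = start // 3
        last = (start + len(site) - 1) // 3
        original = seq[3 * first:3 * last + 3]
        codons = [original[j:j + 3] for j in range(0, len(original), 3)]
        offset = start - 3 * first
        # All synonymous spellings of the codons the window touches.
        options = [SYNONYMS[CODON_TO_AA[c]] for c in codons]
        changes = [
            sum(a != b for a, b in zip("".join(combo), original))
            for combo in product(*options)
            if "".join(combo)[offset:offset + len(site)] == site
        ]
        if changes:
            hits.append((start + 1, min(changes)))
    return hits
```

For example, silent_sites("AAAGGAACCAAA", "GGTACC") reports a potential site at position 4 that needs one base change (GGA to GGT, both glycine).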

-PERFect

sets the program to look
for a perfect alphabetic match
between the site and the
sequence.
Ambiguity codes are normally translated
so that the site RXY
would find sequences like ACT
or
GAC. With this parameter,
the ambiguity codes are not
translated so the site RXY
would only
match the sequence RXY.
This parameter is not the
same as -MISmatch=0!

-ALL

makes an overlap-set map instead
of the usual subset map.
If your sequence is
very ambiguous
(for instance, as a back-translated
sequence would be) and you
want to see where restriction
sites could be, then an
overlap-set map is for you.
Overlap-set and subset pattern
recognition is
discussed in more detail in
the Program Manual entry for
Window.

-APPend

appends the enzyme data file
to your output file.
If you provided your own
translation scheme,
that file is also appended.

-CUTters=gamma.cutters

writes out a new enzyme
data file containing those selected
enzymes that did cut your
sequence
and were not excluded with
any of the -MINCuts, -ONCe,
-MAXCuts, and -EXClude
parameters. If you do
not add a file name
to the -CUTters parameter, the
output file will have
the name of your sequence
followed by the file name
extension .cutters.

-NONCUTters=gamma.noncutters

writes out a new enzyme
data file containing the selected
enzymes that did NOT cut
your
sequence. If you do
not add a file name
to this parameter, the output
file will have the name
of
your sequence followed by the
file name extension .noncutters.

-EXCUTters=gamma.excutters

writes out a new enzyme
data file containing those enzymes
that did cut your sequence
but were
excluded with any of the
-EXClude, -MINCuts, -ONCe, and -MAXCuts
parameters. If you do
not add a file name
to this parameter, the output
file will have the name
of your sequence
followed by the file name
extension .excutters.

The parameters -MINSitelen and -OVErhang
restrict the domain of enzymes
selected.

-MINSitelen=6

selects only patterns with the
specified number or more bases
in the recognition site.
You can
display the sites from any
pattern in the enzyme or
pattern file that you take
the trouble to
name individually, but when you
use all of the patterns,
the program uses all of
the patterns
whose recognition sites have the
specified number or more non-N,
non-X bases.
-MINSitelen=6 replaces the -SIXbase parameter
from earlier versions of the
Wisconsin
Package.
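
The length test used by -MINSitelen counts only the non-N, non-X symbols of a recognition site. A minimal Python sketch follows; the names site_length and long_enough are hypothetical, and the site is assumed to be a bare string of sequence symbols without cut markers.

```python
def site_length(site):
    """Count the symbols in a recognition site that are not N or X."""
    return sum(1 for symbol in site.upper() if symbol not in "NX")


def long_enough(site, min_site_len=6):
    """True if the site has at least min_site_len non-N, non-X symbols."""
    return site_length(site) >= min_site_len
```

Under this rule a site such as GCCNNNNNGGC counts as a six-base site, so it would still be selected with -MINSitelen=6.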

-OVErhang=0

selects only enzymes that leave
blunt ends. Use a
5 with this parameter to
search only with
enzymes that leave 5' overhangs
and a 3 to search
only with enzymes that leave
a 3' overhang.
You can use multiple values,
separated by commas. For
instance, -OVErhang=5,3 searches
with all enzymes that leave
either 5' or 3' overhangs.
You can display the
cuts from any enzyme
in the enzyme data file
that you take the trouble
to name individually, but when
you use *
(meaning all), the program uses
all of the enzymes whose
overhangs conform to your choice
with
this parameter.

The -MINCuts, -MAXCuts, -ONCe, and
-EXClude parameters suppress the display
of selected
enzymes. The list of
excluded enzymes in the program
output includes both selected enzymes
that cut
within excluded ranges and selected
enzymes that did not cut
the right number of times.

-MINCuts=2

excludes enzymes that do not
cut at least two times.

-MAXCuts=2

excludes enzymes that cut more
than two times.

-ONCe

excludes, from the set of
enzymes displayed, those enzymes that
cut your sequence more than
once (equivalent to setting both
-MINCuts and -MAXCuts to one).

-EXClude=n1,n2[,n3,n4,...]

excludes enzymes that cut anywhere
within one or more ranges
of the sequence. If
an enzyme is
found within an excluded range,
then the enzyme is not
displayed. The list of
excluded enzymes
includes enzymes that cut within
excluded ranges. The ranges
are defined with sets of
two
numbers. The numbers are
separated by commas. Spaces
between numbers are not allowed.
The numbers must be integers
that fall within the sequence
beginning and ending points you
have chosen. The range
may be circular if circular
mapping is being done.
Exclusion is not done
if there are any non-numeric
characters in the numbers or
numbers out of range or
if there is an
odd number of integers following
the parameter.
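
The rules for -EXClude, together with the -MINCuts, -MAXCuts, and -ONCe filters, can be sketched in Python as shown below. The function names parse_exclude and is_displayed are hypothetical, and the sketch handles only linear ranges in which the first number of each pair is not larger than the second.

```python
def parse_exclude(value, begin, end):
    """Parse an -EXClude value such as "100,200,300,400" into ranges.

    Returns a list of (low, high) pairs, or an empty list (no exclusion)
    if any token is non-numeric, any number lies outside begin..end, or
    the count of numbers is odd.
    """
    tokens = value.split(",")
    if len(tokens) % 2:
        return []
    # A space inside a token makes it non-numeric, so it disables exclusion.
    if not all(t.isascii() and t.isdigit() for t in tokens):
        return []
    numbers = [int(t) for t in tokens]
    if any(n < begin or n > end for n in numbers):
        return []
    return list(zip(numbers[0::2], numbers[1::2]))


def is_displayed(cuts, ranges=(), min_cuts=None, max_cuts=None, once=False):
    """Decide whether an enzyme with the given cut positions is displayed."""
    if once:
        min_cuts = max_cuts = 1
    if not cuts:
        return False                       # enzymes that do not cut
    if min_cuts is not None and len(cuts) < min_cuts:
        return False
    if max_cuts is not None and len(cuts) > max_cuts:
        return False
    # Any cut inside an excluded range removes the enzyme from the display.
    return not any(low <= cut <= high for cut in cuts for low, high in ranges)
```

For example, with ranges = parse_exclude("100,200", 1, 1000), an enzyme that cuts at positions 50 and 150 is not displayed because the second cut falls inside the excluded range.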

-BOTtom

shows where each enzyme cuts
the reverse strand as well
as the forward strand.
The cut point
on the bottom strand is
the 5' end of the
fragment which continues to the
left.

shows enzyme names vertically over
(or under) the position where
they cut. When a
collision at
a cut point requires more
than one enzyme to be
displayed at that point, Map
uses the next
unoccupied column to the right.
A '/' below the
enzyme's name indicates that the
name of the
enzyme has been displaced.
When the number of finds
is very great, the resolution
of this kind
of display is inadequate.
If the display seems too
full, either restrict the number
of enzymes
chosen or use the default
horizontal enzyme display.

The center of the Map
display is a line showing
the cut points with '|'
characters, the top strand of
the
sequence, a scale, and the
bottom sequence strand. These
parameters let you suppress any
of these
lines.

-NOCUTline

suppresses the line of '|'
characters between the enzyme name
and the strand it cuts.

-NOSEQline

suppresses the sequence display.

-NOSCALeline

suppresses the scale line between
the sequence and its complement.

-NOCOMPline

suppresses complement sequence display.

-TABle

If you simply want a
table of which enzymes cut
where, use this parameter.
See the topic TABLE
OUTPUT.

-SORtbyenzyme

Table output is normally sorted
by the position of the
cut in the top strand
of the sequence.
Use this parameter to see
the cuts sorted first by
enzyme and then by position.
See the
topic TABLE OUTPUT.
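
The two sort orders can be illustrated with a small Python sketch. The data layout, a list of (enzyme, position) pairs, and the function names are hypothetical and do not reflect Map's actual table format.

```python
def sort_by_position(cuts):
    """Order (enzyme, position) pairs by cut position in the top strand."""
    return sorted(cuts, key=lambda cut: cut[1])


def sort_by_enzyme(cuts):
    """Order (enzyme, position) pairs by enzyme name, then by position."""
    return sorted(cuts, key=lambda cut: (cut[0].lower(), cut[1]))
```

Given the cuts EcoRI at 120, AluI at 45, EcoRI at 10, and AluI at 300, the first function lists them as 10, 45, 120, 300, while the second lists both AluI cuts before both EcoRI cuts.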

-MONitor

This program normally monitors its
progress on your screen.
However, when you use -Default
to suppress all program interaction,
you also suppress the monitor.
You can turn it
back on with
this parameter. If you
are running the program in
batch, the monitor will appear
in the log file.

-SUMmary

writes a summary of the
program's work to the screen
when you've used -Default to
suppress
all program interaction. A
summary typically displays at the
end of a program run
interactively.
You can suppress the summary
for a program run interactively
with -NOSUMmary.

You can also use this
parameter to cause a summary
of the program's work to
be written in the
log file of a program
run in batch.

Licenses and Trademarks

Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the
GCG logo are registered trademarks of Genetics Computer Group,
Inc.

All other product names mentioned in this documentation may
be trademarks, and if so, are trademarks or registered trademarks of
their respective holders and are used in this documentation for
identification purposes only.