Function

Draw a sequence alignment with pretty formatting

Description

prettyplot draws a plot of the input sequence alignment.
The sequences are rendered in pretty formatting on the specified
graphics device. Drawing options control the appearance of the image,
such as boxes, colour and shading for highlighting conserved
regions.

This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation.

Any string up to 4 characters, matching regular expression /^([BPLW]{4})?$/

-pair

array

Values to represent identical, similar and related

List of floating point numbers

1.5,1.0,0.5

-identity

integer

Only match those which are identical in all sequences.

Integer 0 or more

0

-[no]doboxes

boolean

Display prettyboxes

Boolean value Yes/No

Yes

-boxcol

boolean

Colour the background in the boxes

Boolean value Yes/No

No

-boxuse

string

Colour to be used for background. (GREY)

Any string

GREY

-[no]name

boolean

Display the sequence names

Boolean value Yes/No

Yes

-maxnamelen

integer

Margin size for the sequence name.

Any integer value

10

-[no]number

boolean

Display the residue number

Boolean value Yes/No

Yes

-[no]listoptions

boolean

Display the date and options used

Boolean value Yes/No

Yes

-ratio

float

Plurality ratio for a consensus match

Number from 0.000 to 1.000

0.5

-consensus

boolean

Display the consensus

Boolean value Yes/No

No

-[no]collision

boolean

Allow collisions in calculating consensus

Boolean value Yes/No

Yes

-alternative

list

Values are 0:Normal collision check. (default)
1:Compares identical scores with the max score found. So if any other residue matches the identical score then a collision has occurred.
2:If another residue has a greater than or equal to matching score and these do not match then a collision has occurred.
3:Checks all those not in the current consensus. If any of these give a top score for matching or identical scores then a collision has occurred.

0

(Normal collision check. (default))

1

(Compares identical scores with the max score found. So if any other residue matches the identical score then a collision has occurred.)

2

(If another residue has a greater than or equal to matching score and these do not match then a collision has occurred.)

3

(Checks all those not in the current consensus. If any of these give a top score for matching or identical scores then a collision has occurred.)

0

-showscore

integer

Print residue scores

Any integer value

-1

-portrait

boolean

Set page to Portrait

Boolean value Yes/No

No

Advanced (Unprompted) qualifiers

(none)

Associated qualifiers

"-sequences" associated seqset qualifiers

-sbegin1 / -sbegin_sequences

integer

Start of each sequence to be used

Any integer value

0

-send1 / -send_sequences

integer

End of each sequence to be used

Any integer value

0

-sreverse1 / -sreverse_sequences

boolean

Reverse (if DNA)

Boolean value Yes/No

N

-sask1 / -sask_sequences

boolean

Ask for begin/end/reverse

Boolean value Yes/No

N

-snucleotide1 / -snucleotide_sequences

boolean

Sequence is nucleotide

Boolean value Yes/No

N

-sprotein1 / -sprotein_sequences

boolean

Sequence is protein

Boolean value Yes/No

N

-slower1 / -slower_sequences

boolean

Make lower case

Boolean value Yes/No

N

-supper1 / -supper_sequences

boolean

Make upper case

Boolean value Yes/No

N

-scircular1 / -scircular_sequences

boolean

Sequence is circular

Boolean value Yes/No

N

-sformat1 / -sformat_sequences

string

Input sequence format

Any string

-iquery1 / -iquery_sequences

string

Input query fields or ID list

Any string

-ioffset1 / -ioffset_sequences

integer

Input start position offset

Any integer value

0

-sdbname1 / -sdbname_sequences

string

Database name

Any string

-sid1 / -sid_sequences

string

Entryname

Any string

-ufo1 / -ufo_sequences

string

UFO features

Any string

-fformat1 / -fformat_sequences

string

Features format

Any string

-fopenfile1 / -fopenfile_sequences

string

Features file name

Any string

"-graph" associated graph qualifiers

-gprompt

boolean

Graph prompting

Boolean value Yes/No

N

-gdesc

string

Graph description

Any string

Pretty plot

-gtitle

string

Graph title

Any string

-gsubtitle

string

Graph subtitle

Any string

-gxtitle

string

Graph x axis title

Any string

-gytitle

string

Graph y axis title

Any string

-goutfile

string

Output file for non interactive displays

Any string

-gdirectory

string

Output directory

Any string

General qualifiers

-auto

boolean

Turn off prompts

Boolean value Yes/No

N

-stdout

boolean

Write first file to standard output

Boolean value Yes/No

N

-filter

boolean

Read first file from standard input, write first file to standard output

Boolean value Yes/No

N

-options

boolean

Prompt for standard and additional values

Boolean value Yes/No

N

-debug

boolean

Write debug output to program.dbg

Boolean value Yes/No

N

-verbose

boolean

Report some/full command line options

Boolean value Yes/No

Y

-help

boolean

Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose

Boolean value Yes/No

N

-warning

boolean

Report warnings

Boolean value Yes/No

Y

-error

boolean

Report errors

Boolean value Yes/No

Y

-fatal

boolean

Report fatal errors

Boolean value Yes/No

Y

-die

boolean

Report dying program messages

Boolean value Yes/No

Y

-version

boolean

Report version number and exit

Boolean value Yes/No

N

Input file format

prettyplot reads aligned protein or nucleotide sequences.

The input is a standard EMBOSS sequence query (also known as a 'USA').

Major sequence database sources defined as standard in EMBOSS
installations include srs:embl, srs:uniprot and ensembl.

Data can also be read from sequence output in any supported format
written by an EMBOSS or third-party application.

The input format can be specified by using the
command-line qualifier -sformat xxx, where 'xxx' is replaced
by the name of the required format. The available format names are:
gff (gff3), gff2, embl (em), genbank (gb, refseq), ddbj, refseqp, pir
(nbrf), swissprot (swiss, sw), dasgff and debug.

Output file format

The output is to the specified graphics device.

The results can be output in one of several formats by using the
command-line qualifier -graph xxx, where 'xxx' is replaced by
the name of the required device. Support depends on the availability
of third-party software packages.
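As a rough illustration of how the qualifiers above combine for a non-interactive run, the following Python sketch assembles a prettyplot command line. The helper name build_prettyplot_command, the input file name globins.msf and the device name "ps" are assumptions made for the example; which devices are available depends on the local installation.

```python
import shlex


def build_prettyplot_command(alignment, device="ps", outfile=None, extra=()):
    """Return an argv list for a non-interactive prettyplot run.

    Hypothetical helper: it only assembles qualifiers documented on this
    page (-sequences, -graph, -goutfile, -auto) and does not run anything.
    """
    argv = ["prettyplot", "-sequences", alignment, "-graph", device]
    if outfile is not None:
        # -goutfile names the output file for non-interactive devices.
        argv += ["-goutfile", outfile]
    argv += list(extra)
    argv.append("-auto")  # turn off prompts
    return argv


# Assumed input file name; any aligned sequence set would do.
cmd = build_prettyplot_command(
    "globins.msf",
    device="ps",
    outfile="prettyplot",
    extra=["-consensus", "-ratio", "0.6"],
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where EMBOSS is installed.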

Output files for usage example

Graphics File: prettyplot.ps

Output files for usage example 2

Graphics File: prettyplot.ps

Data files

prettyplot uses a comparison matrix file to calculate similarity to
the consensus.

For protein sequences, EBLOSUM62 is used as the substitution matrix.
For nucleotide sequences, EDNAFULL is used.

EMBOSS data files are distributed with the application and stored
in the standard EMBOSS data directory, which is defined
by the EMBOSS environment variable EMBOSS_DATA.

To see the available EMBOSS data files, run:

% embossdata -showall

To fetch one of the data files (for example 'Exxx.dat') into your
current directory for you to inspect or modify, run:

% embossdata -fetch -file Exxx.dat

Users can provide their own data files in their own directories.
Project specific files can be put in the current directory, or for
tidier directory listings in a subdirectory called
".embossdata". Files for all EMBOSS runs can be put in the user's home
directory, or again in a subdirectory called ".embossdata".

The directories are searched in the following order:

. (your current directory)

.embossdata (under your current directory)

~/ (your home directory)

~/.embossdata
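The search order listed above can be sketched in a few lines of Python. This is only an illustration of the lookup order for user-supplied files, not EMBOSS code: the function name find_data_file is hypothetical, and the fall-back to the installed EMBOSS data directory is deliberately left out.

```python
from pathlib import Path


def find_data_file(name, cwd=None, home=None):
    """Return the first match for `name` in the user search path, or None.

    Hypothetical helper mirroring the four directories listed above.
    """
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()
    search_dirs = [
        cwd,                   # . (your current directory)
        cwd / ".embossdata",   # .embossdata under the current directory
        home,                  # ~/ (your home directory)
        home / ".embossdata",  # ~/.embossdata
    ]
    for directory in search_dirs:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


# Example: look for a locally modified copy of the default protein matrix.
print(find_data_file("EBLOSUM62"))
```

A copy placed in the project directory therefore shadows one kept in the home directory, which is what makes project-specific overrides possible.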

Notes

A consensus sequence is calculated for the alignment and
individual sequences are compared to the consensus using the specified
comparison matrix file. The default matrix for protein sequences
is EBLOSUM62 and for nucleotide sequences
is EDNAFULL. The drawing options render conserved sites and
regions identified from the comparisons. For example, residues in a
sequence are classed as "identical", "similar" or "other" to the
consensus depending on user-specified thresholds of sequence
similarity (-pair option). Residues in these classes are
rendered red, green and black respectively by default (this can be changed).
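The idea of deriving a consensus and then classing each residue against it can be illustrated with a much simplified Python sketch. It is not prettyplot's algorithm: the toy similarity groups stand in for a real comparison matrix such as EBLOSUM62, the -pair thresholds and collision checks are ignored, and all function names are hypothetical. The consensus rule used here (most frequent residue, accepted only if its share of the column exceeds the ratio) is an assumption chosen for clarity.

```python
from collections import Counter

# Toy similarity groups standing in for a real comparison matrix (assumption).
SIMILAR_GROUPS = [set("ILVM"), set("KRH"), set("DE"), set("FYW"), set("ST")]


def column_consensus(column, ratio=0.5):
    """Return the most frequent residue if its share exceeds `ratio`, else None."""
    residues = [r for r in column if r != "-"]  # ignore gap characters
    if not residues:
        return None
    residue, count = Counter(residues).most_common(1)[0]
    return residue if count / len(column) > ratio else None


def classify(residue, consensus):
    """Class one residue as 'identical', 'similar' or 'other' to the consensus."""
    if consensus is None or residue == "-":
        return "other"
    if residue == consensus:
        return "identical"
    if any(residue in g and consensus in g for g in SIMILAR_GROUPS):
        return "similar"
    return "other"


def classify_alignment(sequences, ratio=0.5):
    """Classify every residue of equal-length aligned sequences, column by column."""
    columns = list(zip(*sequences))
    consensus = [column_consensus(col, ratio) for col in columns]
    return [
        [classify(res, cons) for res, cons in zip(seq, consensus)]
        for seq in sequences
    ]


alignment = ["KVLD", "KILE", "RVLD"]
for seq, classes in zip(alignment, classify_alignment(alignment)):
    print(seq, classes)
```

In a drawing step, each class would then map to a colour, for example red, green and black as in the defaults described above.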

There are other more general drawing options, for example,
controlling the number of residues displayed per line, background
shading and whether to display sequence names or not.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It exits with status 0 unless an error is reported.

Known bugs

Portrait mode does not cover the whole page! This is a "feature" in
plplot.