SeqPup

a biosequence editor

SeqPup is a biological sequence editor and analysis program. It includes links
to network services and external analysis programs. It is usable on common
computer systems that support the Java 1.1 runtime environment, including
Macintosh, MS-Windows and X-Windows.

Features include

multiple sequence alignment and single sequence editing

read and write several sequence file formats

pretty print of alignments and sequences with boxed and shaded regions

Use command-line style remote analysis programs, using a new CORBA
based protocol

print file formats including PDF *, PICT, Postscript and GIF

automatic preference saving, undo/redo and other full application features


Many of the features have been significantly improved in this release.
As well, this release is much easier to install and use.
This application is a work in progress; it has bugs.

Originally written in C++, this program has been ported to the new Java language.
SeqApp/SeqPup was started in 1990 as a sequence editor/analysis platform on which
analysis programs from other authors could be easily incorporated into a usable interface.

You can obtain this release through anonymous FTP or HTTP to iubio.bio.indiana.edu, in
folder /molbio/seqpup/java/. This version will work on any computer system that supports
Java runtime version 1.1, including Macintosh, MS Windows, and Unix/XWindows
systems. The Internet URLs to this software are

=== Brief Usage and installation, release 0.9 =====
You will need to fetch the SeqPup.jar java archive. It includes
help documentation and data files, which will be installed when you
first run the program.
To start SeqPup from MSWindows or Unix, use one of these command lines:
    jre -cp SeqPup.jar run  (for Java 1.1.x with jre)
    java -cp SeqPup.jar run  (for Java 1.2.x - not a recommended version)
    java -classpath SeqPup.jar:$CLASSPATH run  (for Java 1.1.x without jre)
To start SeqPup from MacOS, use the MRJ application, in SeqPup9-macos.hqx.
Apple Java (MRJ) version 2.1.2 or later is recommended.
There are several external analysis applications available for SeqPup,
as compiled programs for MacOS and MSWindows. Find these in
    seqpup9-methods-msdos.zip (MS Windows ZIP archive)
    seqpup9-methods-macos.sit (MacOS Stuffit archive)
These should be installed in a methods/ folder in the same folder as
SeqPup.jar.
Java version: This program will not work properly with the
Java 1.2 runtime that is now commonly available on MSWindows systems.
SeqPup will work with the Java 1.1.8 for MSWindows, which
can be installed in addition to Java 1.2.

Perhaps the most annoying problem to me with this software is the complexity of fetching
and installing it. If you use Netscape or other Java enabled Internet browsers, you can use
a Java applet just by clicking your way to a WWW page that has one embedded. This
software is more complex than is suitable for an applet; for one thing, it reads and writes
files on your computer like any ordinary program, which applets cannot yet do.

Fetching

The current state of Java applications on computer systems means that you will need to
fetch a Java runtime system as well as the files for this specific application. If you have
other Java applications on your computer, you may already have the Java runtime system
needed. Some current operating systems, including MacOS 8.1 and Sun Solaris 2.6,
include the Java 1.1 runtime already.

You will find a package file that includes all files needed for this application, in the full
package/ subfolder at the Internet site above. Macintosh users will find these files in the
archive file macos.seqpup-??.sit.bin. This is encoded as a Stuffit-MacBinary file.
Many Internet fetching programs such as Fetch and Netscape will decode this directly, or
utility software like Stuffit can do the job. MSWindows users will find program files in
the archive file mswin.seqpup-??.zip. You will want to use a Win95/NT unzip utility
that preserves long file names to extract this file. Unix and others will find just the
program files in the archive file unix.seqpup-??.tar.gz.

The essential files and folders that make up this application are

SeqPup.jar-- the program java code file, including some default
preferences and resources.

data/ folder-- preferences and pictures needed for the program and
documentation. Now external applications and methods are
located here.

As I've learned with other multi-platform applications that I make available to the
bioscience community, there are difficulties involved in updating multiple archives for
different computer systems. The simplest way for me to offer updates to this software is to
provide it as separate files. The most current version of this software is also available in its
un-archived form. See also the Updates section below, for semi-automatic updating.

This current release is based on Java version 1.1. To run it, you should have installed a
Java Runtime Environment (JRE), version 1.1 or later. This can be found through
Javasoft, and at various mirror sites around the world.

The program is installed as follows. On all systems, the following items should be in one
folder: the SeqPup.jar Java class archive file, the data/ and classes/ folders, and the
seqpup-doc.html document. Each system also needs in this folder an application or
script that starts the application in the Java runtime environment.

There are various Java runtime systems available for Macintosh. For general use, the
Macintosh Runtime Java (MRJ) produced by Apple Computer is the only one on which this
program has been fully tested.

1. Keep in the same folder the application SeqPup, the SeqPup.jar Java class archive,
and the data and classes folders. Move the application SeqPup from the
local/Macintosh/ folder into this main folder.

2. Use the MRJ installer from Apple Computer to install this Java runtime software. It
needs to be version 2.0 or later. If you have MacOS 8.1, this is included as part of the
OS. With MacOS 8, an earlier version of MRJ is included, but it isn't compatible with
this software. You should upgrade to the MRJ 2.0 release.

3. Unpack the local/Macintosh/data-methods-macos.sit archive. It includes child
app methods that need to be placed in the data/methods/ folder.

This program calls an Internet browser to display HTML documents. The default is to call
Netscape. You currently need to have this browser application already open; SeqPup
won't yet open it for you (a bug). The browser can be changed by editing SeqPup
preferences and changing the user.openurl variable. It needs to be set to the MacOS
"creator" name for the program you want (sorry, I don't yet have an easy interface to set
this). For Netscape, the creator is "MOSS"; for MS Internet Explorer, the creator is
"MSIE". This is what the setting looks like now:

user.openurl=MOSS

The MS Windows version has only been tested with MSWindows95/NT, and may not work with MSWin3.
When you unzip the archive file, use a current unzipper that preserves long file names.

1. Keep in the same folder the program batch file SEQPUP.BAT, the SeqPup.jar
Java class archive, and the data and classes folders. Move the SEQPUP.BAT file
from the local/MSWindows/ folder to this main seqpup folder.

2. Install a Java Runtime system for MS Windows. A recommended Java runtime is
found at http://www.javasoft.com/products/jdk/1.1/jre/index.html. You may want to
install this in a general MSWindows folder, perhaps C:\WINDOWS\JAVA. I don't
know of a preferred location yet on MS Windows systems for Java runtime files. You
will need to edit the batch file (step 3) to account for this location.

3. Edit the SEQPUP.BAT file to make the path names match the file locations on your
computer.

set JAVA=C:\WINDOWS\JAVA
set APPPATH=C:\seqpup08

NOTE: If you get the message OUT OF ENVIRONMENT SPACE when running the batch file,
then try setting Properties:Memory:Initial environ value to 4096 for this batch file.

4. Unpack the local/MSWindows/data-methods-mdos.zip archive. It includes child
app methods that need to be placed in the data/methods/ folder.

For the application to link properly to Netscape or other Internet browser, you may need to
edit the preferences file. You can do this from within the application; see the Options/Edit
Framework ... Menu. Or you can edit the file dclap.ini which will be created in the \java\
folder. In either case, you want to enter the variable name user.openurl= then the full path
to your browser to be sure that it works properly. This path may well be the same on your
system as mine, which is as follows. Note the quote marks (due to space in name) and the
double backslashes \\ which are required to insert one \.
user.openurl="C:\\Program Files\\Netscape\\Navigator\\Program\\netscape.exe"
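
The quoting and backslash-doubling rule above is easy to get wrong by hand. Here is a minimal sketch of it in Python; the helper name openurl_pref_line is made up for this illustration and is not part of SeqPup.

```python
def openurl_pref_line(browser_path):
    """Format a Windows browser path as a user.openurl preference line.

    Hypothetical helper, for illustration only: it doubles every backslash
    and adds quote marks when the path contains a space, as described above.
    """
    escaped = browser_path.replace("\\", "\\\\")  # one \ is written as \\
    if " " in escaped:
        escaped = '"' + escaped + '"'  # quote marks are needed for spaces
    return "user.openurl=" + escaped


line = openurl_pref_line(r"C:\Program Files\Netscape\Navigator\Program\netscape.exe")
print(line)
```

The printed line can be pasted into dclap.ini or the preference editor.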

If you use the Edit prefs menu, after editing close the window. You should be prompted to
save changes; do so.

1. On Unix systems, keep in the same folder the program start script seqpup, the SeqPup.jar Java class
archive, and the data and classes folders. Move the seqpup script file from the
local/Unix/ folder to this main seqpup folder.

3. Edit the seqpup file to make the path names match the file locations on your computer.

set java=/usr/local/java

4. You will want to compile or install binaries of the child applications for your system to
use this feature (see Child Tasks below). Source code is provided for example child
apps in the data/methods/ folder. You can use other pre-compiled versions of these on
your system. You will need to edit the .command files in data/methods/ if these apps
are located in other folders.

You need to define the user.openurl= variable to find your Netscape or equivalent. You
can do this from within the application; see the Options/Edit basic prefs... Menu. Or edit
the file ~/.dclaprc directly to enter such a line. The variable line for my unix system is

user.openurl=/usr/local/bin/netscape

Also, you might instead use a shell script (like the "netscape.sh" included). If you rename
that to netscape, edit it to suit, and put in the folder with SeqPup.jar file, it may take the
place of editing the preference file.

Program help is available from this document. Typically the program documentation that I
write gets done last and doesn't receive the effort that it deserves (because typically I
haven't enough time to finish the software either, which is written in my spare, unpaid
time). The help that I can offer to individual questions may be very limited. But please do
send your questions and comments by e-mail to the address "seqpup@bio.indiana.edu",
and these will be taken into account for future updates.

See also below section Bugs for a list of known program bugs and some work-around
hints.

This software has a preliminary network method for easier updating when the program is
revised. On the main splash-screen, the Updates button (on the "version" label) will
check whether the software version you have is out-of-date. See also the File menu, Check
updates command. These connect by Internet to the home archive for the software.

These options and the help command use an Internet browser program for opening URLs
(Internet universal resource locators), which needs to be configured as discussed in the
Install section. The updates option will list in your browser those items of this software
that are newer than the version you have installed.

The first window displayed when you start SeqPup is a splash screen that tells you a bit
about the application and has active buttons to perform some basic commands. These
include opening sequence files, fetching sequence from Internet servers, opening the help
information, and network links to application updates, e-mail comments, and application
source code.

All these functions are also accessible from the standard application menus. This form of
Hypercard-like picture window with active buttons is used in all the DCLAP applications.
Active button areas are highlighted when your mouse moves over them, and each button's function is
explained at the bottom of the window. Mouse clicking, once or more depending on the
clicksToActivate preference, will activate that function. These Hypercard-like windows are
configured as per standard HTTP NCSA-Imagemap information, stored in the data/ folder
pix/about.gif.map file. Functions can be changed and new images substituted if you
desire.
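
To give an idea of how such a map file ties picture regions to functions, here is a minimal Python sketch that reads NCSA-style imagemap lines ("shape target x1,y1 x2,y2") and finds which rectangle a mouse position falls in. The sample targets (open-sequence, help, none) are invented for the example; the actual entries in pix/about.gif.map may differ, and this is not SeqPup's own code.

```python
def parse_imagemap(text):
    """Parse NCSA-style imagemap lines into (shape, target, points) tuples."""
    areas = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        shape, target = parts[0].lower(), parts[1]
        points = [tuple(int(n) for n in p.split(",")) for p in parts[2:]]
        areas.append((shape, target, points))
    return areas


def hit_test(areas, x, y):
    """Return the target of the first rect containing (x, y), else the default."""
    default = None
    for shape, target, points in areas:
        if shape == "default":
            default = target
        elif shape == "rect" and len(points) == 2:
            (x1, y1), (x2, y2) = points
            if min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2):
                return target
    return default


# Hypothetical map contents, for illustration only.
sample_map = """
# about-window buttons
default none
rect open-sequence 10,10 110,40
rect help 10,50 110,80
"""
areas = parse_imagemap(sample_map)
print(hit_test(areas, 20, 60))
```

Only the rect shape is handled here; circle and poly areas would need their own tests.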

SeqPup has several kinds of views: a multiple-sequence view, which is the primary display when you open a sequence
document; the single sequence editing view; various print views which result from an
analysis, like the Restriction map; and dialog views where you control some function.

Many of these views have dialog controls -- push buttons, check boxes, radio controls and
editable text items -- to let you fine-tune a view to fit your preference. Many of these
views also will remember your last preferences.

When a view has editable text items, including the sequence entry views, most usual
undo/cut/copy/paste features will work.

Two or more views of the same data are possible. Some of these are truly views of the
same data -- changes made in one view are reflected in another. For instance, one can
have a single sequence view open, select a feature and mark that feature position on the
main document view, and also have that feature mark show in any open pretty print of that
sequence.

Other views are static pictures taken of the data at the time the analysis was performed --
later changes to the data do not affect that picture.

The main view into a sequence document is the multiple sequence editor window, which
lists sequence names to the left and sequence bases as one line that can be scrolled through.
Bases can be colored or black. Sequences can be edited here, especially to align them, and
subranges and subgroupings can be selected for further operations or analysis. Entire
sequence(s) can be cut/copied/pasted by selecting the left name(s). Mouse-down selects
one. Shift-mouse down selects many in a group; Command-mouse down selects many
unconnected. Double click a name to open a single sequence view. Select a name, then grab
and move it up or down to relocate.

Select the lock/unlock button at the view top to lock/unlock text editing in the sequence
line. With lock on (no editing) you can use shift and command mouse to select a subrange
of sequences to operate on.

Bases can be slid to left and right, like beads on an abacus, when the edit lock is On (now
default). Select a base or group of bases (over one or several sequences), using mouse,
shift+mouse, option+mouse, command+mouse. Then grab selected bases with mouse
(mouse up, then mouse down on selection), and slide to left or right. Indels "-" or spacing
on ends "." will be added and squeezed out as needed to slide the bases. See also the
"Degap" menu selection to remove all gaps thus entered from a sequence.

For entering/editing a single sequence, this view displays one sequence with more info and
control. Edit the name here (later other documentation). Bring out this view by
double-clicking the sequence name in the align view, or choosing Edit from the Sequence menu.

Various analyses provide non-editable displays. These are usually saveable as PICT,
POSTSCRIPT and GIF formats for editing in your favorite graphic editor program, or
printing. When a print or graphic view is displayed, choosing the File/Save As command
will offer you the choice of where to save and in what format.

SeqPup uses plain text files for its basic sequence data. These files can be exchanged
without modification with many other sequence analysis programs. SeqPup automatically
determines the sequence format of a data file when opening it. You have a choice of
several formats to save it as.

The program looks in the folder "data/prefs" for text files containing various data. At
present these files include "codon.prefs", "renzyme.table" and "color.prefs".

Various temporary files are created for child tasks, currently in the main folder where the
program lives. Currently you cannot run the Child Tasks portion of SeqPup from a locked
file server because these temporary files need to be created. Otherwise, SeqPup should
operate from a locked fileserver properly, and can be launched by several users at once.

In the data/prefs/ folder that comes with the application, you will find these files:
    color.prefs -- for base colors in displays
    seqmasks.prefs -- for pretty printing displays
    renzyme.table -- for restriction maps
    codon.prefs -- for protein translation
Any of these can substitute for the codon.prefs file:
    codon-drosophila.prefs
    codon-human.prefs
    codon-ecoli.prefs
    codon-rat.prefs
    codon-tobacco.prefs

The file called "codon.prefs" in folder "Tables" is used for translation of nucleic to protein
sequence, and for backtranslation. This file may be replaced with a table of your choice in
the following format. The format is nearly identical to that used by GCG software codon
tables. The Codon column has been put first. Each codon is followed by an "=" equal sign.
Any documentation is preceded by a "#" pound symbol.
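
As a rough illustration of the three rules just given (codon first, "=" after the codon, "#" before documentation), here is a minimal Python sketch that reads such lines into a lookup table. The sample lines and the layout of the columns after the "=" sign are assumptions made for this example; check a real codon.prefs file for the actual columns.

```python
def parse_codon_table(text):
    """Read 'CODON = amino ...' lines into a dict, ignoring '#' documentation.

    Assumption for this sketch: the first field after '=' is the amino acid
    name; any further columns on the line are ignored.
    """
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop documentation after '#'
        if "=" not in line:
            continue
        codon, rest = line.split("=", 1)
        fields = rest.split()
        if fields:
            table[codon.strip().upper()] = fields[0]
    return table


# Hypothetical sample lines, not copied from a real codon.prefs file.
sample = """
# codon = amino acid
GGG = Gly
atg = Met   # start codon
TAA = End
"""
codons = parse_codon_table(sample)
print(codons["ATG"])
```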

New will create a document of sequence data (alignment view). With a new document one
can add new sequences, or copy selections from another document.

Open commands will open existing files.
The Open as Sequence... choice will open a file of sequences into a new align view
document.
You can also open a file and append its sequences to the current document (Append to sequence
list).
You can fetch sequences from an Internet server (see below SRS information) with the
Open sequence from databanks... command.
The Open Text command will open and display a file as plain text.
The Open URL command will open an Internet connection (or local file) given a URL of
the format http://internet.address:port/path/to/data.file, as in
http://iubio.bio.indiana.edu/Readme. If the file is sequence data it will be displayed in an
alignment window. Currently only the HTTP protocol is supported for this command.

Save and Save as will save the current document to disk files. Save is context sensitive
and will be active when a document has been changed.

Revert will restore the open align view to the last version saved to disk.

Save selection will save only highlighted sequences to a new disk file. It doesn't affect
the save status of the current full alignment document.

Print setup, Print will print the current view (see bugs).

Check Updates will connect to the home server for the application and offer information
on new versions and updates to the application.

Undo, redo -- Standard application commands to return a document to its state before a
command was performed (undo), and to again do the command (redo) after an undo. For
instance, complementing (changing) a sequence should be undoable. These are context
sensitive, and should be enabled only when possible. Current design is to offer several
levels of undo and redo, but see bugs.

Cut, copy, paste, clear, selectall -- Standard application commands that are available in
a context-sensitive manner. Cut moves a selection from the document to the clipboard.
Copy makes a copy to the clipboard. Paste copies from the clipboard to the active
document. Clear removes a selection without copying to the clipboard. The clipboard is an
application-wide special document that stores these data until overwritten by new data.
Clipboard data is potentially copyable to other applications (see Bugs).

For instance, selected editable text should have these functions to manipulate the text.
Sequence selections enable these functions to move sequence data within and between
alignment documents. Not all appropriate contexts may yet have these commands enabled
(see Bugs).

Find, Find same, Find "selection" will search for strings in text.

Find ORF, this will select the first or next open reading frame of the selected sequence.
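
The manual does not say exactly how Find ORF decides what an open reading frame is. As one plausible reading, the Python sketch below looks for the first ATG at or after a start position and follows that frame to the first in-frame stop codon, on the forward strand only. The function name and the ATG/stop-codon rule are assumptions for this example, not SeqPup's actual method.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf(seq, start=0):
    """Return (begin, end) of the first ATG...stop reading frame, or None.

    Sketch only: forward strand, standard stop codons, no gap handling.
    'end' is exclusive and includes the stop codon.
    """
    s = seq.upper()
    pos = s.find("ATG", start)
    while pos != -1:
        for i in range(pos, len(s) - 2, 3):  # walk codons in this frame
            if s[i:i + 3] in STOP_CODONS:
                return pos, i + 3
        pos = s.find("ATG", pos + 1)  # no stop in this frame; try next ATG
    return None


first = find_orf("ccATGAAATAGgg")
print(first)
```

Calling find_orf again with start set past the previous hit gives the "next" frame, in the spirit of the menu command.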

New sequence -- append a new, blank sequence to the sequence document.

Edit -- open single sequence editing view for selected items.

Reverse, Complement, Rev-complement -- Reverse, complement or
reverse+complement a sequence. Works on one or more sequences, and the selected
subrange.

Rna-Dna, Dna-Rna -- Convert dna to rna (t->u) and vice versa. Works on one or more
sequences, and the selected subrange.
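
For readers who want to see what these sequence commands do to the letters, here is a minimal Python sketch. It handles only the plain A, C, G, T/U codes and leaves gaps and other symbols untouched; how SeqPup itself treats ambiguity codes is not shown here.

```python
# Complement table for plain DNA codes; other characters pass through as-is.
DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse(seq):
    return seq[::-1]


def complement(seq):
    return seq.translate(DNA_COMPLEMENT)


def reverse_complement(seq):
    return complement(seq)[::-1]


def dna_to_rna(seq):
    return seq.translate(str.maketrans("Tt", "Uu"))  # t -> u


def rna_to_dna(seq):
    return seq.translate(str.maketrans("Uu", "Tt"))  # u -> t


print(reverse_complement("ATGC-"))
print(dna_to_rna("ATGt"))
```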

Degap -- remove alignment gaps "~". Gaps of "-" are locked and not affected by Degap.
Works on one or more sequences, and the selected subrange.

Lock Indel & Unlock Indel -- Convert from unlocked gaps "~" to locked gaps "-", and back.
Unlocked gaps will disappear and appear as needed as you slide bases left and right.
Locked gaps are not affected by sliding nor by Degap. Works on one or more sequences,
and the selected subrange.
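
The difference between the two gap characters can be sketched in a few lines of Python. This follows the description above ("~" unlocked, "-" locked); the function names are invented for the example.

```python
def degap(seq, start=0, end=None):
    """Remove unlocked gaps '~' from seq[start:end]; locked '-' gaps stay."""
    if end is None:
        end = len(seq)
    return seq[:start] + seq[start:end].replace("~", "") + seq[end:]


def lock_indels(seq):
    return seq.replace("~", "-")  # unlocked -> locked


def unlock_indels(seq):
    return seq.replace("-", "~")  # locked -> unlocked


print(degap("AC~~G-T"))
print(degap(lock_indels("AC~~G-T")))
```

After locking, Degap leaves the sequence unchanged, which is the point of the lock.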

Distance -- generate a distance or similarity matrix of the selected sequences. The
Options/Seq Prefs... dialog modulates this function.

Pretty print -- a prettier view of a single or aligned sequences. Use these views to print
your sequences. Printing from the editing display will not be supported fully, and may not
print all of your sequence(s).

Nucleic, amino codes -- These provide both reminders of the base codes, and a way to
select colors to associate with each code. See below for some discussion of the two
"aa-color" documents that now ship with SeqPup.

Single sequences - editing and features
The Edit sequence function opens a single sequence editing window for selected
sequence(s). One can edit sequence bases here, change sequence name and perform some
sequence manipulations and analyses.

A recent addition is Document and Features sections, along with the Sequence editing
window. These sections are currently editable text. The format is not yet formalized but
follows the specific sequence file format. Currently only Genbank and EMBL formats are
parsed for documentation and features.

These positions will be read by the program when you highlight the text, then choose the
commands in the Features/ menu. Mark on main view command will copy the selected
position to the alignment window, erasing any other mark for that sequence. Add to
main view command will copy the position, adding it to any other marks. This is most
useful when the main view has a Mask level selected. One can add feature marks to
different mask levels. Then one can pretty print the sequence and these marked features will
be highlighted according to the current styles for those masks.

One can set various options which persist to later uses of the program. The Options menu
includes several dialogs for these preferences, including for Sequence data functions, for
Sequence Pretty print styles, for Base colors and styles, Codon table, Sequence Retrieval
System (SRS) server.

Also among the options are dialogs to edit directly application and framework preference
files. Generally you can ignore these, as other dialogs handle this. But some options don't
yet have an easier interface for changing. One important one is the framework preference
AWTs.clicksToActivate=1 which sets the number of mouse clicks to activate an icon button
or other relevant item. Many people prefer clicksToActivate=2 (double-click).

For MSWindow and XWindow systems, the framework preference user.openurl is
important. For Macintoshes, the equivalent is done using InternetConfig software.
See the above Installing section for details of setting the user.openurl preference.

An application preference with no current dialog choice is Adorns.backColor=0xe8f0ff,
which sets window background color (0xE8F0FF is a light blue).

Option files are stored in a system specific location as text files. One can edit them, when
the application is not running, with a text editor. On Macintosh systems, the files are
stored in System Folder:Extensions:MRJ Libraries: as dclap.prefs and seqpup.prefs files
(when using the MRJ runtime). On MS Windows systems, they are stored in the
C:\WINDOWS folder as dclap.ini and seqpup.ini. On Unix systems, these options are
stored in ~/.xxxrc files, including .dclaprc for the framework prefs, and .seqpuprc for the
application prefs.

This is the syntax for specifying style information used in the pretty print function.
This information is currently stored in the data/prefs/seqmasks.prefs file, and can be edited
with the Options/Base style table... command, or with a standard text editor. Each Style
label should now be prefixed with the mask level it applies to, as in

repeatchar=. - use this if you want mult-align repeated chars set to a single character

fontname= - set a valid computer font name, like Courier, Helvetica, Times, ...

fontsize= - set point size of the font

fontcolor= - set rgb color of the font, using 6 digit hexadecimal value, see sample values in table
(e.g., 0xff0000 is red, 0x00ff00 is green, and 0x0000ff is blue, 0x000000 is black and
0xffffff is white, 0xaaaaaa is one shade of grey).

backcolor= - set rgb color of the background behind font

boxstyle=solid - set the style of the boxing line;
current values are dashed, dotted, solid, dark, medium or light

fillpattern= - set the pattern used to draw the background color or fill. This
will allow "hatching" types of shades. Not well tested yet (mostly needs
printer output to see).
Set this with two 8-digit hexadecimal values (to create an 8x8
pattern array). You need to experiment with values to find a nice
pattern. An example is fillpattern=0xaa55aa55 0xaa55aa55
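
Since finding a nice fill pattern takes some experimenting, it can help to preview one as text first. The Python sketch below expands the two 8-digit hexadecimal values into an 8x8 grid. It assumes each byte is one row, taken in order, with the highest bit on the left; the program's real bit order may differ.

```python
def fillpattern_rows(spec):
    """Expand 'fillpattern' hex words into 8 text rows ('#' = bit set).

    Assumption for this sketch: 2 x 32 bits = 8 rows of 8 bits, first byte
    is the top row, most significant bit is the leftmost pixel.
    """
    words = [int(word, 16) for word in spec.split()]
    if len(words) != 2:
        raise ValueError("expected two 8-digit hexadecimal values")
    bits = (words[0] << 32) | words[1]
    rows = []
    for r in range(8):
        byte = (bits >> (8 * (7 - r))) & 0xFF
        rows.append(format(byte, "08b").replace("1", "#").replace("0", "."))
    return rows


rows = fillpattern_rows("0xaa55aa55 0xaa55aa55")
print("\n".join(rows))
```

The example value from the text comes out as a checkerboard.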

Currently you can set four mask styles in this table. These should start with a header like
below, but name as you like. Lines starting with "#" or "!" are comments that are ignored.
Style names starting with "mask1." are associated with the sequence alignment mask called
"Select mask 1...". Start the names with "mask2." to associate with "Select mask 2...",
start with "mask3." to use with "Select mask 3..." and start with "mask4." to use with
"Select mask 4...".

Base colors can be set for the alignment display and pretty prints. The preference file
color.prefs specifies color codes for each nucleic and amino base. It may be edited from
the Options/Base color table... function, or directly with a text editor.

Currently color values are stored as hexadecimal codes. Each is stored as a 3-byte hex
value of Red-Green-Blue (RGB) values. 0xFF0000 is red, 0x00FF00 is green,
0x0000FF is blue. Future versions of the program should include a color picker interface.
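
Until there is a color picker, a small script can convert between these hex codes and separate red, green and blue numbers. This is a minimal Python sketch with made-up function names, not part of SeqPup.

```python
def parse_rgb(value):
    """Split a 3-byte hex color like '0xFF0000' into (red, green, blue)."""
    n = int(value, 16)
    if not 0 <= n <= 0xFFFFFF:
        raise ValueError("color must fit in 3 bytes: " + value)
    return (n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF


def format_rgb(red, green, blue):
    """Build the hex code for a color from 0-255 components."""
    return "0x%02x%02x%02x" % (red, green, blue)


print(parse_rgb("0xe8f0ff"))   # the light blue window background
print(format_rgb(255, 0, 0))
```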

A few early users of this new version provided color amino selections that ship with
SeqPup. Here is one description.

The Internet features of SeqPup let you interchange ideas and data with people and
biocomputing services around the world. SeqPup includes a selection of network access
features in the developing area of networked biocomputing.

New in version 0.7 are (a) a client for the Sequence Retrieval System (SRS) to look up and
fetch sequences from databanks like GenBank and EMBL, Swiss Protein and PIR, and (b) a
client for the NCBI-BLAST server.

Use the File/Open sequences from databank menu command (or the Fetch
sequences button on About SeqPup) to access SRS servers. Type one or more key words
to describe the sequences you want to view. Sequence titles are fetched for all matches
(which may be hundreds or thousands) from the selected server, and displayed in an
alignment document view. You then can fetch full data for specific sequences by active
clicking the name, or choosing the Sequence/Edit command.

You can use boolean operators & (AND), | (OR), ! (NOT) to join several key words in a
query to tailor your search. SRS servers offer searches by fields of data. The general field
"all" searches all indexed fields; each databank offers a selection of fields such as
organism, accession, title, comments, and so forth.

The current SRS client in SeqPup is fairly simple, and doesn't offer the rich range of
options you will find via an HTML browser, but it does offer the direct step of loading
sequence data from an Internet server into this sequence editor.

The Options/SRS setup dialog lets you set your preferred server, data libraries and data
fields for a query.

The NCBI BLAST server performs a sequence similarity search of GenBank and/or other
sequence databanks, matching your sequence against published sequences. To learn more
of BLAST at NCBI, see http://www.ncbi.nlm.nih.gov/

The current BLAST client in SeqPup is also fairly simple. It doesn't offer any more than
an HTML browser, except for the direct step of loading sequence data from your sequence
editor to the analysis server.

To perform a BLAST search, select a sequence entry in a document, then choose the
Sequence/BLAST@NCBI command, which will open a sequence edit view with
BLAST option choices. You can edit the sequence here (without affecting the sequence in
your main document). You can select the results document file (in HTML format, which
will be opened by your preferred HTML viewer). There is an Options drop-down dialog;
click the BLAST options triangle/arrow to open this section. Choose which
BLAST program and which data library to search. Both of these are sequence context
sensitive -- DNA and Amino sequences have different selections. The Do BLAST button
sends your sequence to the server at NCBI via HTTP, and saves results to the selected file,
which will be displayed in your HTML viewer.

The Externals menu lets you link SeqPup with external sequence analysis programs that
you or others may write. SeqPup can be configured to run command-line style
applications, sending them sequence data and command information. When the child
program is finished with its analysis, SeqPup will display its results.

The current Externals menu has

Open BOP server (see below) for analyses run on remote computers

Open command file for testing new commands

Local methods submenu that includes commands found in the data/methods/
folder.

When BOP servers are attached, their commands are added also to this menu.

The general design of child applications is taken to be data analysis programs that have a
command-line user-interface, and that take input data from a file or from the system
"standard input" file (stdin), and that write outputs to files and to two system standard files
"standard output" (stdout) and "standard error" (stderr). This is how many existing
analysis programs work, and it is straightforward to program this basic kind of interface.

The value of SeqPup joined with these kinds of programs is that SeqPup can
concentrate on providing an easy-to-use interface for biologists, and the analysis
application can concentrate on data analyses, without having to add a lot of software to
provide a humanly usable interface.

Many command-line biocomputing programs, including versions of Clustal, CAP, tacg,
primer, FastA, and so forth can be added as Child apps or BOP remote services.

Which child applications?

I hope this new ChildApp/BOP method is general enough to let you add almost any
command-line program. I'm still working on special cases like Phylip package that
requires a structured command-file instead of command-line options. If you add any
biocomputing programs that can be freely distributed with SeqPup, consider sending
them, or the command configuration file, back to IUBio archive for addition to the general
distribution.

On command-line systems, including Unix and MSDos/MSWin, you should be able to use
any pre-compiled version of a program that runs in this command-line style. On Macintosh
systems (command-line-less), you will need to compile a command-line program with the
ChildAppJ.c main program source (see the data/methods folder). This allows SeqPup to
send command line parameters using a file.

You can add new child apps to SeqPup by adding text files to the data/methods/ folder
with the suffix .command, that include the string "Content-type: biocompute/command" at
the top, and follow the syntax described below and given in example files. See especially
the clustalw.command file.

The biocompute/command file syntax

- Newlines or ';' separate key=value pairs in a structure. Values that include white space
need to be quoted with "" or ''.
- Use backslash to escape special characters in a string, mainly tabs, newlines and such. A
string can be continued on multiple lines using \ just before the line end. Enclose such a
string in quotes.
- A structured value (with subfields) needs to be enclosed in curly brackets {}.
- The order of fields in a structure does not matter. Some fields are required, some are
optional.
- Strings in a string list in the value.list, menupath, resultKinds and others can be
separated with tab or | (pipe) or comma characters.
- Comment lines starting with # are ignored.
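
The flat part of these rules can be sketched in a few lines of Java. This is only an
illustration under stated assumptions, not the parser SeqPup uses: the class name
FlatCommandParser is hypothetical, nested { } structures are deliberately left out, only the
\t and \n escapes are interpreted, and a '#' is accepted wherever a key could start.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: parses flat key=value pairs only, no nested { } structures.
final class FlatCommandParser {
    static Map<String, String> parse(String text) {
        Map<String, String> pairs = new LinkedHashMap<>();
        int i = 0;
        final int n = text.length();
        while (i < n) {
            char c = text.charAt(i);
            if (c == ';' || Character.isWhitespace(c)) { // pair separators and blank space
                i++;
                continue;
            }
            if (c == '#') { // comment: skip to the end of the line
                while (i < n && text.charAt(i) != '\n') i++;
                continue;
            }
            int eq = text.indexOf('=', i);
            int eol = text.indexOf('\n', i);
            if (eq < 0 || (eol >= 0 && eol < eq)) {
                throw new IllegalArgumentException("expected key=value at offset " + i);
            }
            String key = text.substring(i, eq).trim();
            i = eq + 1;
            while (i < n && (text.charAt(i) == ' ' || text.charAt(i) == '\t')) i++;
            StringBuilder value = new StringBuilder();
            if (i < n && (text.charAt(i) == '"' || text.charAt(i) == '\'')) {
                char quote = text.charAt(i++);
                while (i < n && text.charAt(i) != quote) {
                    char ch = text.charAt(i++);
                    if (ch == '\\' && i < n) {
                        char esc = text.charAt(i++);
                        if (esc == 'n') value.append('\n');
                        else if (esc == 't') value.append('\t');
                        else if (esc != '\n') value.append(esc); // "\" + newline continues the line
                    } else {
                        value.append(ch);
                    }
                }
                if (i >= n) throw new IllegalArgumentException("unterminated quote for key " + key);
                i++; // step over the closing quote
                pairs.put(key, value.toString());
            } else {
                // Unquoted value: runs to the next ';' or newline.
                while (i < n && text.charAt(i) != ';' && text.charAt(i) != '\n') {
                    value.append(text.charAt(i++));
                }
                pairs.put(key, value.toString().trim());
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String demo = "# demo\nid = align1; label = \"Align sequences\"\n";
        System.out.println(parse(demo));
    }
}
```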

The top level key is command = { various other key=value pairs }
Within a command, most of the fields are parameter lists (parlist = { list of pars } )
and parameters (par = { structure} ). All parameter values should include an id field, a
value field, and can include a label field for display.

ID values are case-sensitive, unique strings. You reference IDs and other variables with
dollar sign, as $ID. TITLE, INFO and HELP are special parameter ids.

The command key includes these subfields:
- id = a unique string (required)
- action = the command line to be executed, with runswitches of parameters to be
substituted (required)
- transport = local: for use on the same computer, bop: for bopper. This may be optional,
and should be set by software
- parlist = { list of parameters } (required)
- resultKinds = string list of MimeTypes: text/plain, biosequence/fasta, ... (optional)
- filepath = path to app on server (optional)
- menupath = menu item name, with submenu path, e.g. "Utilities|Reformat" (optional)
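
String-list fields such as menupath and resultKinds use the tab, pipe or comma separators
described above. A minimal Java sketch of splitting such a value follows; the class name
StringLists is hypothetical, and trimming entries and dropping empty ones are assumptions of
this sketch rather than documented SeqPup behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for string-list values such as menupath or resultKinds.
final class StringLists {
    /** Splits on tab, pipe or comma, trims each entry, and drops empty entries. */
    static List<String> split(String value) {
        List<String> items = new ArrayList<>();
        for (String part : value.split("[\\t|,]")) {
            String item = part.trim();
            if (!item.isEmpty()) items.add(item);
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(split("Utilities|Reformat"));
        System.out.println(split("text/plain, biosequence/fasta"));
    }
}
```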

The parameter key includes these subfields:
- id = a unique string (required)
- label = visible label (optional)
- value.type = value (see below, required)
- runSwitch = the command line string to be inserted in the action string. It is optional.
This often includes the term $value, which is the special variable signifying the parameter
value chosen by user. In the case of value.boolean types, this runswitch is set to null if the
value is false.
- ifelseRules = string list of rules to enable the parameter, based on things like protein or
nucleic type of the input data; yet to be implemented (optional)
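
To make the relation between action, $ID references, runSwitch and $value concrete, here is a
minimal Java sketch of one plausible way to expand an action string. The exact substitution
rules are an assumption, and the class ActionExpander, the program name "aligner" and its
switches (-in, -fmt, -fast) are all hypothetical, not taken from any real .command file.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of action-string expansion; not SeqPup's actual code.
final class ActionExpander {
    static final class Par {
        final String runSwitch;  // may contain $value; null means "insert the value itself"
        final String value;
        final boolean isBoolean;
        Par(String runSwitch, String value, boolean isBoolean) {
            this.runSwitch = runSwitch;
            this.value = value;
            this.isBoolean = isBoolean;
        }
        String expanded() {
            // A false boolean contributes nothing to the command line.
            if (isBoolean && !Boolean.parseBoolean(value)) return "";
            if (runSwitch == null) return value;
            return runSwitch.replace("$value", value);
        }
    }

    // Assumption: IDs look like ordinary identifiers.
    private static final Pattern VAR = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    static String expand(String action, Map<String, Par> pars) {
        Matcher m = VAR.matcher(action);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Par p = pars.get(m.group(1));
            String text = (p == null) ? m.group() : p.expanded(); // unknown IDs stay as written
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        // Collapse the gaps left behind by switches that expanded to nothing.
        return out.toString().trim().replaceAll(" +", " ");
    }

    public static void main(String[] args) {
        Map<String, Par> pars = new LinkedHashMap<>();
        pars.put("INFILE", new Par("-in $value", "seqs.fa", false));
        pars.put("OUTFMT", new Par("-fmt $value", "phylip", false));
        pars.put("QUICK", new Par("-fast", "false", true));
        System.out.println(expand("aligner $INFILE $OUTFMT $QUICK", pars));
    }
}
```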

Labels and values of a parameter are shown to the user in a dialog form. The values can be
changed by the user, depending on type of value. Other parts of this description are mostly
for the server's use in determining how to run a command-line program, and how to get
and return data.

There are many variants of the value field. These are specified as value.boolean =,
value.integer =, value.string = , and so forth. These match the ValueUnion structure of the
bopper2.idl.

value.container = is a value that includes other parameters, and is displayed as a
container of options to the user. It may be a required or optional container. It has the
subfields:
- required = true or false
- parlist = { list of pars } (required)

value.choice = is a value that includes other parameters, often boolean options. It has
the subfields:
- multipleChoices = true or false
- minToShow = minimum number of choices to be displayed
- parlist = { list of pars } (required)

For MacOS, there is limited support for AppleScript commands when using the MRJ java
runtime. Use the word applescript as the action command:
action = "applescript text of script to run here"
Currently no objects are returned, but script results are printed to System.out

Side note: Prior versions of SeqPup used an HTML FORMS syntax. This has been
replaced by a new syntax, with some misgivings, because programming effort to support
HTML was much more costly, and this new syntax can be extended more easily to include
features needed for biocomputing. The syntax evolved from this prior work and the GCG
SeqLab configurations. It will be extended in the future to more fully cover needs for
biocomputing programs. That may include adding back some of the HTML formatting
options.

An Internet method of using "child apps" is now available with SeqPup. This allows one
to run analysis programs on a remote computer, and interface with SeqPup's editor
platform (fairly) transparently, as for the local child apps. This is made possible with a
network protocol I've acronymed BOP (Biocomputing Office Protocol; obviously the
acronym came first). The first version of BOP, written in 1996, was based directly on the
POP internet mail protocol. BOP2 (Bopper2) uses a CORBA-based interface, and replaces
the unfinished BOP1 methods.

Many command-line programs, including versions of Clustal, CAP, tacg, primer, FastA,
BLAST, the Phylip series, fastDNAml, and so forth can be added as BOP services fairly
simply.

One potentially popular use for this BOP interface may be to offer a simple-to-use client for
Genetics Computer Group (GCG) command-line software. As of this writing, an example
Bopper server for GCG software isn't quite ready, but will be soon.

If you are an administrator of GCG software for your institution and would like to test this
experimental version of Bopper2 with GCG at your site, please let me know.

The configuration of apps on a server computer is essentially the same now as
configuration of local child apps running from the SeqPup data/methods/ folder.

To provide BOP services to SeqPup or other clients, follow these steps:

-- Install Bopper2 on a server computer. The current Bopper2 is based on a CORBA
Interface Definition (IDL). It is implemented in Java, using the free Omnibroker ORB. It
will potentially run on any system with a Java runtime, but has only been tested on Unix.
The bopper2 distribution should include all Java source and classes needed to run it,
excluding a Java runtime and the command-line programs themselves.

-- Configure bopper2 to add command-line programs. The same .command file syntax is
used for local and remote external commands with SeqPup. But one may need to modify
file path and perhaps other information for each specific system. See the data/methods/
folder in SeqPup for example .command files.

-- Run the bopper server and publish its access URL. I hope to add some kind of directory
of bop servers, but that currently isn't available. The URL for the test bop server at
the IUBio archive is iiop://iubio.bio.indiana.edu:7000/bop

Note the IIOP protocol specifier, which is a CORBA standard network protocol. "bop" is
the name of this specific service. Other named services may be run at the same host:port.
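
The parts of such a URL can be pulled apart with the standard java.net.URI class. The sketch
below is only an illustration of the URL layout described above; the class name BopAddress is
hypothetical, and this is not how Bopper2 or its ORB actually resolves a service.

```java
import java.net.URI;

// Hypothetical value class: splits a bop URL into host, port and service name.
final class BopAddress {
    final String host;
    final int port;
    final String service;

    BopAddress(String host, int port, String service) {
        this.host = host;
        this.port = port;
        this.service = service;
    }

    static BopAddress parse(String url) {
        URI u = URI.create(url);
        if (!"iiop".equalsIgnoreCase(u.getScheme())) {
            throw new IllegalArgumentException("not an iiop URL: " + url);
        }
        String path = u.getPath();
        // The service name is the path without its leading slash.
        String service = path.startsWith("/") ? path.substring(1) : path;
        return new BopAddress(u.getHost(), u.getPort(), service);
    }

    public static void main(String[] args) {
        BopAddress a = parse("iiop://iubio.bio.indiana.edu:7000/bop");
        System.out.println(a.host + " " + a.port + " " + a.service);
    }
}
```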

I have high hopes now for Java as a development language and toolkit, good for rapid
development of complex applications, and for many other development uses. Writing
Phylodendron and LoopDloop applications in early 1997 gave me a feeling that Java is
ready for complex application development. Extending this to FlyNapp and SeqPup
applications, which are quite complex, has shown me that Java does have the potential for
rapid application development with a good app framework. See also the Genesis of
Phylodendron.

These four biocomputing applications now share about 60 - 70% of their code.
Improvements in one lead to improvements in the others in many cases, and that holds for
future applications written with this framework.

This framework, called DCLAP, was started with the NCBI toolkit, a cross-platform C
toolkit on which Entrez, Sequin and other apps from NCBI are written (thank you to
Jonathan Kans and colleagues at NCBI for this wonderful, free toolkit). On top of this
toolkit I wrote a C++ framework which is meant to handle much of the basic application
chores such as document opening, saving, doc and window management, menu and
command management, etc. With the advent of Java as a C++-like language that has
broad support and funding of tools from the commercial sector, but which also is available
in free form at its basics, it looked like a good underpinning for rapid, cross-platform app
development. However, neither NCBI toolkit nor Java nor other sources provide the kind
of application framework freely that makes it quick and easy to produce robust, easy to
use, full featured applications for the biosciences. As I write new applications, I aim at
improving such a framework so that the next application can be written more quickly than
the last. The source code for this framework, in C++ and now its beginnings in Java, is
available freely to others for scientific application development. The current Java version
of DCLAP is preliminary, and will change significantly over coming months as it is more
fully converted to use Java version 1.1. However I find it now very helpful in producing
new applications, and hope that other programmers may also find it useful.

SeqPup is built on an object-oriented application framework, originally written in C++,
called DCLAP. This framework is designed to speed the development of easy to use,
complex programs with a rich user-interface. At this point, DCLAP is an unfinished
framework. It is lacking in documentation. However, it is complete enough to build
complex programs like SeqPup.

New applications can be built to employ and reuse these classes fairly quickly. Variations
on the current methods are simple to add in the class derivation method of C++. For
instance, new document formats can be added on the Drtf display objects, and new
sequence manipulations can be added in the biosequence handlers, by building on current
methods.

DCLAP rests upon the NCBI toolkit, including the Vibrant GUI toolkit, which is designed
for cross-platform functioning. The successful genome data browser Entrez is written with
the NCBI toolkit.

All of this source is available without charge for non-profit use (see copyright). The NCBI
toolkit portion is further available for profit use, and such arrangements may be made for
use of DCLAP.

DCLAP will never compete with commercial programming frameworks, but it has the
virtue of being freely available and redistributable, and includes support specifically for
biocomputing applications. If you are undertaking a biocomputing project requiring a rich
user interface, and wish it to run on multiple computer platforms, this may be a worthwhile
choice, especially if you wish to redistribute your source code for the benefit of the
scientific community.

The DCLAP developer archive is at ftp://iubio.bio.indiana.edu/util/dclap/
Please contact Don Gilbert for further information on using this framework in other
applications.

Problems and shortcomings of this software are the responsibility of Don Gilbert, to whom
any correspondence regarding problems should be addressed. Comments, bug reports and
suggestions for new features (see below) are very welcome and should be sent via e-mail to
SeqPup@Bio.Indiana.Edu

With any bug reports, I would appreciate as much detail as is reasonable without putting
you off from making the report. If you don't have time to send detailed descriptions of
problems, please do send comments and reports, even if all you say is "Good" or "Bad" or
"Ugly".

Please include mention of computer hardware, and operating system software, including
version. Describe how the problem may be repeated, if it is repeatable. If it is sporadic or
only seen once, please also describe actions leading up to it. Include copies of data if
relevant.

You may use this program for your personal use, and/or to provide a non-profit service to
others. You may not use this program in a commercial product, nor to provide commercial
service, nor may you sell this code without express written permission of the author.
You may redistribute this program freely. If you wish to redistribute it as part of a
commercial collection or venture, you need to contact the author for permission.

The source code to this program is likewise copyrighted, and may be used, modified and
redistributed in a free manner. Commercial uses of it need prior permission of the author.

Any external applications that may be distributed with SeqPup are copyrighted by their
respective authors and subject to distribution provisions as described by those authors. At
present this includes ClustalW, by Des Higgins and colleagues; CAP, by Xiaoqiu Huang;
and FastDNAml, written by Joseph Felsenstein with modifications by Gary Olsen, Hideo
Matsuda and Ross Overbeek, which is copyrighted by the University of Washington and
Joseph Felsenstein.

Distribution of external analysis applications with this program is done as a convenience for
users, and in no way modifies the original copyright. If there is a problem with this,
instructions to users for obtaining and installing external applications will be substituted.

No warranty, express or implied, is provided with this software. The author is trying to
produce a good quality program, and will incorporate corrections to problems reported by
users of it.

General
- view size-sensitive window scroll bars are used in several windows. These may not
yet work fully and seamlessly. Views may be shifted above the scroll area, or scroll
bars may not show up as they should when views extend beyond the window.
Resizing the window will often cure these problems.
- drop down boxes are used extensively in the dialog windows, to hide/show
selected information. Currently when a box is dropped down by clicking its drop
arrow, the box isn't resized/displayed fully. One needs to resize the window with a
mouse drag to get it displayed fully.
- appMenuBar needs work -- menus not showing in new doc (mrj), other bugs
- context sensitive menus not always properly sensitive to context (disabled when they
should be enabled, or vice versa).
- undo/redo isn't working in some cases where it should, but does work in several cases.
Repeated undo/redo generally doesn't work, while first level undo often does.
- copy/cut/paste functions may not be working as smoothly in as many contexts yet as
they should be.
- clipboard use (via copy/paste) and display needs work; export of clipboard to other
apps not yet supported ( will happen when converted to jdk1.1)
- window menu doesn't always list items (java runtime/os variable)
- preference editing needs improved user interface

Sequence functions
- not yet ready : Restrict map, Dot plot, Nucleic & Amino codes pictures.
- Consensus overwrites first sequence in selection w/ cons as well as appending cons
seq at end of list
- mask menu items not always enabled when mask views are selected (context sens.
menu bug).
- find bases not ready; find ORF may be okay but needs testing
- sequence file reading and writing (readseq functions) still need testing and may well
have bugs. Interleaved formats NEXUS/Paup, Phylip are not debugged. New
formats await adding.
- single sequence editor is slow for long sequences
- sequence manipulation functions for single sequence window may not be ready.

- feature table parsing is preliminary; expect it to be improved in future releases.
-- saving feature/document info associated with sequence works now only for genbank
and embl formats; cross-saving is still problematic (embl->genbank and vice versa)
-- editing feature/doc info in the single sequence windows should work but needs
testing and may have bugs
-- using feature ranges to mark up masks and pretty prints, while now possible, is still
too awkward a process; I hope to make this essentially automatic in later releases.
- changes to prefs such as codon prefs, color and style prefs may not be stored (edit these
files w/ external text editor if need be).

Macintosh specific:
- MRJ 2.0 seems to have problems that other JRE's don't (besides slowness).
-- The work around with several window display problems is to resize the window a
bit (grab size box and drag some) to get it to display properly.
-- Menus disappear when a new window is opened. The work-around is to select
another SeqPup window then switch back to the new window, and menus appear.
-- scrolling dialog windows don't update on scroll -- esp. ones w/ dialog items.
-- text document display is horribly slow for any text longer than 20-30 lines

MS Windows specific:
- window menu may be non-functional, or picks wrong window
- functions that depend on window list (close at least) not working properly

XWindows specific:
- the BLAST dialog seems to send only a few bases to NCBI server

Coming Features

Somewhat further on, I'd like to make SeqPup a bean-box, capable of incorporating new
functions using the JavaBeans technology. My hope, though I don't know if it is feasible in
my programming time frame, is that this bean interface will be simple enough that an
average biologist with interests could put together a data analysis function in Java and add
it to SeqPup without having to spend a lot of time learning programming and software
development methods. There are suggestions that Java will become a more ubiquitous and
easy to use language than the combination of C, C++ and Perl, which are often used now
for various biocomputing analyses.

SeqApp was started Sept. 1990 as a MacApp sequence editor/analysis platform on which
analysis programs from other authors, typically command line w/ weak user interfaces,
could be easily incorporated into a useable Mac interface.

January 1998: version 0.8 release

+ Update to Java version; C++ version no longer updated
+ Bopper2 remote/local interface to command-line applications added. This is based on
CORBA standard. It is experimental; the interface will likely change (improve I hope).
But it has the basic functionality needed to attach local or remote network command-line
style applications to this program. It is user-configurable (with help and better
documentation).
+ 1 Feb 98 - added color picker, background color command, base color prefs dialog,
sequence styles dialog. corrected several pretty print problems.

July 95: Version 0.4 of SeqPup.
This includes most of the features of its ancestor SeqApp. Alignment window: shift &
slide sequences, copy/cut/paste/undo sequence entries among windows; Restriction maps
and pretty print output; useable child apps for mac, mswin, and unix.

v0.4 Known bugs and missing features (see above Bugs section for fuller list):
- Character editing (unlocked text) in the alignment (main) window is not working
on Xwindow systems, and may be buggy in MSWindow and Mac systems.
- Single sequence editor (Sequence/Edit) is very slow for long sequences
(6,000 bases)
- Sequence menu items not yet ready : Dot plot.
- Child Apps fail in various ways on MSWindows and Unix systems.
-- CAP seems most likely to succeed completely.
-- ClustalW and FastDNAml may be launched and run properly, but SeqPup will fail
to automatically open their results files.
- MSWindows and XWindows versions are less stable than Mac versions.
- XWindows versions reliably crash/core dump when Quit is chosen. This is an
annoyance but doesn't seem to impair use.
- Internet menu needs testing & reworking - I haven't tested any of the e-mail
services listed since last year.
- Nucleic codes picture shows PICT processing bug -- misplaced text, and an error
in biology -- complement of W is W, not S, and complement of S is S, not W.
- Repeated copy/cut/paste of the alignment window entries might cause problems.
Please let me know if you see this.
- There is no printing for X Window systems.

21 Mar 95: Second release of SeqPup, version 0.1.
This release has more parts of the SeqApp program put into it. This includes some
alignment view manipulations, limited use of child applications, some undo-able
commands, choosing data tables for colors, codon and r.enzymes. This release also
includes much of the basics of GopherPup, including display of RTF, HTML, PICT, GIF
document formats. However there is still some work to be done to let you open these w/o
interpreting them as sequence data.
This release has just Mac PowerPC (SeqPup/PPC) and Mac 68000 processor
(SeqPup/68K) versions. When more of the basic bugs are worked out, I'll try Sun and
MSWindows versions.

v0.1 Known bugs/missing features:
- Use of character editing (unlocked text) in the alignment (main) window will lead
to a crash after a few windows have been opened/closed or other manipulations performed.
- File/Open for non-sequence data (text, rtf, etc.) may well mistakenly identify them
as sequence data. File/New is probably not doing anything useful, or bombing.
- Single sequence editor (Sequence/Edit) is very slow for long sequences
(6,000bases)
- Single seq. editor may be failing in various ways (I've not looked at it carefully
yet).
- No cut/copy/paste/undo for align-seq view yet (coming soon I hope).
- Internet menu needs reworking - I haven't tested any of the e-mail services listed
there since last year.
- Sequence menu items not yet ready : Consensus, Pretty print, Restriction Map,
Dot plot, nucleic & amino codes.
- Child apps usage needs more development to work smoothly.
- The Mac/68K version fails when using Child applications.
- Only the ClustalW child app is ready for distribution (may have FastDNAml,
CAP, and DNAml soon -- let me know of programs you would like to see here).

1 Mar 94: First public release of SeqPup, version -1.
It has plenty of bugs and missing features, including:
no Undo (this is a real bite to those used to it)
mostly no cut/copy/paste/clear
limited printing of documents or views
mostly no align-view manipulations (move,cut/copy,edit in place, shift, ...)
no pretty print views
no restriction maps
no dot plots
no ...
problems w/ window display & keeping track of active window (x,mswin)
I'll be adding back many of these features from the Macintosh SeqApp as time permits.

SeqApp 12+ June 93, version 1.9a157+
a semi-major update, and time extension release with various enhancements and
corrections. These include
-- lock/unlock indels (alignment gaps). Useful when sliding bases around
during hand alignment, to keep alignment fixed in some sections.
-- color amino (and nucleic) acids of your choice.
-- added support for more sequence file formats: MSF, PAUP, PIR. SeqApp now relies
on the current Readseq code for sequence reading & writing.
-- save selection option to save subset of bases to file.
-- addition of the useful contig assembly program CAP, written by Xiaoqiu Huang.
-- major revision of preference saving method (less buggy, I hope)
-- major revision of the underlying application framework, due to moving from MacApp 2
to MacApp 3.
-- fixed a bug that caused loss of data when alignment with a selection was saved to disk.

5 Oct 92, version 1.8a152+ -- a semi-major update with various enhancements and
corrections. These include
- corrections to the main alignment display,
- improvements to the help system,
- major changes to the sequence print-out options,
-- including addition of a dotplot display (courtesy of DottyPlot),
-- a phylogeny tree display (courtesy of TreeDraw Deck & J. Felsenstein's DrawTree),
-- improved Pretty Print, which now has a single sequence form and a better aligned
sequence form,
-- improved Restriction map display,
- addition and updating of several e-mail service links,
-- including Blast Search and Genbank Fetch via NCBI,
-- BLOCKS, Genmark, and Pythia services,
- updated Internet gopher client (equal to GopherApp),
- editable Child Tasks dialogs
- addition of links to Phylip applications as Child Tasks
- addition of Phylip interleaved format as sequence output option

11 June 92, version 1.6a35 is primarily a bug fix release. Several of the disastrous bugs
have been squashed. This version now works on the Mac SE model, except for sendmail.
No new features have been added.

25Mar92, v1.5a32 (or later). First release to general public. Includes Internet Gopher
client. Also released subset as GopherApp for non-biologists.

4Mar92, v 1.4a38 -- added base sliding in align view. Bases now slide something like
beads on an abacus. Select a section with mouse, then grab section and shift left or right.
Gaps are inserted/removed as needed. For use as contig aligner, still needs equivalent of
GCG GelOverlap to automatically find contig/fragment overlaps.

Also added "Degap" menu item, to remove "." and "-". Fixed several small bugs including
Align pretty print which again should display.

2Mar92, v 1.4a19 -- fixed several annoying bugs, see SeqApp.Help, section on bugs for
their resolution. These include Complement/Reverse/Dna2Rna/ Translation which should
work now in align view; Consensus menu item; entering sequence in align window now
doesn't freeze after 30+ bases; pearson/fasta format reading; ...