The NEXUS Class Library (NCL) is an integrated collection of C++ classes designed
to let you quickly write a program that reads NEXUS-formatted data files. It
also makes it easy to extend the NEXUS format with new blocks of your own
design.

A word about the intended audience is in order before we get too far along
(no need to waste your time if the NCL will not be helpful to you). The intended
audience for both this documentation and the accompanying class library comprises
computer programmers who wish to endow their C++ programs with the ability to
read NEXUS data files. If you are not a programmer and simply use NEXUS files
as a means of inputting data to the programs you use for analyzing your data,
the NCL is not something that will be useful to you. The NCL is also not for
you if you are a programmer but do not use the C++ language, since the NCL depends
heavily on the object oriented programming features built into C++. There is
no Java version of the NCL, nor is one planned. This is simply a reflection
of the fact that I primarily program in C++ and only have time to write the
library once.

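The NEXUS data file format was specified in the publication cited below. Please
read this paper for further information about the format specification itself;
the documentation for the NCL does not attempt to explain the structure of a
NEXUS data file.

Maddison, D. R., D. L. Swofford, and W. P. Maddison. 1997. NEXUS: an extensible
file format for systematic information. Systematic Biology 46(4): 590-621.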

The basic goal of the NCL is to provide a relatively easy way to endow a C++
program with the ability to read NEXUS data files. The steps necessary to use
the NCL to create a bare-bones program that can read a NEXUS data file are simple
and few (see the section entitled Building a NEXUS File Reader
below), and it is hoped that the availability of this class library will encourage
the use of the NEXUS format. This will in turn encourage consistency in how
programs read NEXUS files and how programs respond to errors in data files.

There are a large number of special data file formats in use. This places an
extra burden on the end user, who must deal with an increasing number of file
formats all differing in a number of ways. To convert one's data file to another
file format often involves manual manipulation of the data, an activity that
is inherently dangerous and probably has resulted in the corruption of many
data files. At the very least, the large number of formats in existence has
led to a proliferation of data file variants. With many copies of a given data
file on a hard disk, each formatted differently for various analysis programs,
it becomes very easy to change one (say, correct a datum found to be in error)
and then fail to correct the other versions. The NEXUS file format provides
a means for keeping one master copy of the data and using it with several programs
without modification. The NCL provides a means for encouraging programmers to
use the NEXUS file format in future programs they write.

The NCL has been designed to be as portable as possible for a C++ class library.
The NCL does make use of the Standard Template Library (STL), which is part of
the ANSI Standard C++ Library, but use of the STL is now common and should not
cause problems for modern compilers/platforms.

I have attempted to create the NCL in such a way that one is not limited in
the type of platform targeted. For example, NEXUS files can contain "output
comments" that are supposed to be displayed in the output of the program reading
the NEXUS file. Such comments are handled automatically by the NCL, and are
sent to a virtual function that can be overridden by you in a derived class.
This provides a means for you to tailor the output of such comments to the platform
of your choice. For example, if you are writing a standard Linux console application
(i.e., not a graphical X-Windows application), you might want such output comments
to simply be sent to standard output or to an ofstream object. For a graphical
Windows, Macintosh or X-Windows application, you might deem it more user-friendly
to pop up a message box with the output comment as the message. This would ensure
that the user noticed the output comment. You also have the option of having
your program completely ignore such comments in the data file.

The NCL provides similar hooks for noting the progress in reading the data
file. For example, the virtual function EnteringBlock is called and
provided with the name of the block about to be read. You can override EnteringBlock
in your derived class to allow, for example, a message to be displayed in a
status bar at the bottom of your program's main window (in a graphical application)
indicating which block is currently being read. Other such virtual functions
include SkippingBlock (to allow users to be warned that your program
is ignoring a block in the data file), SkippingCommand (to allow users
to be warned about particular commands being skipped within a block), and NexusError,
which is the function called whenever anything unexpected happens when reading
the file.
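The hook mechanism can be sketched without the NCL itself. In the following self-contained illustration, HookReader is a hypothetical stand-in for the NCL reader class, and its member signatures are assumptions made for this sketch (the real declarations in the NCL headers may differ). Only the hook names EnteringBlock, SkippingBlock and NexusError are taken from the text above; ProcessBlock is invented here to trigger the hooks.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the NCL reader: the base-class hooks do nothing.
class HookReader {
public:
    virtual ~HookReader() {}
    virtual void EnteringBlock(const std::string &) {}
    virtual void SkippingBlock(const std::string &) {}
    virtual void NexusError(const std::string &) {}

    // Invented for this sketch: pretend to process one block and fire
    // whichever hook applies.
    void ProcessBlock(const std::string &name, bool recognized) {
        if (recognized)
            EnteringBlock(name);
        else
            SkippingBlock(name);
    }
};

// Console flavor: every notification is written to an output stream.
class ConsoleHookReader : public HookReader {
public:
    explicit ConsoleHookReader(std::ostream &os) : out(os) {}
    void EnteringBlock(const std::string &name) override {
        out << "Reading " << name << " block\n";
    }
    void SkippingBlock(const std::string &name) override {
        out << "Skipping unknown block " << name << "\n";
    }
    void NexusError(const std::string &msg) override {
        out << "Error: " << msg << "\n";
    }
private:
    std::ostream &out;
};
```

A graphical program would derive a different class from the same base and, for instance, update a status bar inside EnteringBlock instead of writing to a stream.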

The basic tools provided in the NCL allow you to create your own NEXUS blocks
and use them in your program. This makes it easy to define a private block to
contain commands that only your program recognizes, allowing your users to run
your program in batch mode (see the section below entitled
General Advice for more information on this topic).

The main current limitation is that the NCL is incomplete. Some standard NEXUS
blocks have been provided with this distribution, but because the NEXUS format
is so extensive, even some of the standard blocks described in the paper cited
above have not been implemented (or have been only incompletely implemented).
Here is a summary table showing what has been implemented thus far:

Block          Current Limitations

ASSUMPTIONS    Only TAXSETS, CHARSETS, and EXSETS have been implemented
               thus far.

ALLELES        Cannot yet handle transposed MATRIX, and only
               DATAPOINT=STANDARD is implemented.

CHARACTERS     Only ITEMS=STATES and STATESFORMAT=STATESPRESENT have been
               implemented thus far, and DATATYPE=CONTINUOUS has not been
               implemented.

DISTANCES      No limitations, completely implemented

DATA           Since the DATA block is essentially the same as a CHARACTERS
               block, the same limitations apply.

TAXA           No limitations, completely implemented

TREES          No limitations, completely implemented

While the limitations for the CHARACTERS block may seem a bit extreme, this
block is nevertheless implemented to the point where almost all existing morphological
and molecular data sets can be read.

The ALLELES block has not yet been used in any program to my knowledge. It
is very similar to the GDADATA block used in my program GDA, but differs in
requiring NEWPOPS to be specified if a TAXA block does not precede the ALLELES
block (this is to make the ALLELES block more like the CHARACTERS block).

Some recent modifications of the NEXUS format implemented in Mesquite
(e.g. LINK statements in the CHARACTERS block) and MrBayes
(e.g. DATATYPE=MIXED in the DATA block) are not supported in the NCL at this
time.

The NCL has been designed to be portable, easy-to-use, and informative in the
error messages produced. It will be apparent to anyone who looks very closely
at the code that some efficiency (both in executable size and speed) has been
sacrificed to meet these goals.

Building a NEXUS File Reader

This section illustrates how you could build a simple NEXUS file reader application
capable of reading in a TAXA and a TREES block. Note that the file nclsimplest.cpp
contains all of the code for this example. To keep things simple, we will just
write output to an ofstream object (nothing graphical here).

Creating block objects

The first two lines of the main function involve the creation of objects corresponding
to the two types of NEXUS blocks we want our program to recognize. NxsTaxaBlock
is declared in the header file nxstaxablock.h
and defined in the source code file nxstaxablock.cpp,
whereas the NxsTreesBlock class is declared in nxstreesblock.h
and defined in nxstreesblock.cpp. Note that the
NxsTreesBlock constructor requires a reference to an object of type NxsTaxaBlock.
This is because the taxa labels in a TREES block should correspond to any taxa
previously defined in a TAXA block. If no TAXA block precedes the TREES block,
taxon labels defined in the TREES block will be used to populate the TAXA block.
In the NCL, any block that defines taxon labels stores this information in the
NxsTaxaBlock object, and any block that needs such information requires a reference
to the NxsTaxaBlock object in its constructor.
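The sharing arrangement described above can be sketched as follows. SketchTaxa and SketchTrees are hypothetical stand-ins written for this illustration, not the NCL classes, and the member names (AddLabel, GetNumLabels, GetLabel, UseLabels) are assumptions. The point is only that the trees object holds a reference to the single taxa object and fills it in when no TAXA block came first.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the taxa block: the single store of taxon labels.
class SketchTaxa {
public:
    void AddLabel(const std::string &label) { labels.push_back(label); }
    std::size_t GetNumLabels() const { return labels.size(); }
    const std::string &GetLabel(std::size_t i) const { return labels[i]; }
private:
    std::vector<std::string> labels;
};

// Hypothetical stand-in for the trees block: it needs the taxa object
// at construction time.
class SketchTrees {
public:
    explicit SketchTrees(SketchTaxa &t) : taxa(t) {}

    // Called with the labels found in a tree description. If no TAXA block
    // was read earlier, these labels populate the shared taxa object;
    // otherwise the existing labels are left alone.
    void UseLabels(const std::vector<std::string> &found) {
        if (taxa.GetNumLabels() == 0) {
            for (std::size_t i = 0; i < found.size(); ++i)
                taxa.AddLabel(found[i]);
        }
    }
private:
    SketchTaxa &taxa;
};
```

Because every block that needs taxon labels refers to the same object, there is only ever one list of labels to keep consistent.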

Adding the block objects to the NxsReader object

The next three lines involve creating a NxsReader object and adding our two
block objects to a linked list maintained by the NxsReader object. The MyNexusReader
class is derived from the NxsReader class, which is declared in nxsreader.h
and defined in nxsreader.cpp. Although a NxsReader
object can be created and used, you will probably wish to derive a class from
it (as I did in this example) and override some of the NxsReader virtual functions,
such as EnteringBlock, SkippingBlock, and NexusError
(the NxsReader versions of these functions do nothing, and it is important
to at least report errors in some way to your program's users).

The reason the NxsReader object must maintain a list of block objects is so
that it can figure out which one is responsible for reading each block found
in the data file. The block objects taxa and trees have each inherited an id
variable of type char * that stores their block name (i.e., "TAXA" for the
NxsTaxaBlock object and "TREES" for the NxsTreesBlock object).
When the Execute member function encounters a block name, it searches
its linked list of block objects until it finds one whose id
variable is identical to the name of the block encountered. It then calls the
Read function of that block object to do the work of reading the block
from the data file and storing its contents. It is possible of course that a
block name will appear in a data file for which there is no corresponding block
object. In this case, the Execute method calls the SkippingBlock
method to report the fact that it is skipping over the contents of the unknown
block.
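The lookup just described can be illustrated with a small self-contained sketch. DispatchBlock and DispatchReader are hypothetical stand-ins for the NCL block and reader classes. The sketch uses a std::vector and std::string where the text describes a linked list and a char * id, and HandleBlock is a name invented here for the part of Execute that picks a block object. Names are compared exactly, which is a simplification.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a block object: each one knows its own name.
class DispatchBlock {
public:
    explicit DispatchBlock(const std::string &name) : id(name), timesRead(0) {}
    virtual ~DispatchBlock() {}
    const std::string &GetID() const { return id; }
    virtual void Read() { ++timesRead; }   // a real block would parse commands here
    int TimesRead() const { return timesRead; }
private:
    std::string id;
    int timesRead;
};

// Hypothetical stand-in for the reader object.
class DispatchReader {
public:
    DispatchReader() : skipped(0) {}
    virtual ~DispatchReader() {}
    void Add(DispatchBlock *b) { blocks.push_back(b); }   // the reader does not own b

    // Find the block object whose id matches the name found in the file and
    // let it read the block; if none matches, report that it is skipped.
    void HandleBlock(const std::string &name) {
        for (std::size_t i = 0; i < blocks.size(); ++i) {
            if (blocks[i]->GetID() == name) {
                blocks[i]->Read();
                return;
            }
        }
        SkippingBlock(name);
    }
    virtual void SkippingBlock(const std::string &) { ++skipped; }
    int NumSkipped() const { return skipped; }
private:
    std::vector<DispatchBlock *> blocks;
    int skipped;
};
```

Adding support for a new block type then amounts to creating one more block object and passing it to Add; the reader itself does not change.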

Reading the data file

The next two lines create a token object (MyToken is derived from the NxsToken
class), and initiate the reading of the NEXUS data file using the Execute
function. The input and output files are created within the MyNexusReader class.
While this is not required, it facilitates handling messages generated while
the data file is being read. The NxsToken class has one virtual member function
(OutputComment) which enables you to control how output comments are
displayed. The NxsToken version of OutputComment does nothing, so
you must derive your own token class from NxsToken and override the OutputComment
method in order for the output comments in the data file to be displayed. The
main function of the NxsToken class is to provide a means for grabbing separate
NEXUS tokens (words separated by blank spaces or punctuation) one by one from
the data file. Calling the GetNextToken function reads and stores
the next token found in the data file, correctly handling any comments found
along the way. This automatic comment handling greatly simplifies reading a
NEXUS data file.
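To make the idea of automatic comment handling concrete, here is a much-reduced sketch of a token reader. SketchToken is a hypothetical stand-in, not NxsToken: it ignores quoted tokens, recognizes only a handful of punctuation characters, and treats a comment like whitespace, all of which are assumptions of this sketch rather than statements about the NCL. It does show the two behaviors described above: square-bracket comments (including nested brackets) are consumed silently, and comments beginning with an exclamation point are passed to an overridable OutputComment function.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a NEXUS token reader.
class SketchToken {
public:
    explicit SketchToken(std::istream &is) : in(is) {}
    virtual ~SketchToken() {}

    // Hook for [!output comments]; the base version ignores them.
    virtual void OutputComment(const std::string &) {}

    // Read the next token; returns false once the input is exhausted.
    bool GetNextToken() {
        token.clear();
        char ch;
        while (in.get(ch)) {
            if (ch == '[') {
                SkipComment();
                if (!token.empty()) break;        // assumption: comment acts like a space
            } else if (std::isspace(static_cast<unsigned char>(ch))) {
                if (!token.empty()) break;
            } else if (IsPunctuation(ch)) {
                if (token.empty()) token += ch;   // punctuation is a token by itself
                else in.putback(ch);              // leave it for the next call
                break;
            } else {
                token += ch;
            }
        }
        return !token.empty();
    }
    const std::string &GetToken() const { return token; }

private:
    static bool IsPunctuation(char ch) {
        return std::string(";,=()").find(ch) != std::string::npos;
    }
    // Called just after '[' was read; consumes through the matching ']'.
    void SkipComment() {
        std::string text;
        int depth = 1;
        char ch;
        while (depth > 0 && in.get(ch)) {
            if (ch == '[') ++depth;
            else if (ch == ']') --depth;
            if (depth > 0) text += ch;
        }
        if (!text.empty() && text[0] == '!')
            OutputComment(text.substr(1));
    }
    std::istream &in;
    std::string token;
};

// Derived class that keeps output comments instead of ignoring them.
class CollectingToken : public SketchToken {
public:
    explicit CollectingToken(std::istream &is) : SketchToken(is) {}
    void OutputComment(const std::string &msg) override { comments.push_back(msg); }
    std::vector<std::string> comments;
};
```

Code that calls GetNextToken in a loop never sees the comments at all, which is what keeps block-reading code short.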

Reporting on block objects' contents

The last two lines call the Report functions of each of the blocks.
This just spits out a summary of any data contained in these objects that has
been read from the data file.

Note that the ifstream is opened in binary
mode. You should always open your input file in binary mode so that the file
can be read properly regardless of the platform on which it was created. For
example, suppose someone created a NEXUS data file on a Macintosh and wanted
to read it with your program, which is running on a Windows XP machine. Opening
the file in binary mode allows the NxsToken object you are using to recognize
the newline character in the Mac file as such, even though Macintosh computers
use a different symbol (ASCII 13) to represent the newline than computers
running Windows (which use the ASCII 13, ASCII 10 combination for newlines).
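The following sketch shows why binary mode matters. It is not NCL code: ReadLinesAnyPlatform is a hypothetical helper written for this illustration. Given a stream whose bytes have not been translated by the platform, it accepts ASCII 10 alone, ASCII 13 alone, and the ASCII 13, ASCII 10 pair as line endings.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Split a stream into lines, accepting ASCII 10 alone, ASCII 13 alone, and
// the ASCII 13, ASCII 10 pair as line endings. The stream should have been
// opened in binary mode so that no platform-specific translation of these
// bytes has already taken place.
std::vector<std::string> ReadLinesAnyPlatform(std::istream &in) {
    std::vector<std::string> lines;
    std::string current;
    char ch;
    while (in.get(ch)) {
        if (ch == '\r') {                 // ASCII 13
            if (in.peek() == '\n')        // ASCII 10 follows: treat the pair as one ending
                in.get(ch);
            lines.push_back(current);
            current.clear();
        } else if (ch == '\n') {          // ASCII 10 on its own
            lines.push_back(current);
            current.clear();
        } else {
            current += ch;
        }
    }
    if (!current.empty())
        lines.push_back(current);         // final line without a terminator
    return lines;
}
```

With a real file, the stream would be created along the lines of std::ifstream inf(filename, std::ios::binary), where filename is whatever path your program was given.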

We derive our own token reader from the NxsToken class in order to display the
output comments present in the data file (if any). The virtual function OutputComment
in the base class is overridden to accomplish this.

Here is the entire program. Note that in order for this to link properly, you
will need to also compile the following files included with the NCL (and instruct
your linker to link them into your main executable): nxsblock.cpp,
nxsexception.cpp, nxsreader.cpp,
nxsstring.cpp, nxstaxablock.cpp,
nxstreesblock.cpp and nxstoken.cpp.

Here is a sample data file that exercises a lot of the features of the NEXUS
file reader we have just created. First, there are both output and regular comments
scattered around. Some are between tokens, some occur at the beginning of a
token, and still others begin right after a token. Some comments even have nested
within them words surrounded by square brackets. There are also blocks in this
data file (i.e., the paup block) that are not recognized by the NEXUS file reader
we have created. The NEXUS reader handles all of these situations with very
minimal effort on your part. Note that you can remove the TAXA block without
ill effects because the taxon labels are specified in the TREES block.

Creating your own NEXUS block involves deriving a class from the NxsBlock base
class and overriding the three virtual functions Read, Reset,
and Report. Use the files emptyblock.cpp
and emptyblock.h as templates for your own source
code and header files. While creating your own block class is not a complicated
endeavor, here are some things to watch out for:

Be sure to write the Reset function in such a way that all heap
memory is cleaned up (deleted). This means whenever you use the new
keyword to allocate memory for an object that will potentially persist until
the Reset method is called, you need to put code in the Reset
function to delete it. Also, it is important to delete such objects in the
destructor for the class as well.

When writing the Read function, put assert
macros everywhere you make an assumption, however insignificant this assumption
seems at the time of writing. This tremendously speeds up the task of finding
bugs when one of your assumptions turns out to not always be true!

Check the data file for errors as you read it, and report any you find with
an informative message. Such checks will give your users some hope of finding
where they have made a mistake in constructing their data file. We all know
how frustrating it can be to have a program exit with an uninformative error
message.
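The points in the list above can be seen together in one small sketch. SketchPrivateBlock is a hypothetical class that is deliberately not derived from NxsBlock, so that it compiles on its own. The DIMENSIONS and LABELS commands, the handler names, and the use of std::runtime_error to signal a data file error are all assumptions made for illustration; in a real block you would report errors through whatever mechanism the NCL provides.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical private block used only to illustrate the advice above.
class SketchPrivateBlock {
public:
    SketchPrivateBlock() : ntax(0) {}
    ~SketchPrivateBlock() { Reset(); }   // the destructor frees the same memory

    // Return to the just-constructed state, deleting everything that was
    // allocated with new while reading.
    void Reset() {
        for (std::size_t i = 0; i < labels.size(); ++i)
            delete labels[i];
        labels.clear();
        ntax = 0;
    }

    // Handle a hypothetical "DIMENSIONS NTAX=n" command.
    void HandleDimensions(int n) {
        // A problem in the user's data file: report it informatively.
        if (n <= 0)
            throw std::runtime_error("NTAX must be a positive integer");
        ntax = static_cast<std::size_t>(n);
    }

    // Handle one label from a hypothetical "LABELS" command.
    void HandleLabel(const std::string &label) {
        // Programmer assumption: we never store more labels than NTAX.
        assert(labels.size() <= ntax);
        if (labels.size() == ntax)
            throw std::runtime_error("more labels than NTAX allows: " + label);
        labels.push_back(new std::string(label));
    }

    std::size_t GetNumLabels() const { return labels.size(); }

private:
    // Copying is disabled so two objects can never delete the same labels.
    SketchPrivateBlock(const SketchPrivateBlock &);
    SketchPrivateBlock &operator=(const SketchPrivateBlock &);
    std::size_t ntax;
    std::vector<std::string *> labels;
};
```

Note the division of labor: assert guards assumptions that only a programming mistake could violate, whereas mistakes a user can make in a data file produce a message that names the problem.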

General Advice

A typical program making use of this library might have the following two general
characteristics:

The program is structured so that it can be compiled either as a command-line
driven, console program or as a graphical application (with a Windows, Macintosh,
or X-Windows Graphical User Interface, or GUI)

The program handles a private NEXUS block that contains commands identical to
those that can be entered on the command line, enabling the program to be run in
batch mode

After developing several programs like this, I have come up with the following
strategy that makes efficient use of the object-oriented nature of the NCL.
I will assume your non-graphical program will be called simply "Phylome" and
will read a private NEXUS block named "PHYLOME". I will further assume that
the GUI version will be targeted for the Windows platform, and will be called
"PhylomeWin".

Create a class (call it PhylomeBase) that performs all the (non-graphical)
core functionality of the program. Be careful not to place any GUI code in
this class. This class should be derived (publicly) from both
the NxsBlock class (this class will encapsulate your program's private NEXUS
block) and the NxsReader class (which encapsulates your program's NEXUS file
reading capability).

In class PhylomeBase, override the NxsBlock virtual functions Read,
Reset and Report and any other handler functions needed
to process the commands in the private block.

In the PhylomeBase constructor, create all the NxsBlock-derived objects
needed (e.g., a NxsCharactersBlock object, a NxsTaxaBlock object, etc.) and
add these to PhylomeBase using the Add function. Don't forget to call
Add(this) too so that your program can read the private block. Note that "this"
is both a NxsReader object and a NxsBlock object, so it is able to
add itself to the list of blocks processed upon execution of a NEXUS data
file.

Now create a class PhylomeWin that handles the Windows GUI interface, deriving
publicly from PhylomeBase. In class PhylomeWin, override the NxsReader virtual
functions EnteringBlock, SkippingBlock, and NexusError.
For instance, within NexusError, you might put up a MessageBox
telling the user about an error encountered in the data file.

For the command-driven version, create a class Phylome that is derived
from PhylomeBase, overrides the NxsReader and NxsBlock virtual functions (like
its GUI counterpart, PhylomeWin), and provides a means for users to enter
commands (you should create a mechanism in PhylomeBase so that you can use
the same machinery to process commands typed in from the keyboard as you use
to process commands contained in a private block). The basiccmdline.h
and basiccmdline.cpp files are provided as an
example of how to create a basic console program with an interactive command
line interface.
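The class layout described in the steps above can be sketched in a self-contained way. SketchBlock and SketchReader below are hypothetical stand-ins for NxsBlock and NxsReader, and HandleCommand, Dispatch and TypeCommand are names invented for this illustration. Only the overall shape (PhylomeBase deriving from both a block class and a reader class, adding itself with Add(this), and a console class derived from it) follows the text.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the NCL block and reader base classes.
class SketchBlock {
public:
    explicit SketchBlock(const std::string &name) : id(name) {}
    virtual ~SketchBlock() {}
    const std::string &GetID() const { return id; }
    virtual void HandleCommand(const std::string &) {}
private:
    std::string id;
};

class SketchReader {
public:
    virtual ~SketchReader() {}
    void Add(SketchBlock *b) { blocks.push_back(b); }
    virtual void NexusError(const std::string &) {}
    // Send one command to the block with the given name, or report an error.
    void Dispatch(const std::string &blockName, const std::string &command) {
        for (std::size_t i = 0; i < blocks.size(); ++i) {
            if (blocks[i]->GetID() == blockName) {
                blocks[i]->HandleCommand(command);
                return;
            }
        }
        NexusError("no handler for block " + blockName);
    }
private:
    std::vector<SketchBlock *> blocks;
};

// Core functionality only: no GUI or console code in this class.
class PhylomeBase : public SketchBlock, public SketchReader {
public:
    PhylomeBase() : SketchBlock("PHYLOME") {
        Add(this);   // "this" is both a block and a reader
    }
    // One handler serves both the private block and the interactive prompt.
    void HandleCommand(const std::string &command) override {
        history.push_back(command);
    }
    std::vector<std::string> history;
};

// Console flavor: errors go to a stream. A GUI flavor would show a dialog.
class Phylome : public PhylomeBase {
public:
    explicit Phylome(std::ostream &os) : out(os) {}
    void NexusError(const std::string &msg) override {
        out << "Error: " << msg << "\n";
    }
    // A command typed at the prompt takes the same path as one from a file.
    void TypeCommand(const std::string &command) { Dispatch("PHYLOME", command); }
private:
    std::ostream &out;
};
```

A PhylomeWin class would derive from PhylomeBase in the same way as Phylome does here, overriding only the functions that touch the user interface.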

Reporting Bugs

Please report bugs by email directly to Paul O. Lewis.
Please include "NCL bug" in the subject line to ensure that my mail filter catches it.
Your bug has a better chance of getting fixed if you can attach to your email a NEXUS data file that
causes the problem you have noticed.

Although you are not obligated in any way to me as a result of using this library
to create programs, there are a few things that you can do to help encourage
me to continue improving this library. Please make use of any of the following
means of support that you feel comfortable with:

Cite the NCL! If you publish an announcement of a program of yours that
includes the NCL, please acknowledge that your program includes the NCL.

Advertise the NCL! If your program produces output either on the screen
or in the form of a file, mention that your program uses the NCL in your program's
output.

Help me fix bugs. If you discover a bug, please let me know about it so that I can
get it fixed. Scroll up to the section entitled "Reporting Bugs" for the (simple)
instructions for reporting a bug.

Give me suggestions. I welcome suggestions for improving the library and
making it more convenient to use. I can't guarantee that I will have time
to honor all requests, but I will try my best.

The current capabilities of the NCL are best illustrated by taking a look at
some of the data files that it can successfully read. For example, the NCL can
successfully read data files available on the "Green
Plant Phylogeny Research Coordination Group" website, which have a reasonably
complicated structure. Included with the NCL are data files containing multiple
DISTANCES blocks (distances.nex) or multiple CHARACTERS
blocks (characters.nex) to illustrate some of
the formatting options available with these two NEXUS block types. These examples
amply demonstrate the capabilities of the NCL as it now stands; however, the
NCL will continue to grow as code for recognizing more and more NEXUS blocks
is added. I welcome both suggestions for improvement and bug reports,
of course.