This collection of tools is a subset of the tools, applications,
and libraries developed within the
Center for Quantitative Biology at
Princeton University's
Lewis-Sigler Institute.
They are provided here within a common framework, the
CQB Integrated
Tools, which lets data be processed and passed easily between
them. The Integrated Tools framework also supports batch processing
of multiple data sets natively within each tool and uses the
local computing grid for high throughput.

This set is not limited to tools within the Center. We
anticipate adding several more tools from within the Center, as well as
other widely used biological applications from external sources.

Clustering

The tools in this category cluster data or handle the files associated with clustering.

Iclust

Iclust is an information-theoretic, model-independent clustering
application. It can be applied to many different kinds of data. It
typically starts by producing a mutual information pairwise relations
matrix based on the input data. Iclust then uses this matrix to group
the input data into separate clusters. The theory, along with several
examples, is described in Slonim et
al., PNAS, 2005.
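
As a rough illustration of the first step only, the sketch below builds a pairwise mutual information matrix from a few numeric profiles. The function names and the equal-width binning are assumptions made for this example; they are not Iclust's own estimator, and the clustering step itself is not shown.

```python
import math
from collections import Counter

def discretize(values, bins=3):
    """Equal-width binning (an illustrative choice, not Iclust's method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) * p(y)))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def pairwise_mi(profiles, bins=3):
    """Return the square matrix of mutual information between all profiles."""
    labels = [discretize(p, bins) for p in profiles]
    return [[mutual_information(a, b) for b in labels] for a in labels]
```

Two profiles that rise and fall together share high mutual information, while a constant profile shares none with anything.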

KNNImputer

KNNImputer takes a PCL file, imputes missing values, and saves the
result as a new PCL file. It does this by examining the nearest
neighbors (the number of which is adjustable) using one of several
distance measures. Genes with more than 30% (by default)
missing data are deleted rather than imputed. See KNNImpute's home page
and Troyanskaya et al., Bioinformatics, 17:520-5, 2001. KNNImpute is
available as part of the Sleipnir library.
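
The general idea of nearest-neighbor imputation can be sketched as follows. This is a minimal illustration and not the Sleipnir implementation: the function name `knn_impute`, the root-mean-square distance, and the plain average of neighbor values are all choices made for this sketch.

```python
import math

def knn_impute(rows, k=2, max_missing=0.3):
    """rows: list of lists of floats, with None marking a missing value."""
    # Drop rows whose fraction of missing values exceeds the cutoff.
    kept = [r for r in rows if r.count(None) / len(r) <= max_missing]

    def distance(a, b):
        # Root-mean-square difference over columns present in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    result = []
    for i, row in enumerate(kept):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other rows that have a value in column j.
            candidates = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(kept)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in candidates[:k] if d < math.inf]
            if nearest:
                filled[j] = sum(nearest) / len(nearest)
        result.append(filled)
    return result
```

Rows that exceed the missing-data cutoff are removed before any imputation, so they never serve as neighbors.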

PruneTree

This tool takes a CDT file and its associated GTR file, then traverses
the tree and prunes it wherever the correlation exceeds the
given threshold. It then outputs the contents of the pruned parts of
the tree. The output in partition file format is a single file
listing each identifier along with a partition number corresponding to
the group it belongs to. This file is suitable for use with FIRE and
is similar to that produced by Iclust. The node file format
outputs two files for each group, one with a list of the identifiers
in the group, and one that is a CDT file containing those members.
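
The traversal and the partition-style output can be sketched as below. This is an illustration under stated assumptions, not the tool's code: the tree is passed in memory as `(node, left, right, correlation)` tuples, the root is assumed to be the last row, and the name `prune_tree` is hypothetical.

```python
def prune_tree(gtr_rows, threshold):
    """gtr_rows: (node_id, left_child, right_child, correlation) tuples,
    with the root as the last row (an assumption of this sketch).
    Returns {leaf_id: partition_number}."""
    nodes = {node: (left, right, corr) for node, left, right, corr in gtr_rows}
    root = gtr_rows[-1][0]

    def leaves(item):
        # Collect leaf identifiers under a node (or the leaf itself).
        if item not in nodes:
            return [item]
        left, right, _ = nodes[item]
        return leaves(left) + leaves(right)

    partitions = {}
    next_id = 0
    stack = [root]
    while stack:
        item = stack.pop()
        if item not in nodes or nodes[item][2] > threshold:
            # A leaf, or a subtree whose correlation exceeds the threshold:
            # cut it off here and treat its leaves as one group.
            for leaf in leaves(item):
                partitions[leaf] = next_id
            next_id += 1
        else:
            left, right, _ = nodes[item]
            stack.extend([right, left])  # visit left before right
    return partitions
```

Raising the threshold splits the tree into more, smaller groups, since fewer internal nodes qualify as cut points.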

Motif discovery

The tools in this category discover and characterize motifs.

COALESCE

This Combinatorial Algorithm for Expression and Sequence-based Cluster
Extraction (COALESCE) can use large collections of genomic data and
Bayesian integration to predict coregulated gene modules, the
conditions of regulation, and the consensus binding motifs for
regulation. It uses a synthesis of gene expression biclustering, motif
prediction, and data integration (including expression, sequence,
nucleosome positioning, and evolutionary conservation). Input data is
in PCL format. This tool is available as part of the Sleipnir library.

Annotation/Enrichment

The tools in this category search for annotations to enrich the
input data or deal with annotation data.

GAFview

The GAF Viewer displays the data within annotation files, such as
those provided by the GO consortium, and other formats. It produces
two types of tables - one showing all the identifiers and what
ontology terms they are directly annotated to, and one showing all the
referenced ontology terms and what identifiers are annotated to them
(both directly and indirectly). The tables also include the organism
(by taxon ID), evidences, references, and alternate identifiers
(synonyms). In addition, a DAG can be produced showing the structure
of the ontology and which identifiers are directly annotated to
each term.
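
The difference between direct and indirect annotation in the second table can be illustrated with a short sketch. It assumes the ontology is given as a child-to-parents mapping and that an identifier annotated to a term is indirectly annotated to all of that term's ancestors. The function name and the placeholder identifiers are hypothetical and do not reflect GAFview's internals.

```python
from collections import defaultdict

def terms_to_identifiers(direct, parents):
    """direct: {identifier: set of directly annotated terms}
    parents: {term: set of parent terms} describing the ontology DAG.
    Returns {term: {identifier: "direct" or "indirect"}}."""
    def ancestors(term, seen=None):
        # Walk up the DAG, visiting each ancestor once.
        seen = set() if seen is None else seen
        for parent in parents.get(term, ()):
            if parent not in seen:
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    table = defaultdict(dict)
    for identifier, terms in direct.items():
        for term in terms:
            table[term][identifier] = "direct"
        for term in terms:
            for ancestor in ancestors(term):
                # Never downgrade a direct annotation to indirect.
                table[ancestor].setdefault(identifier, "indirect")
    return dict(table)
```

For example, with `parents = {"T3": {"T2"}, "T2": {"T1"}}`, an identifier annotated directly to `T3` appears under `T2` and `T1` as indirect.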

Data conversion

The simple tools in this category can be used for basic
conversions and filtering operations - altering the format, converting
data types, and so on.

Map identifiers

This tool takes a delimited text file and maps identifiers in selected
columns from one type to another, for example from Agilent IDs to
Yeast ORFs.
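
A minimal sketch of this kind of column mapping, using Python's standard `csv` module, is shown below. The function name, the in-memory mapping dictionary, and the choice to leave unmapped values unchanged are assumptions for illustration only.

```python
import csv
import io

def map_identifiers(text, mapping, columns, delimiter="\t"):
    """Replace identifiers in the given zero-based columns using `mapping`.
    Unmapped values are left unchanged (a choice made for this sketch)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        for col in columns:
            if col < len(row):
                row[col] = mapping.get(row[col], row[col])
        writer.writerow(row)
    return out.getvalue()
```

Calling `map_identifiers(text, {"probe_1": "ORF_1"}, columns=[0])` rewrites only the first column; the identifiers here are placeholders, not real Agilent or ORF names.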

Data matrix extractor

The Data matrix extractor (DME) takes a delimited text file, such as a PCL file, and extracts the embedded matrix.
In the case of PCL files, for example, the matrix is the experiment data. This is done
by locating the largest body of numeric data within the text and, for known formats, excluding
certain areas (GWEIGHT columns and EWEIGHT rows, for example).

[Figure: extracting a matrix from a PCL file.]
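
The matrix extraction described above can be approximated for PCL-like input with the sketch below. It is a simplification and not the tool's algorithm: instead of searching for the largest numeric block, it drops the GWEIGHT column and EWEIGHT row and keeps every remaining column whose non-empty cells are all numeric. The function name is hypothetical.

```python
def extract_matrix(text, delimiter="\t"):
    """Return the numeric matrix as a list of rows; empty cells become None."""
    rows = [line.split(delimiter) for line in text.splitlines() if line.strip()]
    header = rows[0]
    # Known PCL exclusions: the GWEIGHT column and the EWEIGHT row.
    skip_cols = {i for i, name in enumerate(header) if name.upper() == "GWEIGHT"}
    body = [r for r in rows[1:] if r[0].upper() != "EWEIGHT"]

    def is_numeric(cell):
        try:
            float(cell)
            return True
        except ValueError:
            return False

    # Keep columns that hold at least one value and only numeric values.
    data_cols = [
        i for i in range(len(header))
        if i not in skip_cols
        and any(r[i] for r in body if i < len(r))
        and all(is_numeric(r[i]) for r in body if i < len(r) and r[i])
    ]
    return [
        [float(r[i]) if i < len(r) and r[i] else None for i in data_cols]
        for r in body
    ]
```

A limitation of this shortcut is that an identifier column made up entirely of numbers would be mistaken for data.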

Delimited text converter

This tool takes a delimited text file (where each line has fields separated by a special character or characters) and performs some basic conversions. One common conversion is changing the delimiter, for example from a tab to a space. Other common conversions include removing whitespace at the beginning or end of lines, replacing missing (empty) fields, and removing blank lines.
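
The conversions listed above can be sketched in a few lines. The function name, parameter names, and the `"NA"` placeholder for missing fields are assumptions for this example and are not the tool's actual options.

```python
def convert_delimited(text, old="\t", new=" ", missing="NA",
                      strip_lines=True, drop_blank=True):
    """Change the delimiter, trim lines, fill empty fields, drop blank lines."""
    # Trim spaces and tabs, but never the delimiter itself, so that
    # leading or trailing empty fields are preserved.
    trim_chars = " \t".replace(old, "")
    out = []
    for line in text.splitlines():
        if strip_lines:
            line = line.strip(trim_chars)
        if drop_blank and not line.strip():
            continue
        fields = [f if f != "" else missing for f in line.split(old)]
        out.append(new.join(fields))
    return "\n".join(out)
```

Trimming is deliberately restricted to characters other than the delimiter; otherwise a trailing tab that marks an empty last field would be lost before it could be filled in.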

Visualization

This category contains tools that help visualize data.

Heatmap generator

This tool generates a heatmap from a file containing a numeric
matrix. Each element in the heatmap is colored according to the
magnitude of the corresponding matrix element relative to the "center"
value (usually the mean). There are several options to control the
coloring.
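
One way to color elements relative to a center value is sketched below. The blue-white-red scheme, the scaling by the largest deviation, and the function name are assumptions for illustration; the tool's own coloring options may differ.

```python
def heatmap_colors(matrix, center=None):
    """Map each matrix element to an (R, G, B) tuple relative to `center`."""
    values = [v for row in matrix for v in row]
    if center is None:
        center = sum(values) / len(values)    # default "center": the mean
    # The largest distance from the center sets the full-intensity scale.
    span = max(abs(v - center) for v in values) or 1.0

    def color(v):
        t = (v - center) / span               # ranges from -1.0 to 1.0
        fade = round(255 * (1 - abs(t)))      # 255 at center, 0 at extremes
        if t >= 0:
            return (255, fade, fade)          # above center: toward red
        return (fade, fade, 255)              # below center: toward blue

    return [[color(v) for v in row] for row in matrix]
```

Values equal to the center come out white, and the most extreme values on either side come out fully saturated.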