CENTIPEDE applies a hierarchical Bayesian mixture model to infer which regions of the genome are bound by particular transcription factors (TFs). It starts by identifying a set of candidate binding sites (e.g., sites that match the TF's position weight matrix, or PWM), and then classifies each site as bound or not bound by the TF. CENTIPEDE is an unsupervised learning algorithm: it discriminates between these two types of motif instances (bound and unbound) without labeled training examples, using as much relevant information as possible.
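The two stages described above (candidate sites from a PWM, then unsupervised bound/unbound classification) can be illustrated with a toy Python sketch. This is not CENTIPEDE's implementation, which is an R package. The candidate step here is a plain forward-strand log-odds PWM scan with a hard score threshold. The classification step substitutes a simple two-component Poisson mixture, fit by EM on a single read count per site, for CENTIPEDE's hierarchical model. The motif `TOY_PWM`, the function names, and the count data are all hypothetical.

```python
import math

BASES = "ACGT"

# Hypothetical log-odds PWM for a 4-bp motif "ACGT": rows are motif positions,
# columns are bases in BASES order; +1 for the consensus base, -1 otherwise.
TOY_PWM = [[1.0 if base == consensus else -1.0 for base in BASES] for consensus in "ACGT"]


def candidate_sites(seq, log_odds, threshold):
    """Return start offsets of forward-strand windows whose PWM score >= threshold."""
    width = len(log_odds)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(log_odds[i][BASES.index(b)] for i, b in enumerate(window))
        if score >= threshold:
            hits.append(start)
    return hits


def poisson_logpmf(k, lam):
    """Log of the Poisson probability of observing k reads given mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_bound_unbound(counts, n_iter=100, eps=1e-6):
    """Two-component Poisson mixture fit by EM (unsupervised, toy stand-in).

    Returns (pi, lam_bound, lam_unbound, posterior), where posterior[j] is the
    estimated probability that site j belongs to the high-count ("bound") component.
    """
    n = len(counts)
    mean = max(sum(counts) / n, eps)
    # Start the "bound" component at a higher mean so the two labels don't swap.
    lam_bound, lam_unbound, pi = 1.5 * mean, 0.5 * mean, 0.5
    posterior = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each site is bound (log-sum-exp for stability).
        posterior = []
        for k in counts:
            lb = math.log(pi) + poisson_logpmf(k, lam_bound)
            lu = math.log(1.0 - pi) + poisson_logpmf(k, lam_unbound)
            m = max(lb, lu)
            posterior.append(math.exp(lb - m) / (math.exp(lb - m) + math.exp(lu - m)))
        # M-step: update the mixing weight and both component means from soft assignments.
        w = sum(posterior)
        pi = min(max(w / n, eps), 1.0 - eps)
        lam_bound = max(sum(p * k for p, k in zip(posterior, counts)) / max(w, eps), eps)
        lam_unbound = max(sum((1 - p) * k for p, k in zip(posterior, counts)) / max(n - w, eps), eps)
    return pi, lam_bound, lam_unbound, posterior


if __name__ == "__main__":
    print(candidate_sites("TTACGTAAACGA", TOY_PWM, threshold=2.0))  # prints [2, 8]
    counts = [0, 1, 2, 1, 0, 20, 22, 19, 1, 21]  # made-up read counts at 10 candidate sites
    pi, lam_b, lam_u, post = fit_bound_unbound(counts)
    print([round(p, 2) for p in post])  # prints [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0]
```

In a real analysis the per-site counts would come from sequencing reads (for example, DNase-I cleavage counts) near each motif match; here they are invented. No site labels are supplied to the fit, so the bound/unbound split is learned from the data alone, which is the sense in which the approach is unsupervised.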

If the automatic installation from R-Forge does not work, the package can be downloaded and installed manually.

Transcription factor map for lymphoblastoid cell lines

Here we report the map that we generated for our paper [1]. The raw sequencing reads are available in GEO for two of the lymphoblastoid cell lines (LCLs) for which we generated DNase-I data (GSE25341), and for one additional ENCODE cell line (GM12878, GSE19622) whose data were generated by the Crawford group.

This work has been supported by grants from the National Institutes of Health, by the Howard Hughes Medical Institute, by the Chicago Fellows Program, by the American Heart Association, and by the NIH Genetics and Regulation Training grant.

We also thank the ENCODE Project, supported by NHGRI, for making data available pre-publication (in particular the Bernstein, Crawford, Myers, and Snyder groups, and the UCSC Genome Browser).