OPENSEQ (short for open sequence) was created with the intent of providing an open platform to share sequence-based analyses with the public.

Our first project involves constructing a maximum entropy statistical model to describe each protein family in the Pfam database. The model captures both conservation and co-evolution patterns in the family. The strength of measured co-evolution is strongly predictive of residue-residue contacts in the 3D structure of the protein. This means that we can use this information to better understand contacts in known structures and to predict contacts for protein families that have no structure. (The image on the left shows the ND1 protein of the electron transport chain with predicted contacts in yellow.)

We've added support for Jackhmmer on our complexes web server. You can see exactly how we implemented it for paired-alignment generation in the technical details section of our FAQ. The advantages and disadvantages of using Jackhmmer are described in the April 04, 2014 entry (below).

Since our publication, a similar, independent and very cool study has been posted to bioRxiv by Hopf et al. (folks at evfold.org).

We also have a beta web server that will generate paired alignments for a given protein pair and run the GREMLIN analysis. If you encounter any errors or have suggestions, please use our contact form!

CASP11 starts TODAY!!

CASP11 is going to be the era of "contact prediction": every group has some kind of contact predictor in their pipeline. To make things easier for folks interested in using GREMLIN results in CASP, submitted CASP sequences will automatically be organized on our CASP11 page. Good luck!!

Disadvantage: It is slower than HHblits, since Jackhmmer does not use the pre-clustered UniProt database.

Advantage: Since Jackhmmer does not require a pre-clustered UniProt DB (which is updated once a year), it can use the latest UniProt release (which is updated once a month).

If your favorite gene does not have enough homologous sequences to perform co-evolution analysis, you can try resubmitting the gene every month until it does! =P

For those of you interested in using co-evolution based contacts in the Rosetta modeling software, restraint files are now provided for each submission. We also provide a realignment and renumbering tool for those using the restraints with a sequence that is longer or shorter than the original query.
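
To illustrate what renumbering involves, here is a minimal Python sketch. It is not the tool provided on the site: the function names are hypothetical, contacts are represented simply as pairs of 1-based residue numbers rather than as an actual restraint file, and Python's difflib matching stands in for a proper sequence alignment.

```python
from difflib import SequenceMatcher


def build_index_map(query, target):
    """Map 1-based query positions to 1-based target positions.

    Uses exact matching blocks from difflib as a stand-in for a real
    pairwise alignment (an assumption made for this sketch).
    """
    matcher = SequenceMatcher(None, query, target, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            mapping[block.a + offset + 1] = block.b + offset + 1
    return mapping


def renumber_contacts(contacts, mapping):
    """Renumber (i, j) contact pairs; drop pairs with an unmapped residue."""
    return [
        (mapping[i], mapping[j])
        for i, j in contacts
        if i in mapping and j in mapping
    ]


# Hypothetical example: the target has two extra N-terminal residues.
query = "ACDEFGHIK"
target = "MMACDEFGHIK"
contacts = [(1, 5), (2, 9)]
print(renumber_contacts(contacts, build_index_map(query, target)))
```

Contacts that involve a residue missing from the new sequence are dropped rather than guessed, which seems the safer choice when the new sequence is shorter than the query.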

February 16, 2014

Some exciting new features have been added to the submissions output page! All previous submissions have been updated.

We've been working on adding homooligomer support. When you submit a job, contacts coming from other chains will now be highlighted in shades of red. The max hhsearch contact will now be shown instead of the average (whether it is an intra-chain contact, an inter-chain contact, or comes from a different PDB hit).

Each PDB hit now has its own contact map that you can click on.

We ran GREMLIN on all E. coli genes that have at least 1L sequences (L being the length of the protein). I am working on creating a pretty intro page, but in the meantime you can see a sneak preview here: ECOLI

September 14, 2013

It's very exciting to see folks submitting jobs to the server. (We apologize to those who submitted sequences of ~1000 amino acids; those can take up to 24 hours to complete.) We've been fixing little bugs as they come up. If you encounter any errors, or the output on the results page does not make sense, please write to us!

We now include a sequence conservation graphic, rendered with WebLogo, for all our submissions. These can be useful because:

A mutation to a functionally important residue will not be selected for if it cannot be easily compensated by a co-mutation. It will therefore not be observed in a multiple sequence alignment, and so will not be captured by a co-evolution analysis.

What may appear to be a highly variable/unconserved position (based on a WebLogo representation) may actually be highly conserved and co-evolving with another position [as would be captured in the GREMLIN output].
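
The second point can be made concrete with a toy alignment. The sketch below is only an illustration: it uses per-column Shannon entropy as a proxy for what a sequence logo displays and pairwise mutual information as a simple stand-in for co-evolution. GREMLIN itself fits a global statistical model and does not use mutual information; the alignment and function names here are made up for the example.

```python
import math
from collections import Counter


def column_entropy(alignment, i):
    """Shannon entropy (bits) of column i; 0 means fully conserved."""
    n = len(alignment)
    counts = Counter(seq[i] for seq in alignment)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(alignment, i, j):
    """Mutual information (bits) between columns i and j."""
    n = len(alignment)
    p_i = Counter(seq[i] for seq in alignment)
    p_j = Counter(seq[j] for seq in alignment)
    p_ij = Counter((seq[i], seq[j]) for seq in alignment)
    return sum(
        (c / n) * math.log2((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
        for (a, b), c in p_ij.items()
    )


# Toy alignment: column 0 is conserved, while columns 1 and 2 each look
# variable on their own but always change together (K with E, E with K).
toy_alignment = ["GKE", "GEK", "GKE", "GEK"]

for col in range(3):
    print("entropy of column", col, "=", column_entropy(toy_alignment, col))
print("MI between columns 1 and 2 =", mutual_information(toy_alignment, 1, 2))
```

In this toy case columns 1 and 2 each look unconserved when viewed alone, yet knowing one fully determines the other, which is the kind of signal a co-evolution analysis picks up and a per-column logo cannot show.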

September 3, 2013

We are excited to announce that the paper describing our latest work has been released in PNAS! [LINK][PDF]

A simplified version of the webpage is now available at gremlin.bakerlab.org; it includes only the Pfam analysis and the GREMLIN submission form. This is to prevent information overload for folks accessing the resource for the first time, as we continue to add other resources to openseq.org.

August 23, 2013

The online server has been updated to include more options and to make the resubmission process much easier. Options include:

The ability to submit either a single sequence or a starting alignment.

Control the diversity of the alignment by adjusting the number of iterations and the e-value.

Focus on a region of interest by adjusting the coverage and gap removal filters.

We are working on adding the ability to select priors! Right now only the "Vanilla" option works.

The FAQ page is now live! It's hard to judge what is common knowledge and what is new to our users. Please help improve this page by submitting questions using our contact form! (Even if it's a question to which you already know the answer, but you feel others might benefit.)

August 5, 2013

We are working on setting up an online server for GREMLIN co-evolution analysis. The server is in BETA mode; any suggestions are welcome as we prep for public release!

July 30, 2013

Predictions for 2013 should be done. We are keeping the 2012 predictions for archive purposes.

We are in the process of uploading the alignments used in our calculations. Note: for GREMLIN runs we removed sites that had > 75% gaps; the provided alignments still include these sites.
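
For anyone who wants to apply the same kind of gap filter to a downloaded alignment, here is a minimal Python sketch. It is not the code used for the GREMLIN runs: the function name is hypothetical, the alignment is assumed to be a list of equal-length strings, and both "-" and "." are assumed to mark gaps.

```python
def remove_gappy_columns(alignment, max_gap_fraction=0.75, gap_chars="-."):
    """Drop columns whose gap fraction exceeds max_gap_fraction.

    Returns the filtered sequences and the 0-based indices of the
    columns that were kept, so positions can be mapped back later.
    """
    if not alignment:
        return [], []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    kept = [
        col
        for col in range(length)
        if sum(seq[col] in gap_chars for seq in alignment) / n_seqs
        <= max_gap_fraction
    ]
    filtered = ["".join(seq[col] for col in kept) for seq in alignment]
    return filtered, kept


# Hypothetical toy alignment: the third column is all gaps and is removed.
toy_alignment = ["AC-D", "A--D", "A--E", "AG-D"]
filtered, kept_columns = remove_gappy_columns(toy_alignment)
print(filtered, kept_columns)
```

Returning the kept column indices alongside the filtered sequences makes it possible to translate positions in the filtered alignment back to the original numbering.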

July 24, 2013

We are updating our predictions to reflect new sequences that have been released since 2012. When you click on any of the pfams, you'll see a "2013" tab. The calculations are running and will be uploaded as they come in. Eventually the lists will be replaced with these new calculations.

We welcome any suggestions as we prep this webpage for public release.