I’ve had the good fortune of having some papers published recently. The first one is a methodology paper concerning a way of extracting phylogenetic information from regions of multiple sequence alignments that are full of indels and difficult to align:

In molecular biology, an alignment is a partial reconstruction of the evolutionary history of a group of sequences. In an alignment, all residues found in the same column are considered to be descended from a single residue in the ancestral sequence. (Of course, insertions violate this description, but I won’t get into that.) Alignments are not direct observations. They are inferences based on the patterns of sequences found in the dataset. Often there are particular areas in which the alignment is difficult to resolve. Take this example:

A typical problem in multiple sequence alignments where a section is full of gaps and contains a complicated phylogenetic signal. Dark red: high certainty that alignment is accurate; dark blue: low certainty that alignment is accurate.

It was constructed via the GUIDANCE webserver. (A great resource that everyone should use.) In this example, we have a region defined by a lot of sequence variation created by many insertions and deletions. The alignment is not well defined here, and in most applications it will just be removed, and the data “thrown away”.

But is this the only solution? In our paper we develop a methodology, dubbed PICS-Ord (download), that provides an easy solution for extracting phylogenetic information from problematic regions chosen by its user. PICS-Ord works through a three-step process:

Realign the segments in pairs using Ngila, and calculate the likelihood of the alignment from an evolutionary model. This produces a distance matrix of the segments.

This might seem a bit odd at first. “Why not just use the distance matrix directly?” That would be great if we could, but there aren’t any phylogenetic programs that we know of that allow the mixing of distance matrices and sequence data. With our method, we get discrete, ordered characters that can be used in popular programs like RAxML.
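To make the idea of turning distances into ordered characters concrete, here is a minimal sketch. It is not the PICS-Ord code. It assumes that the distance matrix is ordinated with classical principal coordinates analysis and that the first axis is then cut into equal-width bins; the function names, the toy matrix, and the number of states are all hypothetical choices for illustration.

```python
import math

def principal_axis(dist, iterations=500):
    """First principal coordinate of a symmetric distance matrix (classical MDS)."""
    n = len(dist)
    # Square the distances, then double-center them (Gower centering).
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the dominant eigenvector of the centered matrix.
    vec = [float(i + 1) for i in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        nxt = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
        eigenvalue = norm
    # Scale the unit eigenvector by sqrt(eigenvalue) to get coordinates.
    return [x * math.sqrt(eigenvalue) for x in vec]

def to_ordered_states(coords, n_states=4):
    """Cut continuous coordinates into equal-width bins numbered 0..n_states-1."""
    lo, hi = min(coords), max(coords)
    if hi == lo:
        return [0] * len(coords)
    width = (hi - lo) / n_states
    return [min(int((c - lo) / width), n_states - 1) for c in coords]

# Toy distance matrix (hypothetical): four segments spaced evenly on a line.
toy = [[abs(i - j) for j in range(4)] for i in range(4)]
states = to_ordered_states(principal_axis(toy), n_states=4)
print(states)
```

The sign of an ordination axis is arbitrary, so the states may come out in reverse order. That does not matter for an ordered character, because only the adjacency of the states carries information.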

There are three example files in the PICS-Ord distribution, and I’ll illustrate its usage with example1.fas. The alignment of these sequence fragments is messy:

I haven’t been keeping up with my Calix Cari polls this year for college football. But now that the regular season has ended, I have found time to produce one. The events of this season have been rather unpredictable. Of course, by the end of the season there were few surprises. Auburn appeared out of nowhere to become #1 on the strength of a once-in-a-decade player who fit perfectly into their offensive system. (Yay for the booster who had cash to spare in this economy. We will see if their season stands the test of time.) But in my calculation Auburn is only #3, behind Oregon and ¡Stanford! (I still think Harbaugh would make a smooth transition into the coach’s chair at UGA but wouldn’t be there long. It’s good that Richt was retained.)

It appears my algorithm likes the Pac-10 over the SEC, and Auburn lost ground because of its tight victories early in the season.

I apologize for things being slow on this blog. I’ve been knee-deep in programming, manuscripts, grant proposals, and teaching. I’m hoping to have results to share in the near future. In the meantime, you can follow some of my activities on the Panda’s Thumb.

I will say that the development version of Dawg now supports codon models, and Ngila has some new features as well.

This machine got partially hacked over the weekend. From what I can tell, Ziproxy was compromised and used to submit spam email through my system. Because my mail server accepts local email, it was going out. It looks like only Yahoo emails were being hit. Of course, the spam was coming from China.

Since I turned off Ziproxy, I haven’t seen any odd email originating from my machine.

We’ve been having network issues for the last week. (The network setup in general is not reliable.) Nothing is wrong with what I have control over, but I’m kind of stuck with the connection that I’ve been given.

With the conference championship games complete, my Calix Cari rankings once again agree with the pollsters: it’s Texas versus Alabama for the title.

Surprisingly, despite Tebow crying after visiting the woodshed in Atlanta, Florida still ranks #3, ahead of undefeated teams TCU, Cincinnati, and Boise State. Two-loss Oregon comes in at #4, based on its strong showings against Cal, USC, Arizona, and Oregon State. Boise State produced the second-best victory of the year (against Oregon) but couldn’t overcome a weak conference schedule to rise higher than #9.

The Pac-10 placed 6 teams in the top 25, followed by Big 12 at 4, SEC, Big East, ACC, and Big 10 at 3, MWC at 2, and WAC at 1. Clearly the Pac-10’s 8-4 logjam was seen more highly than the SEC’s 7-5 logjam.

This is my first poll of the season, and Alabama is ranked #1, followed by Iowa and Florida. What’s interesting is that a quick look through the results reveals that Boise State’s victory over Oregon is considered the best win by any team this year.

I’m moving to Houston next month, and I’m working furiously to finish up about 4 papers before I leave. Yesterday was a good day because two things worked absolutely perfectly.

I’m calculating the likelihoods of some of our observations, and I wrote a specific routine to do it for a first-order model. I then wrote a generic routine to calculate likelihoods for higher-order models. When I used the generic routine to calculate the likelihood for a first-order model, I got the same result as my specialized routine. Yatah!
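This kind of cross-check can be sketched in a few lines of Python. The post doesn’t say what kind of model is involved, so the sketch assumes a Gaussian autoregressive model with known coefficients and a likelihood conditioned on the first few values. The function names, the toy series, and the parameter values are all hypothetical.

```python
import math

def loglik_ar1(series, phi, sigma):
    """Specialized routine: conditional Gaussian log-likelihood of an AR(1) model."""
    total = 0.0
    for t in range(1, len(series)):
        resid = series[t] - phi * series[t - 1]
        total += -0.5 * math.log(2 * math.pi * sigma ** 2) - resid ** 2 / (2 * sigma ** 2)
    return total

def loglik_ar(series, phis, sigma):
    """Generic routine: same likelihood for an AR(p) model, with p = len(phis)."""
    order = len(phis)
    total = 0.0
    for t in range(order, len(series)):
        # Prediction is a weighted sum of the previous `order` observations.
        pred = sum(phis[k] * series[t - 1 - k] for k in range(order))
        resid = series[t] - pred
        total += -0.5 * math.log(2 * math.pi * sigma ** 2) - resid ** 2 / (2 * sigma ** 2)
    return total

# Toy series (made up for illustration, not real observations).
data = [0.3, 0.1, -0.4, 0.2, 0.5, 0.1, -0.2, 0.0, 0.4, 0.3]

special = loglik_ar1(data, phi=0.5, sigma=1.0)
generic = loglik_ar(data, phis=[0.5], sigma=1.0)
print(abs(special - generic) < 1e-12)
```

If the two numbers disagree, the bug is usually in the indexing of the lagged terms, and the simple first-order case is where that is easiest to spot.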

From previous analysis using partial autocorrelations, we determined that a third-order model should explain our data the best. When I compared our models to see which one was most parsimonious (using AIC), the third-order model again came out on top. Yatah2!
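For readers who haven’t used it, AIC is defined as 2k - 2 ln L, where k is the number of free parameters and L is the maximized likelihood, and the model with the lowest value is preferred. The sketch below shows the comparison. The log-likelihoods and parameter counts in it are invented purely to illustrate the mechanics and are not our results; the function and model names are hypothetical too.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def best_by_aic(candidates):
    """Return the name of the candidate with the lowest AIC.

    `candidates` maps a model name to a (log_likelihood, n_params) pair.
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))

# Invented numbers for illustration only.
models = {
    "order1": (-120.0, 2),
    "order2": (-112.0, 3),
    "order3": (-105.0, 4),
    "order4": (-104.8, 5),
}

for name, (loglik, k) in models.items():
    print(name, aic(loglik, k))
print("best:", best_by_aic(models))
```

In this toy example the fourth-order model has a slightly higher likelihood than the third-order one, but not by enough to pay for its extra parameter, which is the sense in which AIC rewards parsimony.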