Below is a guest post from my friend and colleague Kimmen Sjölander, Prof. at UC Berkeley and phylogenomics guru.

Announcing the FAT-CAT phylogenomic annotation webserver.

FAT-CAT is a new web server for phylogenomic prediction of function and ortholog identification and for taxonomic origin prediction of metagenome sequences based on HMM-based classification of protein sequences to >93K pre-calculated phylogenetic trees in the PhyloFacts database. PhyloFacts is unique among phylogenomic databases in having both broad taxonomic coverage – more than 7.3M proteins from >99K unique taxa across the Tree of Life, including targeted coverage of genomes from Eukaryotes, Bacteria and Archaea – and integrating functional data on trees for Pfam domains and multi-domain architectures. PhyloFacts trees include functional and annotation data from UniProt (SwissProt and TrEMBL), GO, BioCyc, Pfam, Enzyme Commission and other sources. The FAT-CAT pipeline uses HMMs at all nodes in PhyloFacts trees to classify user sequences to different levels of functional hierarchies, based on the subtree HMM giving the sequence the strongest score. Phylogenetic placements within orthology groups defined on PhyloFacts trees are used to predict function and to predict orthologs. Sequences from metagenome projects can be classified taxonomically based on the MRCA of the sequences descending from the top-scoring subtree node. Because of the broad taxonomic and functional coverage, FAT-CAT can identify orthologs and predict function for most sequence inputs. We’re working to make FAT-CAT less computationally intensive so that users will be able to upload entire genomes for analysis; in the interim, we limit users to 20 sequence inputs per day. Registered users are given a higher quota (see details online). We’d love to hear from you if you have feature requests or bug reports; please send any to Kimmen Sjölander – kimmen at berkeley dot edu (parse appropriately).
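To make the MRCA-based taxonomic step a bit more concrete, here is a minimal Python sketch. It is not the FAT-CAT implementation: the function name `mrca_of_lineages` and the example lineages are made up for illustration, and it assumes each sequence under the top-scoring subtree node comes with a root-to-tip taxonomic lineage.

```python
def mrca_of_lineages(lineages):
    """Return the deepest taxon shared by all root-to-tip lineages.

    Hypothetical helper for illustration only; returns None when the
    lineages share no taxon at all.
    """
    if not lineages:
        raise ValueError("need at least one lineage")
    shared = []
    # Walk the lineages rank by rank, from the root downwards.
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared[-1] if shared else None


# Illustrative lineages for the sequences descending from the
# top-scoring subtree node (assumed data, not real PhyloFacts output).
subtree_lineages = [
    ["cellular organisms", "Bacteria", "Proteobacteria",
     "Gammaproteobacteria", "Escherichia"],
    ["cellular organisms", "Bacteria", "Proteobacteria",
     "Gammaproteobacteria", "Salmonella"],
    ["cellular organisms", "Bacteria", "Proteobacteria",
     "Alphaproteobacteria", "Rickettsia"],
]

predicted_taxon = mrca_of_lineages(subtree_lineages)
print(predicted_taxon)
```

In this toy case the metagenome read would be assigned to Proteobacteria, the deepest taxon that all three subtree members share. The more taxonomically mixed the subtree, the shallower (less specific) the assignment becomes.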

An interesting paper came up in my automated Google searches for "phylogenomics": Transitioning Toward a Universal Species Concept for the Classification of all Organisms | InTechOpen. It is by Jim Staley, who has been writing a lot about microbial species concepts in the last few years, in addition to trying to bridge the gap between bacteria/archaea and eukaryotes in terms of species concepts. Not sure how I feel about everything in the paper but it has a really nice history of how species have been defined for bacteria. He breaks down this history into four periods:

Discovery of microorganisms,

Advent of pure cultures and phenotypic features,

Introduction of molecular analyses and

Gene sequencing and genomics.

He goes through a bit of detail on each one. He also discusses what he sees as a need for a universal species concept and even makes some suggestions about how it might be implemented. Definitely worth a read.

Thursday, February 21, 2013

The following is a joint communication from the Davis Police Department and Davis Joint Unified School District.

On February 21, 2013, at approximately 10:25 am, the Davis Police Department received a call from administrators at Davis Senior High School reporting a male subject possibly armed with a handgun north of the library parking lot. As a precaution, four schools were placed on lockdown: Davis Senior High School, North Davis Elementary School, St. James Elementary School, and King High School. There is no information to indicate that the subject was ever on any of the campuses in the immediate area. The lockdown lasted approximately 35 minutes.

Officers responded and detained a male subject matching the description provided by the reporting party. Through the course of the investigation, it was determined that the subject had been in possession of a BB gun, which he discarded prior to police contacting him.

The BB gun was located in the vicinity of where the male subject was detained.

A second subject, who was with the male, was detained as well. Both individuals were released from the scene. Due to the fact that the subject was never on any of the campuses and also never threatened or brandished the BB gun, no criminal charges are being pursued at this time. The Davis Police Department will continue to investigate this incident for any legal recourse. The male subject is described as being in his late teens and not enrolled at any of the schools.

Anyone with information about this incident is urged to contact the Davis Police Department at 530-747-5400.

The DHS lockdown has been lifted. I want to thank the students, staff and the Davis Police Department for their immediate and professional response during this time. Again, all of our students are safe.

Winfred Roberson, Superintendent

Your students are safe. The district received reports that someone was armed in the vicinity of Oak Avenue and B Streets. The police were summoned and the suspects have been detained. We want to assure parents/guardians all safety protocols are in place and all precautions were taken. DHS, North Davis, King High and DSIS schools were on lockdown and have been released. DHS remains on lockdown pending further investigation. Additional information will be forthcoming as it becomes available.

Winfred Roberson, Superintendent

They discuss issues like Biodiversity Informatics (see Figure to the left) and evolutionary applications like evolutionary medicine, food production, sustaining biodiversity, computational algorithms, and justice. They also discuss issues like the oncoming onslaught of specimens and the need to link up with museums who have expertise in dealing with such issues. Anyway - it is worth a look. Not the most visionary of pieces ever but it has some concrete suggestions and predictions that will be of use.

Sunday, February 17, 2013

I confess I do not have the time right now to delve into this in detail but this seems of interest: Robust estimation of microbial diversity in theory and in practice. From Bart Haegeman, Jerome Hamelin, John Moriarty, Peter Neal, Jonathan Dushoff and Joshua S. Weitz (full disclosure: I am friends with, and a co-author of, some of the authors here).

Abstract: Quantifying diversity is of central importance for the study of structure, function and evolution of microbial communities. The estimation of microbial diversity has received renewed attention with the advent of large-scale metagenomic studies. Here, we consider what the diversity observed in a sample tells us about the diversity of the community being sampled. First, we argue that one cannot reliably estimate the absolute and relative number of microbial species present in a community without making unsupported assumptions about species abundance distributions. The reason for this is that sample data do not contain information about the number of rare species in the tail of species abundance distributions. We illustrate the difficulty in comparing species richness estimates by applying Chao’s estimator of species richness to a set of in silico communities: they are ranked incorrectly in the presence of large numbers of rare species. Next, we extend our analysis to a general family of diversity metrics (“Hill diversities”), and construct lower and upper estimates of diversity values consistent with the sample data. The theory generalizes Chao’s estimator, which we retrieve as the lower estimate of species richness. We show that Shannon and Simpson diversity can be robustly estimated for the in silico communities. We analyze nine metagenomic data sets from a wide range of environments, and show that our findings are relevant for empirically-sampled communities. Hence, we recommend the use of Shannon and Simpson diversity rather than species richness in efforts to quantify and compare microbial diversity.
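For anyone who wants to play with the quantities named in the abstract, here is a rough Python sketch of the standard textbook versions: Hill diversities of order q (q = 0 is richness, q = 1 is the exponential of Shannon entropy, q = 2 is the inverse Simpson index) and the classic Chao1 richness estimate. To be clear, these are simple plug-in calculations on made-up counts, not the lower and upper estimators constructed in the paper, and the function names are my own.

```python
import math


def hill_diversity(counts, q):
    """Plug-in Hill diversity of order q from per-species counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))


def chao1(counts):
    """Classic Chao1 richness estimate from singletons and doubletons."""
    observed = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # species seen exactly once
    f2 = sum(1 for c in counts if c == 2)  # species seen exactly twice
    if f2 > 0:
        return observed + f1 * f1 / (2.0 * f2)
    # Bias-corrected form, used here when there are no doubletons.
    return observed + f1 * (f1 - 1) / 2.0


# Made-up sample: number of reads assigned to each observed species.
sample = [10, 5, 2, 2, 1, 1, 1]

print("observed richness:", hill_diversity(sample, 0))
print("Chao1 estimate:   ", chao1(sample))
print("Shannon (Hill 1): ", hill_diversity(sample, 1))
print("Simpson (Hill 2): ", hill_diversity(sample, 2))
```

Note how the Chao1 value depends entirely on the counts of the rarest species (the singletons and doubletons), whereas the order-1 and order-2 values are dominated by the common species. That is the intuition behind the abstract's point that Shannon and Simpson diversity are easier to estimate robustly than richness.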

A few years ago I used to post many things for the Web through Apple's Mobile Me service. Annoyingly, Apple ended up treating this like they treat connectors and plugs for their phones and Macs. They just decided to move their online system to iCloud and deleted all the old websites hosted through Mobile Me. Which left me in the lurch. And then I forgot about it. But I have been rediscovering how annoying this is since I had a lot of information out there on old papers and projects and now it is gone from the interwebs. So I have been trying to re-share all of this stuff.

One way has been to post data from old papers to Figshare. See for example:

But I also had all sorts of website related material that is annoyingly gone. And yesterday I discovered at least a simple solution to this. I can put all my old websites in my Dropbox public folder and share the link to those files with others and they work pretty well.

See for example my re-releasing of some of my April 1 and other joke websites:

I have always been into sharing scientific information on the web since, well, the web came out. And I am going to dig around for other old websites to post them via Dropbox. If anyone knows an easy way to upload / convert an old website into WordPress, I suppose I could load all the old pages into my current WordPress site, but this was a much easier temporary solution. Still annoyed with Apple but glad Dropbox allows a simple solution.

Friday, February 15, 2013

Been reading this paper which I posted about to Twitter recently: A congruent phylogenomic signal places eukaryotes within the Archaea. It is very interesting. Not sure what to make of it though. So - in contrast to my normal ways of putting my ideas out there first and asking for / hoping for comment I thought - let's mix things up. So - I am soliciting comments from people BEFORE I write down my comments. Any ideas / thoughts / comments would be welcome.

Thanks to Russell Neches in my lab I found out about the Earthscapes series stamps from the US Postal Service. Two of the stamps feature microbial ecosystems and I ordered framed, enlarged versions of the photos for my office.

Basically, the paper describes the development and use of a PCR strategy to simultaneously characterize eukaryotic, bacterial and archaeal microbes from samples.

Primers used are summarized in Table 2

The strategy they employ attempts to correct for differences in amplification between the different amplicons, which should allow better normalization of relative abundance estimates. See results in Figure 2.

Background
A classical example of repeated speciation coupled with ecological diversification is the evolution of 14 closely related species of Darwin’s (Galápagos) finches (Thraupidae, Passeriformes). Their adaptive radiation in the Galápagos archipelago took place in the last 2–3 million years and some of the molecular mechanisms that led to their diversification are now being elucidated. Here we report evolutionary analyses of genome of the large ground finch, Geospiza magnirostris.

Results
13,291 protein-coding genes were predicted from a 991.0 Mb G. magnirostris genome assembly. We then defined gene orthology relationships and constructed whole genome alignments between the G. magnirostris and other vertebrate genomes. We estimate that 15% of genomic sequence is functionally constrained between G. magnirostris and zebra finch. Genic evolutionary rate comparisons indicate that similar selective pressures acted along the G. magnirostris and zebra finch lineages suggesting that historical effective population size values have been similar in both lineages. 21 otherwise highly conserved genes were identified that each show evidence for positive selection on amino acid changes in the Darwin's finch lineage. Two of these genes (Igf2r and Pou1f1) have been implicated in beak morphology changes in Darwin’s finches. Five of 47 genes showing evidence of positive selection in early passerine evolution have cilia related functions, and may be examples of adaptively evolving reproductive proteins.

Conclusions
These results provide insights into past evolutionary processes that have shaped G. magnirostris genes and its genome, and provide the necessary foundation upon which to build population genomics resources that will shed light on more contemporaneous adaptive and non-adaptive processes that have contributed to the evolution of the Darwin’s finches.

Figure 1

There is a long long long story behind this paper. Too long for me to write up right now. I wrote up some of the story for a Figshare posting of the genome data last year.

“Darwin’s Finches” are a model system for the study of various aspects of evolution and development. In 2008 we commenced on a project to sequence the genomes of some of these species – inspired by the (then) upcoming celebration of the 200th anniversary of the birth of Charles Darwin (which was in February 2009). The project started with a brief discussion at the AGBT meeting in 2008 and then via an email conversation between Jonathan Eisen and Jason Affourtit about the possibility of a collaboration involving the 454 company (which was looking for projects to highlight the power of its then relatively new 454 sequencing machines). After further discussions between Jonathan Eisen, his brother Michael Eisen (who separately had become interested in Darwin’s finches) and people from 454 it was decided that this was a potentially good project for a scientific and marketing collaboration.

In these conversations it was determined that the most likely limiting factor would be access to DNA from the finches. This was largely an issue due to the fact that the Galapagos Islands (where the finches reside) are a National Park in Ecuador and also a World Heritage site. Collection of samples there for any type of research is highly regulated. Thus, Jonathan Eisen made contact with Peter and Rosemary Grant – the most prominent researchers working on the finches – with whom Eisen had discussed sequencing the finch genomes in the early 2000s. In that previous conversation it was determined that the sequencing would be too expensive to carry out without a major fundraising effort. However, with the advent of “next generation” sequencing methods such as 454 the total costs of such a project would be much lower.

In the conversations with the Grants, the Grants offered to ask around to see if anyone had sufficient amounts of DNA (or access to samples), which would be needed for genome library construction. Subsequently they identified Arhat Abzhanov from Harvard as someone who likely had samples as well as permission to do DNA-based work on them, from many of the finch species. Abzhanov offered to provide samples from three key species (large ground finch Geospiza magnirostris, large cactus finch G. conirostris and sharp-billed finch G. difficilis) and DNA was sent to Roche-454 for sequencing in July of 2008. In August, the first “test” sequence data was provided from Geospiza magnirostris. A plan was then made to generate additional data and Roche offered to do the sequencing at their center at a steep discount. Funds were raised by Jonathan Eisen, Greg Wray, Monica Riley, and others to pay for the sequencing and over the next year or so, three sequencing bursts were conducted at Roche-454.”

That is a decent summary of the background. The details on the science are in the paper. What the background does not say is that the project languished for years as we did not have funds to support the actual analysis of the genomes and it was kind of out of my normal area of expertise. Along the way, I did a poor job of communicating with some of the initial parties in the project (e.g., I did a really bad job of communicating with Greg Wray - who had provided some of the funds - and I will forever be trying to make things up to him). Anyway, thankfully Arhat eventually pulled together a group of people led by Chris Ponting to help analyze the genome and Chris led the way to the paper that is out today. Only four years after our original goal.

I have been a birder and an evolutionary biologist for many many many years. Thus this is kind of a cool project for me. When I was in the Galapagos in 2002 I dreamed of doing a project like this - and even started doodling Darwin's finches all over the place - including on some of the styrofoam cups we sent down to the bottom of the ocean on the outside of the Alvin sub as part of a deep sea research cruise I went on. See below: