about my research: gene position and selective constraints

It is time I introduced a bit of the research I am doing for my PhD, here at the Pompeu Fabra-CSIC university 🙂

The main aim of our research is to study whether there is a correlation between the position of a gene within a biological pathway and the strength of the selective constraints acting on it. For example, we ask whether genes involved in a high number of interactions and functions tend to be more conserved (that is, show fewer changes) among species. This can be better explained with this figure from a terrible poster I presented at the workshop for Evolutionary Systems Biology last year:

In this hypothetical biological pathway, genes in upstream positions or with a high number of interactions are more functionally constrained than the others, so their sequences should be more conserved.

The figure represents an idealized pathway of genes, such as the ones annotated in KEGG, MetaCyc or Reactome. All the nodes are genes, and the edges represent any kind of interaction between two genes: for the general discussion it is not necessary to specify whether they are metabolic, physical or other kinds of interactions.

The intensity of the colors in the figure represents the strength of the selective constraints we expect to find on each node. The gene in the most upstream position should be the one with the strongest selective constraints because, if a mutation introduces a loss of function there, all the downstream interactions will be compromised. Similar reasoning applies to genes with a high number of interactions, which should also be strongly conserved.
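To make the two properties concrete, here is a minimal Python sketch of how the number of interactions and the position could be computed on a toy pathway. The gene names, the edges and the definition of position as the shortest distance from a most-upstream gene are assumptions for illustration only, not the measures used in our actual analyses.

```python
from collections import deque

# Hypothetical toy pathway: each edge points from an upstream gene
# to a downstream gene. Names and topology are made up.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]


def degree(edges):
    """Count the interactions of each gene (incoming plus outgoing edges)."""
    counts = {}
    for upstream, downstream in edges:
        counts[upstream] = counts.get(upstream, 0) + 1
        counts[downstream] = counts.get(downstream, 0) + 1
    return counts


def position(edges):
    """Shortest distance of each gene from a gene with no incoming edge."""
    children = {}
    nodes = set()
    has_parent = set()
    for upstream, downstream in edges:
        children.setdefault(upstream, []).append(downstream)
        nodes.update((upstream, downstream))
        has_parent.add(downstream)

    # Most-upstream genes are those that nothing points to.
    sources = sorted(nodes - has_parent)
    depth = {gene: 0 for gene in sources}

    # Breadth-first search assigns each gene its shortest distance.
    queue = deque(sources)
    while queue:
        gene = queue.popleft()
        for child in children.get(gene, []):
            if child not in depth:
                depth[child] = depth[gene] + 1
                queue.append(child)
    return depth


pathway_degree = degree(edges)
pathway_position = position(edges)

for gene in sorted(pathway_degree):
    print(gene, "degree:", pathway_degree[gene], "position:", pathway_position[gene])
```

In this toy example, gene A is the most upstream node and gene D has the most interactions, so under the hypothesis both would be expected to be among the most constrained.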

The first issue we faced was identifying the best model pathways for studying these hypotheses. The annotations in databases like KEGG and Reactome are good, but they also contain many potential false positives and small errors, so I think that a large-scale analysis of the “whole interactome” won’t produce anything really significant. We therefore had to look for pathways with good annotations and a structure simple enough that a position can easily be assigned to each node.

Another difficulty is choosing the right ‘node centrality’ measures to correlate with the selective constraints. Apart from position and degree, there are many other measures that could be interesting, such as the centralities I described in a previous post.
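As a rough illustration of what "correlating a centrality with selective constraints" means in practice, the following Python sketch computes a Spearman rank correlation by hand. The choice of Spearman, the use of a dN/dS-like value as the measure of constraint, and all the numbers are assumptions made for this example; they are placeholders, not real data or results.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Placeholder values for five hypothetical genes (not real measurements).
gene_degree = [2, 2, 2, 3, 1]                  # number of interactions
constraint = [0.20, 0.25, 0.22, 0.10, 0.40]    # e.g. a dN/dS-like ratio

rho = spearman(gene_degree, constraint)
print("Spearman rho:", round(rho, 3))
```

A rank-based measure is used here because centralities such as degree are discrete and often heavily tied, but any other correlation measure could be swapped in.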

The third difficulty is the data to use for this analysis. When you compare sequences of the same gene among different organisms, you have to correctly identify exactly which gene in each organism corresponds to the others (homology), and this may take some time. On the other hand, carrying out this analysis on intra-specific data may be more difficult, because often the only data available consists of genotyping panels, which are less informative than full sequences.

The most recent paper we published on this topic is from my colleague Ludovica Montanucci, who found that genes with a higher number of connections in the N-Glycosylation pathway tend to evolve more slowly, at least among great primates. We are going to submit more manuscripts shortly.

Hello Moreno! Thank you, I didn’t know about the two articles. The Network of Cancer Genes seems to be a nice resource, and is a candidate data source for a study. It would be interesting to see if there is a correlation between the duplicability of a gene and its conservation among species or human populations. I need to study this better…

There’s also a new paper from the same group coming out soon, I will keep you updated! Also, if I manage to obtain the slides of the seminar I will send them to you. Happy to have given some interesting reading material.