I am currently in my honours year of a computing science degree, and for my final year project I am doing some research into bioinformatic information visualisations. The project is primarily concerned with graph layout, specifically as it relates to bioinformatics.

I'm a total novice to the field so I'm looking for some advice.

I'm currently considering experimenting with the layout of protein interaction networks. Can anyone recommend a good web source for total beginners where I can get some background on exactly what these diagrams represent?

I would also be greatly indebted if you could give me some advice on what kind of information bioinformaticians look to gather from protein interaction network diagrams, i.e. what questions a diagram can be used to answer.

You might read their cited literature to get a better handle on this stuff. The major network annotation that most serious analysts employ is offered by Ingenuity, but they make you pay for it. KEGG also has basic regulatory network information, and it's free:

I'm making some progress with my project and I've been informed that one concern Bioinformaticians have when analysing protein-protein interaction networks is identifying small 'clusters' of interacting proteins.

Can anybody point me in the right direction as to what the significance of a small 'cluster' of interacting proteins is, and whether there are any methods currently used to identify them?

A small group of proteins with many interactions between them could represent many things. In a protein interaction network, it probably represents members of a large multimeric complex, like the DNA replicating machinery or the ribosome. In metabolic and regulatory networks, I'm not so sure.

Anyway, one way to discover these things is to ask questions about the so-called clustering coefficient of each node in the graph. Suppose that you have a node A connected to nodes B, C, D, and E. Suppose further that there is a connection between B and D and between C and E. The clustering coefficient for a node is defined by looking at the node's neighbors and asking "how many connections exist between these neighbors?" then dividing that by the total number of possible connections between those neighbors. In our example, there are two connections between A's neighbor nodes, and with four neighbor nodes (B, C, D, and E) there is a total of six possible connections (the general formula to find this is n(n-1)/2, where n is the number of neighbors): the clustering coefficient is therefore 2/6, or 1/3.
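To make the arithmetic concrete, here is a minimal Python sketch of that calculation for the A, B, C, D, E example. The function name `clustering_coefficient` and the dict-of-sets graph representation are just my own choices for illustration, not something taken from a particular tool.

```python
from itertools import combinations

def clustering_coefficient(graph, node):
    """Fraction of possible links between a node's neighbors that actually exist."""
    neighbors = graph[node]
    n = len(neighbors)
    if n < 2:
        # With fewer than two neighbors no neighbor pair exists; treat as 0 by convention.
        return 0.0
    possible = n * (n - 1) / 2
    actual = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return actual / possible

# Undirected example graph: A is linked to B, C, D, E; B-D and C-E are also linked.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "D"), ("C", "E")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

# A has 4 neighbors, so 6 possible links between them, of which 2 exist: 2/6 = 1/3.
print(clustering_coefficient(graph, "A"))
```

Note that B in this graph gets a coefficient of 1, since its only two neighbors (A and D) are linked to each other.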

If you think about it, a small network motif with many connections within a few nodes will have quite a few nodes with large clustering coefficients. The ones in the "center" of the motif will have the largest clustering coefficients, and the ones on the periphery will have slightly smaller clustering coefficients. So one approach to finding these motifs would be to ask "which node in my graph has the highest CC?" then look at its neighbors and add them to the motif if they also have high CCs (you'd need to set some threshold).
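A rough sketch of that "pick the highest CC, then pull in high-CC neighbors" idea might look like the following. The names `grow_motif` and `threshold`, the toy proteins P1 to P6, and the choice to break ties alphabetically are all assumptions of mine for illustration; this is not how any specific package does it.

```python
from itertools import combinations

def clustering_coefficient(graph, node):
    """Fraction of possible links between a node's neighbors that actually exist."""
    neighbors = graph[node]
    n = len(neighbors)
    if n < 2:
        return 0.0
    actual = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return actual / (n * (n - 1) / 2)

def grow_motif(graph, threshold):
    """Seed at the node with the highest CC, then add neighbors whose CC >= threshold."""
    cc = {node: clustering_coefficient(graph, node) for node in graph}
    # Ties are broken alphabetically so the result is reproducible (an arbitrary choice).
    seed = max(sorted(cc), key=cc.get)
    motif = {seed}
    motif.update(n for n in graph[seed] if cc[n] >= threshold)
    return seed, motif

# Toy network: P1..P4 all interact with each other, and P4-P5-P6 forms a tail.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"),
         ("P2", "P4"), ("P3", "P4"), ("P4", "P5"), ("P5", "P6")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

seed, motif = grow_motif(graph, threshold=0.5)
print(seed, sorted(motif))
```

In this toy case P4 has a lower coefficient than P1, P2, and P3 because it also links out to P5, so whether it ends up in the motif depends entirely on where you set the threshold.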

I'm pretty sure there is a tool in Cytoscape which can do this sort of thing for you - dig around and you should find it.

The stuff you are looking at concerning networks of protein interactions has to do with systems biology. It's a growing field and there's lots of stuff out there. There are an increasing number of applications that aim to let the user map biological networks (including protein interactions). A good place to begin would be http://www.systems-biology.org which will give you a great insight into what this field is about. Then if you go to http://www.sbml.com you'll find an explanation of what SBML actually is and a whole list of available applications, including the aforementioned Cytoscape (they are all modelling tools, some more useful to you than others, as not all are protein oriented).

I have just started doing some research into this area myself (complementing my final year in an undergrad bioinformatics degree), but more from the computer science perspective of trying to develop such software. I'm glad someone like you came along to ask about this, as I would be very interested to find out precisely what you need, which would help me work out what kind of applications are needed.

As an extension to my research I’m considering performing a further experiment to gain some qualitative data on how different graph layouts facilitate (or hinder) Bioinformaticians when they are attempting to extrapolate data from protein interaction networks. It would be a great help in getting me going with this if someone with some knowledge in the area could give me some examples of the kinds of data that Bioinformaticians try to extract from protein-protein interaction networks. That is to say, the problems that the networks can help to solve.