I’m in the midst of preparing a talk for next Monday, and it occurred to me that perhaps we don’t see more protein structure-based prediction in bioinformatics because there simply aren’t enough structures.

pdbstats

Sure, the PDB has grown a lot in the past 5 years or so and 53 103 structures (as of now) looks impressive. However, if you’re interested in protein-protein interactions, you want at least 2 chains, which more or less halves the dataset. If you want two different protein chains, you lose almost another 75%. Specify a reasonable resolution cutoff for X-ray diffraction data and there go another ~3 000 entries. We probably don’t want multiple, similar proteins, so let’s remove redundancy at 90% sequence identity. We’re left with about 2% of the original PDB that might be usable for looking at interactions.
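To make the shrinkage concrete, here is a rough back-of-the-envelope sketch of that funnel in Python. The fractions are just the approximate figures quoted above ("more or less halves", "almost another 75%", "~3 000", "about 2%"), not exact PDB query results. The function name and step labels are made up for illustration.

```python
# Rough funnel using the approximate figures quoted in the post.
# These are estimates, not the output of real PDB queries.
TOTAL_ENTRIES = 53_103


def interaction_funnel(total=TOTAL_ENTRIES):
    """Return (label, approximate entry count) after each filter."""
    steps = [("all PDB entries", float(total))]

    n = total * 0.5   # at least 2 chains: "more or less halves the dataset"
    steps.append(("at least 2 chains", n))

    n *= 0.25         # two different chains: "lose almost another 75%"
    steps.append(("two different protein chains", n))

    n -= 3_000        # resolution cutoff: "there go ~3 000 entries"
    steps.append(("acceptable X-ray resolution", n))

    n = total * 0.02  # after the 90% identity filter: "about 2%" of the PDB
    steps.append(("non-redundant at 90% identity", n))

    return steps


for label, count in interaction_funnel():
    share = 100 * count / TOTAL_ENTRIES
    print(f"{label:32s} {count:8.0f}  ({share:5.1f}%)")
```

On these numbers, roughly 3 600 entries survive the resolution cutoff, and the final 2% works out to about 1 000 structures, so the redundancy filter alone would remove well over half of what was left.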

No wonder that most bioinformatics focuses on sequences and high-throughput interaction data.

People are finding many outlets for their work. Pierre maintains a repository of tools where you can find IBDStatus, his latest software for genetic analysis.

Spotted in Nature this week:

Makes perfect sense, doesn’t it: if you publish an article on a structure, include a link to the PDB resource. Yet so far as I can tell this is a new feature, since it jumped out at me. Given that the WWW is such a rich publishing platform, precisely because hyperlinks connect data, how long before paper copies of all journals are considered quaint and obsolete?