
Predicting the enzyme class of proteins from their InterPro domains

I will introduce my PhD work, which simulates a human/automatic curation cycle in order to understand how knowledge quality improves or degrades when automatically derived facts are accepted into a wiki-style knowledge base.

I will also present the evaluation of a rule that I plan to use in the simulation and also to implement a human/automatic curation cycle in a real molecular biology wiki community. The rule assigns to a protein, as its enzyme class, the Enzyme Commission (EC) numbers tagged on the InterPro domains found in its sequence.
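
A minimal sketch of the rule in Python, assuming the EC tags of all InterPro domains found in a protein are pooled (a set union) and that a protein with no EC-tagged domain is predicted to be a non-enzyme. The mapping INTERPRO_EC_TAGS, its domain IDs ("IPR_A" etc.) and its EC numbers are hypothetical placeholders, not real InterPro data.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy mapping from InterPro entry IDs to their EC tags.
# The IDs and EC numbers are placeholders, not real InterPro annotations.
INTERPRO_EC_TAGS: Dict[str, Set[str]] = {
    "IPR_A": {"1.1.1.1"},
    "IPR_B": {"2.7.11.1", "2.7.10.1"},
    "IPR_C": set(),  # a domain that carries no EC tag
}


def predict_ec(domains: Iterable[str], ec_tags: Dict[str, Set[str]]) -> Set[str]:
    """Assign to a protein the union of the EC tags of its InterPro domains.

    An empty result means the protein is predicted to have no enzymatic
    activity. Domains absent from the mapping contribute nothing.
    """
    predicted: Set[str] = set()
    for domain in domains:
        predicted |= ec_tags.get(domain, set())
    return predicted


print(sorted(predict_ec(["IPR_A", "IPR_B"], INTERPRO_EC_TAGS)))
# ['1.1.1.1', '2.7.10.1', '2.7.11.1']
```

Because the prediction is just a lookup and a union, multiple EC numbers fall out naturally, and every assigned number can be traced back to the domain that contributed it.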

The rule's advantages are that it is transparent, applicable to any sequenced protein, independent of other existing annotations, able to assign multiple EC numbers easily, and requires neither machine learning nor high computational power.

The rule achieved a precision of 69% and a sensitivity of 76% on a gold standard of 1,216,515 manually annotated KEGG proteins (622,902 with and 593,613 without enzymatic activity). We also generated "extended" InterPro EC tags, which cover more EC classes (2,461 versus 1,036 in the official tags) and have the additional advantage of being tunable.
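
For reference, precision is TP / (TP + FP) and sensitivity is TP / (TP + FN). The sketch below computes both, assuming they are counted over (protein, EC number) pairs, so that any EC number predicted for a gold-standard non-enzyme counts as a false positive. The abstract does not state the exact counting unit, and the proteins and EC numbers in the example are made up.

```python
from typing import Dict, Set, Tuple


def precision_sensitivity(
    predicted: Dict[str, Set[str]],
    gold: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Micro-averaged precision and sensitivity over (protein, EC) pairs.

    Proteins with an empty gold set are non-enzymes: every EC number
    predicted for them is a false positive.
    """
    tp = fp = fn = 0
    for protein in gold.keys() | predicted.keys():
        pred = predicted.get(protein, set())
        true = gold.get(protein, set())
        tp += len(pred & true)  # correctly assigned EC numbers
        fp += len(pred - true)  # assigned but not in the gold standard
        fn += len(true - pred)  # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return precision, sensitivity


# Made-up toy data: P3 is a non-enzyme in the gold standard.
gold = {"P1": {"1.1.1.1"}, "P2": {"2.7.11.1", "3.1.1.1"}, "P3": set()}
pred = {"P1": {"1.1.1.1", "6.1.1.1"}, "P2": {"2.7.11.1"}, "P3": {"4.2.1.1"}}
p, s = precision_sensitivity(pred, gold)
print(round(p, 3), round(s, 3))
# 0.5 0.667
```

Counting per protein (e.g. requiring an exact match of the whole EC set) would give different numbers, so the choice of unit matters when comparing against other predictors.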