Next Generation Sequencing (NGS) data is knocking at our door and, at the same time, our ability to design novel enzymes (by rational design or directed evolution) using high-throughput methods has improved tremendously. As a result, the demand to link enzyme sequences to their chemical products and metabolic pathways is ever increasing. Meanwhile, the push to generate metabolomics data for biomarker design, toxicity studies, functional genomics and nutrigenomics has given researchers a run for their money!

Last year we launched EC-Blast (see my old post), a robust tool to compare chemical reactions using chemical knowledge of bond changes, molecule-molecule pairs (MMP) and molecular substructures. This tool helps to plough through and understand the reactions present in the Enzyme Commission (E.C.) classification. It has generated a lot of interest in the research community and in industry in revisiting and mining knowledge that might have been overlooked by traditional methods. Feedback from our users strongly suggested a demand for tools and methods that systematically link protein sequences to this knowledge of bond changes, molecule-molecule pairs and molecular substructures.

We have recently developed Sequence to Enzyme (Seq2EC), a novel tool (Figure 1) to:

Metabolism influences building or replacement of tissue, conversion of food to energy, disposal of waste materials, reproduction etc. “Catalysis” is defined as the acceleration of a chemical reaction by a substance which itself undergoes no permanent chemical change. Most biochemical reactions do not take place spontaneously and enzyme catalysis plays an important role in biochemical reactions necessary for all life processes. Without enzymes, these reactions would take place at a rate far too slow for effective metabolism.

Enzymes can be classified by the kind of chemical reaction they catalyze. One such classification scheme is defined by the IUBMB.

The IUBMB assigns each enzyme a code of four dot-separated numbers (the “digits” described below), prefixed by EC.

For example: EC 1.1.1.1 (alcohol dehydrogenase, an oxidoreductase)

1. The first digit denotes the “Class” of the enzyme

2. The second digit indicates the “Sub-class” of the enzyme

3. The third digit gives the “Sub-sub-class” of the enzyme

4. The fourth digit is the “Serial number” of the enzyme within its sub-sub-class
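The four fields above map naturally onto a small record type. The following is a minimal sketch in Python, not part of Seq2EC or EC-Blast; the names `ECNumber` and `parse_ec` are made up for illustration, and the parser assumes a fully specified code with all four numbers present.

```python
from typing import NamedTuple


class ECNumber(NamedTuple):
    """The four fields of an EC number (hypothetical helper type)."""
    enzyme_class: int   # first digit: class
    subclass: int       # second digit: sub-class
    sub_subclass: int   # third digit: sub-sub-class
    serial: int         # fourth digit: serial number


def parse_ec(code: str) -> ECNumber:
    """Parse a complete code such as 'EC 1.1.1.1' into its four fields."""
    text = code.strip()
    # The 'EC' prefix is optional in this sketch.
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a complete EC number: {code!r}")
    return ECNumber(*(int(p) for p in parts))


ec = parse_ec("EC 1.1.1.1")
print(ec.enzyme_class, ec.subclass, ec.sub_subclass, ec.serial)
```

Partial codes such as `EC 1.1` are rejected here on purpose; handling them would be a design choice for whoever adapts the sketch.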

The classification is as follows:

| Group Name | Type of Reaction Catalysed | Example |
|---|---|---|
| Oxidoreductases | Oxidation-reduction reactions | Alcohol oxidoreductase (EC 1.1) |
| Transferases | Transfer of functional groups | Methyltransferase (EC 2.1) |
| Hydrolases | Hydrolysis reactions | Lipase (EC 3.1) |
| Lyases | Non-hydrolytic bond cleavage, or addition of groups to double bonds | Decarboxylases (EC 4.1) |
| Isomerases | Isomerization reactions | Epimerases and Racemases (EC 5.1) |
| Ligases | Formation of bonds with ATP cleavage | Enzymes forming carbon-oxygen bonds (EC 6.1) |
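Since the first digit alone determines the group, looking up the group name for any code is a one-step dictionary access. A small Python sketch follows, limited to the six groups listed in the table; `EC_CLASSES` and `ec_class_name` are illustrative names, not an existing API.

```python
# Group names keyed by the first digit, taken from the table in this post.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_class_name(code: str) -> str:
    """Return the group name for a full or partial code, e.g. 'EC 3.1'."""
    text = code.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    first = text.split(".")[0]
    if not first.isdigit() or int(first) not in EC_CLASSES:
        raise ValueError(f"unknown enzyme class in {code!r}")
    return EC_CLASSES[int(first)]


for code in ("EC 1.1", "EC 3.1", "6.1"):
    print(code, "->", ec_class_name(code))
```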

b) How can I find similar enzymes?

Any similarity search is based on patterns (similar bond changes and/or small molecules) shared between the query and target reactions. The more patterns two reactions share, the higher their similarity score or, equivalently, the lower their distance score. In bioinformatics, the concept of similarity or distance is used to find similar sequences based on amino acid similarity, structural topology, etc. In chemoinformatics, the similarity between small molecules or drug molecules (e.g. the Tanimoto score) is based on the bonds and atoms shared between the query and target molecules.
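To make the idea of shared patterns concrete, here is a minimal Python sketch of the Tanimoto score computed over sets of features. It is only an illustration of the general concept and not EC-Blast's actual scoring; the pattern strings and the function name `tanimoto` are invented for the example.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto score: shared features divided by all distinct features."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both sets are empty
    return len(a & b) / len(a | b)


# Hypothetical bond-change patterns for a query and a target reaction.
query = {"C-O cleaved", "C-N formed", "O-H formed"}
target = {"C-O cleaved", "O-H formed", "C=O formed"}

similarity = tanimoto(query, target)  # 2 shared patterns out of 4 distinct
distance = 1.0 - similarity           # more shared patterns, smaller distance
print(similarity, distance)
```

A score of 1 means the two pattern sets are identical and 0 means they share nothing, so ranking targets by descending score puts the most similar reactions first.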

I reckon that in the near future we might see such concepts being adopted by the IUBMB itself to annotate and classify enzymes.

This would be vital in the study of the interactions between the components of biological systems (metabolites, enzymes and metabolic pathways), and how these interactions give rise to the function and behavior of that system.