2012-03-08

Convergent synthesis, finally!

The legacy C++ version of my chemistry product has finally gained the capability to do convergent synthesis — in a primitive form for now.

What is convergent synthesis?

Consider a complex molecule that we wish to synthesise. In the usual method, the synthesis steps proceed linearly. Suppose that the following is the sequence of synthesis steps, where Goal designates the molecule to synthesise.

A → B → C → D → E → F → G → H → I → J → Goal

The above sequence has 10 steps in the process. We start with a simple, available molecule A. Presumably, a functional group is either added, replaced or deleted at each step. [Reality includes several more dimensions, but let us keep the discussion simple.] While easy to comprehend, the biggest problem with this is the effective yield of the route. For the purposes of discussion, let us assume an average yield of 85% per step. With 10 steps, the effective yield is less than 20%!

Contrast this with the following scheme.

P → Q → R ⌉
| → Goal
S → T → U ⌋

In this case, we have two independent paths leading to moderately complex molecules R and U. Then, these two paths converge to give rise to a more complex molecule, which in this case is our goal molecule. The expectation is that since R and U are only moderately complex, they can be independently synthesised in a couple of steps each. The effective yield for the convergent route, then, is about 44%! This is, obviously, much more attractive.

How does it work in retro-synthesis?

My product actually does retro-synthesis, i.e., it starts with the goal molecule, and constructs the steps in reverse order. At each step, the product molecule is broken into reactants. In most scenarios, the coreactant is a trivial molecule; or, it is directly available for purchase from a company like Sigma Aldrich.

If we wish to take advantage of convergent synthesis, on the other hand, how we break a product molecule into possible sets of reactants becomes a matter of extreme significance. In the step R + U → Goal, effective convergence is possible if and only if R and U have about the same degree of complexity.

What next?

The immediate challenge is to locate such reactions, and build a repertoire of them. Another, of course, is to be able to resolve a product molecule to utilise one such! Our chances depend on being able to identify a reasonably central atom that is suitable for initiating the cleavage of the product molecule. The algorithms have to be refined to exploit this new capability, as well!