12. Reaction Toolkit

12.1 Introduction:

The reaction toolkit provides a set of tools which support both specific
and generic single-step reactions. These tools add the capability to address
numerous reaction-oriented chemical information problems. These tools are
integrated into the Daylight system and are used extensively within Thor and
Merlin to add support for reactions to these systems.

The reaction object is actually implemented within the Smiles toolkit library.
The transform object is implemented within the Smarts toolkit library. Note
that the reaction toolkit is licensed separately, even though the toolkits are
contained within the Smiles and Smarts libraries.

12.2 Polymorphism and the Reaction Toolkit:

The extensive use of polymorphism for both reaction and transform objects is
one of the key principles which makes the reaction toolkit convenient to use.
A design criterion for a reaction object is that it behave as much like a
molecule object as possible. Similarly, a design criterion for the transform
object is that it behave like a pattern object.

In effect, a reaction object is a "superset" of a molecule object.
A reaction can do everything a molecule can, and then some (which we'll cover
in detail).

For example, a reaction contains one or more molecule objects. These are the
components of the reaction (reactant, agent and product molecule). Each of
these molecule objects in turn contains atoms, bonds, and cycles. Now one can
certainly take a stream of molecules over a reaction. This works as one would
expect, returning a stream which contains every component molecule in the
reaction.

dt_stream(reaction, TYP_MOLECULE) => all molecules in the reaction

One can also take streams of atoms, bonds, or cycles over a reaction,
effectively ignoring the molecule layer of the reaction. In this case, the
streams work exactly the same for molecules and reactions.

dt_stream(reaction, TYP_ATOM) => all atoms in the reaction
dt_stream(reaction, TYP_BOND) => all bonds in the reaction
dt_stream(reaction, TYP_CYCLE) => all cycles in the reaction

Note that in the case of streams of atoms, bonds, or cycles over a reaction,
the resulting stream will contain ALL of the atoms, bonds, or cycles in every
molecule in the reaction.

Generally, the strategy for reaction toolkit programming is to ignore the
"molecule layer" of a reaction whenever possible. This results in
toolkit code which is most flexible in that the code will correctly process
both molecules and reactions.
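The idea can be pictured with a small stand-alone C model. This is not
Daylight toolkit code: the Atom, Molecule, Reaction, and AtomCursor types and
the functions below are hypothetical names invented for this sketch. A lone
molecule is modeled here as a reaction with a single component, so the same
counting routine serves both cases without ever inspecting the molecule layer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; these are NOT Daylight toolkit objects. */
typedef struct { const char *symbol; } Atom;
typedef struct { const Atom *atoms; size_t natoms; } Molecule;
typedef struct { const Molecule *mols; size_t nmols; } Reaction;

/* Cursor that walks every atom of every component, in order. */
typedef struct { const Reaction *rxn; size_t mol; size_t atom; } AtomCursor;

AtomCursor atom_cursor(const Reaction *rxn)
{
    AtomCursor c;
    assert(rxn != NULL);
    c.rxn = rxn;
    c.mol = 0;
    c.atom = 0;
    return c;
}

/* Returns the next atom, or NULL once all components are exhausted. */
const Atom *atom_cursor_next(AtomCursor *c)
{
    while (c->mol < c->rxn->nmols) {
        const Molecule *m = &c->rxn->mols[c->mol];
        if (c->atom < m->natoms)
            return &m->atoms[c->atom++];
        c->mol++;       /* component finished: move on to the next one */
        c->atom = 0;
    }
    return NULL;
}

/* The caller sees only atoms; component boundaries are invisible. */
size_t count_atoms(const Reaction *rxn)
{
    AtomCursor c = atom_cursor(rxn);
    size_t n = 0;
    while (atom_cursor_next(&c) != NULL)
        n++;
    return n;
}
```

Because count_atoms() never asks which component an atom came from, it gives
a sensible answer whether it is handed one component or several.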

Whether the user enters a reaction or molecule SMILES is completely irrelevant
to the program, the way it is coded, or its execution. This example program
and many others like it (cansmi, showparts, protons, hbonds, smarts_filter,
addfp, etc.) only need be recompiled under version 4.51 or later to be fully
reaction-capable.

The other important factor which makes the reaction toolkit convenient is the
treatment of derivative objects (paths, substructs, pathsets, depictions,
conformations, fingerprints). Each of the derivative object types has been
extended to handle Reaction objects directly. There is no need to use or
understand the behavior of a bunch of new derivative objects specifically for
reactions.

In the case of derivative objects, the molecule layer of a reaction is
ignored; the derivative objects just work at the atom and bond layer. For
example, the depiction object used in the example code above handles reactions
just as well as molecules. One can create a depiction for either a molecule
or a reaction object. The returned depiction objects behave exactly as in
version 4.42 with one exception: the base object (dt_base(3)) of a
depiction may now be either a reaction or molecule; in version 4.4 the base
of a depiction was always a molecule. See section 12.8 for further discussion
of derivative objects and reactions.

12.3 Processing reactions:

A reaction consists of a set of molecule objects, each of which has a specific
role in the reaction: reactant, product, or agent. Agents are molecules which
neither contribute atoms to the products nor accept atoms from the reactants.
Note that this definition is not enforced by the toolkit. It is manifested in
the definition of atom maps for reactions.

This section focuses on toolkit functions which are specific to reaction
objects or functions which have new, unique behaviors for reaction objects.
These functions are generally useful for building reactions from scratch and
for manipulating reaction objects.

Adds a molecule object to a reaction. The role (DX_ROLE_REACTANT,
DX_ROLE_AGENT, DX_ROLE_PRODUCT) indicates the role which the molecule
will take in the reaction. A copy of the molecule is added to the
reaction. The original molecule is unchanged. The reaction must be in
modify-on state. Returns the molecule object within the reaction to
which the given molecule was added.

Practically speaking, a reaction object will have at most one each of
reactant, agent, and product molecules and these are generally processed
(e.g., streams of molecules over a reaction) in reactant-agent-product
order. If one adds multiple molecule objects to a reaction with the same
role, these are combined within the reaction object. The way to think
about this is that molecules are used as the internal representation of
structural data in a reaction, yet the reaction object reserves the
right to change its internal representation as necessary. Since the
original molecules are unaffected, this works out well.
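The combining behavior can be pictured at the level of SMILES text with a
stand-alone C sketch. It does not call the toolkit and does no
canonicalization; the Component type, the ROLE_* constants, and
join_reaction_smiles() are hypothetical names for this illustration only (the
ROLE_* values are not the toolkit's DX_ROLE_* constants). Parts with the same
role are dot-joined, and the three groups are written in
reactant-agent-product order, separated by '>'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical role codes for this sketch only. */
enum Role { ROLE_REACTANT = 0, ROLE_AGENT = 1, ROLE_PRODUCT = 2 };

typedef struct { const char *smiles; enum Role role; } Component;

/* Writes "reactants>agents>products" into out.
 * Returns 0 on success, -1 if out is too small. */
int join_reaction_smiles(const Component *parts, size_t nparts,
                         char *out, size_t outsize)
{
    size_t used = 0;
    int role;

    assert(out != NULL);
    if (outsize == 0)
        return -1;
    out[0] = '\0';

    for (role = ROLE_REACTANT; role <= ROLE_PRODUCT; role++) {
        int first = 1;
        size_t i;

        if (role != ROLE_REACTANT) {          /* separator between groups */
            if (used + 1 >= outsize)
                return -1;
            out[used++] = '>';
            out[used] = '\0';
        }
        for (i = 0; i < nparts; i++) {
            size_t len;
            if ((int)parts[i].role != role)
                continue;
            len = strlen(parts[i].smiles);
            if (used + len + (first ? 0 : 1) >= outsize)
                return -1;
            if (!first)
                out[used++] = '.';            /* same role: dot-join */
            memcpy(out + used, parts[i].smiles, len);
            used += len;
            out[used] = '\0';
            first = 0;
        }
    }
    return 0;
}
```

Adding two reactants in separate steps therefore yields one dot-joined
reactant part, regardless of the order in which the parts were supplied
relative to the product.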

Returns the role which the object plays within a reaction. 'ob' can be
an atom, bond, cycle, or molecule. Returns (-1) if 'ob' is not part of
the given reaction. The role returned will be one of the constants:
DX_ROLE_REACTANT, DX_ROLE_AGENT, or DX_ROLE_PRODUCT. It is not possible
to change the role of an object within a reaction. The role is set
during creation of the reaction (via dt_smilin(3) or dt_addcomponent(3))
and is immutable.

There are quite a few functions which take on new capabilities when processing
reactions:

When given a reaction SMILES string, interprets the SMILES and returns a
newly-allocated reaction object. Note that dt_smilin(3) returns the
appropriate object (either molecule or reaction) for the given SMILES
string. This behavior also depends on the licenses available.

Returns the canonical SMILES for a reaction. When 'iso' is FALSE,
returns the unique SMILES. The unique SMILES is the canonical SMILES
where all agents, isomeric and isotopic information, and atom maps are
ignored for generation of the SMILES.

When 'iso' is TRUE, returns the absolute SMILES for the reaction. This
includes all agents, isotopic and isomeric information, and atom maps.

Puts a reaction object and all of its component molecules in
modify-on state. A reaction must be in modify-on state to add
components, or modify any of the component molecules. Note that one
can indirectly put a reaction in modify-on state by calling
dt_mod_on(3) for one of its component molecules.

Puts a reaction object and all of its component molecules in
modify-off state. Causes every molecule to be checked for
structural validity. This function fails if any of the component
molecules is invalid. If the function fails, the entire reaction is
deallocated.

Deallocates a reaction and all of its component molecules, atoms, bonds
and cycles.

The following code gives a simple example of creation and manipulation of a
reaction object. In this example, a reaction is built two different ways:
first, a reaction is created from scratch, and molecule objects are added to
build up the reaction. Second, a reaction is built from a single
reaction-SMILES. The resulting reactions have the same unique SMILES.

12.4 Reaction Molecules:

Reactions are made up of molecule objects. These are normal molecules,
with a new property, role, which is used to distinguish the reactant,
product and agent in a reaction. Molecules within reactions have the reaction
as a parent, and have a value defined for their role property, but are
otherwise indistinguishable from any other molecules in the toolkit.

Prior to version 4.5, a molecule never had a parent object.
In version 4.5 and later, if a molecule is part of a reaction object,
its parent will be that reaction, otherwise this function will return
(NULL_OB).

For a molecule which is part of a reaction, puts both the molecule
itself and its parent reaction in modify-on state.

Calling dt_mod_off(3) for a molecule which is part of a reaction is
identical to calling dt_mod_off(3) for the parent reaction. In
effect, the toolkit treats a reaction and its component molecules as a
single unit for structural modification; setting the state for either
the reaction or one of its child molecules sets the state for all of
them.

In general, if one is modifying molecules which are part of a reaction, it is
best to perform dt_mod_on() and dt_mod_off() on the reaction object itself,
rather than the component molecule(s). One can easily get confused if one
attempts to set mod-on and mod-off for the component molecules in a reaction.

12.5 Atom Maps:

Within the SMILES language for reactions, atom maps are numeric atom labels.
All atoms within a SMILES string with the same atom map label are associated
in an atom map set.

Within the toolkit, atom maps are manipulable only as atom map sets.
The toolkit takes care of interpreting the labels on input SMILES and
labeling the output SMILES in a systematic way.

Agent atoms and atoms which are not part of a reaction may never be put in an
atom map class. Only reactant and product atoms from the same reaction may
appear in a given atom map class.

There are no requirements for completeness or uniqueness of the atom mappings
over a reaction. Atom mappings are independent of the connectivity and
properties of the underlying molecules. The rules for atom maps are
as follows:

Only reactant and product atoms may belong to atom map classes. Atoms
which are not part of a reaction cannot belong in atom map classes.

An atom may be unmapped or may only belong to one atom map class at a
time.

Atom map classes must contain at least one reactant and one product
atom from the reaction.

If either the last reactant or last product atom is removed from an atom
map class, the atom map class is removed.

Sets the two atoms to be in the same atom map class. 'atom1' and 'atom2'
must be atoms from the reactant and product of the same reaction, in
either order.

If either 'atom1' or 'atom2' already belongs to a map class, the result
of this operation is to merge the sets of atoms into a single map class
which contains 'atom1', 'atom2', and any atoms which were previously
mapped to 'atom1' or 'atom2'. For example, the following four
functions, applied in any order, result in a single map class which
contains atoms: r1, r2, r3, p1, p2.

If 'atom2' is NULL_OB, 'atom1' is unmapped from its current map set.
That is, 'atom1' will no longer be mapped to any other
atoms in the reaction. The atom map set from which 'atom1' is removed
remains intact unless the atom map set becomes invalid. A map class
becomes invalid if it no longer contains at least one reactant and one
product atom. If the atom map set becomes invalid, all of the remaining
atoms are unmapped from one another.
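The merge and unmap rules above can be modeled in a few lines of stand-alone
C. This is a sketch of the bookkeeping only, not the toolkit's implementation:
the MapTable type and the map_* functions are hypothetical names, atoms are
plain array indices, and class 0 is assumed to mean "unmapped".

```c
#include <assert.h>

#define MAX_ATOMS 32
enum { UNMAPPED = 0 };
typedef enum { REACTANT, AGENT, PRODUCT } Role;   /* hypothetical codes */

typedef struct {
    int  natoms;
    Role role[MAX_ATOMS];
    int  cls[MAX_ATOMS];     /* map class per atom, 0 = unmapped */
    int  next_cls;
} MapTable;

void map_init(MapTable *t) { t->natoms = 0; t->next_cls = 1; }

int map_add_atom(MapTable *t, Role role)
{
    assert(t->natoms < MAX_ATOMS);
    t->role[t->natoms] = role;
    t->cls[t->natoms] = UNMAPPED;
    return t->natoms++;
}

/* Puts a and b in one class, merging existing classes. Returns 1 on
 * success, 0 unless the pair is one reactant and one product atom. */
int map_set(MapTable *t, int a, int b)
{
    int i, keep, drop;
    int ok = (t->role[a] == REACTANT && t->role[b] == PRODUCT) ||
             (t->role[a] == PRODUCT && t->role[b] == REACTANT);
    if (!ok)
        return 0;
    if (t->cls[a] == UNMAPPED && t->cls[b] == UNMAPPED) {
        t->cls[a] = t->cls[b] = t->next_cls++;
    } else if (t->cls[a] == UNMAPPED) {
        t->cls[a] = t->cls[b];
    } else if (t->cls[b] == UNMAPPED) {
        t->cls[b] = t->cls[a];
    } else if (t->cls[a] != t->cls[b]) {
        keep = t->cls[a];
        drop = t->cls[b];
        for (i = 0; i < t->natoms; i++)      /* merge the two classes */
            if (t->cls[i] == drop)
                t->cls[i] = keep;
    }
    return 1;
}

/* Unmaps a; dissolves its old class if that class lost its last
 * reactant atom or its last product atom. */
void map_unset(MapTable *t, int a)
{
    int i, old = t->cls[a], nr = 0, np = 0;
    if (old == UNMAPPED)
        return;
    t->cls[a] = UNMAPPED;
    for (i = 0; i < t->natoms; i++) {
        if (t->cls[i] != old)
            continue;
        if (t->role[i] == REACTANT) nr++;
        if (t->role[i] == PRODUCT)  np++;
    }
    if (nr == 0 || np == 0)
        for (i = 0; i < t->natoms; i++)
            if (t->cls[i] == old)
                t->cls[i] = UNMAPPED;
}

int map_same(const MapTable *t, int a, int b)
{
    return t->cls[a] != UNMAPPED && t->cls[a] == t->cls[b];
}
```

With three reactant atoms r1, r2, r3 and two product atoms p1, p2, the calls
map_set(r1,p1), map_set(r2,p1), map_set(r3,p2), and map_set(r1,p2) (one
possible choice of four pairings) leave all five atoms in a single class in
this model, whatever order they are made in.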

Tests the two atoms. If the two atoms are in the same map class,
returns TRUE. Otherwise, returns FALSE. This is a convenience
function. It is somewhat more efficient than performing the same
operation by getting the substruct for one atom and testing the
other against the substruct.

12.6 Hydrogens in Reactions:

Hydrogens in reactions are handled as with molecules (suppressed unless the
hydrogen is special). With reactions, there is an additional case which will
make a hydrogen special. It is often desirable (e.g., 1,5-hydride shift) to
store information about the location of hydrogens as part of the atom map of a
reaction. Hydrogens with a supplied atom map are considered
"special" and these hydrogens are not suppressed in the toolkit.
These mapped hydrogens appear explicitly in Isomeric SMILES for reactions.
In the unique SMILES, where atom maps are ignored, atom-mapped hydrogens do
not appear.

Note that the special hydrogen (dt_isohydro(3)) cannot be part of any atom
map class. Hence, this special hydrogen can never be used in place of an
atom-mapped hydrogen in a reaction. Any atom-mapped hydrogens must be stored
as explicit hydrogens.

12.7 Reaction Queries:

A reaction query is expressed with the SMARTS language. SMARTS has been
extended with reaction and atom map query syntax. There is no separate
pattern object for a reaction query. When a SMARTS is interpreted, a pattern
object is returned. In effect, the pattern object takes on the additional
expressive capabilities for reactions.

Returns an optimized SMARTS string. Works correctly for both molecule-
or reaction-SMARTS. If "vmatch" is TRUE and the given SMARTS
string is for a reaction query, dt_smarts_opt fails. Vector matching on
reaction queries is not allowed.

12.8 Reactions and other objects:

The flexibility and utility of the Daylight toolkit arises partly because of
the ability to create derivative objects based on Molecules. These objects
include paths, substructs, pathsets, depictions, conformations and
fingerprints. Each of these objects has a specific unique purpose within the
toolkit; however, they all share some common features which are important for
reaction processing:

They all have a molecule as their base object,

they all store data about the atoms and bonds in a molecule,

and they all ignore other attributes of the molecule not directly
related to the atoms and bonds in the molecule.

These features allowed us to directly extend these objects to handle reactions.
As discussed in Section 12.2, the "molecule layer" of a reaction is
ignored; only the atoms and bonds of a reaction are considered.

Hence, each of these objects is now defined as having either a molecule or a
reaction as its "base" object. Otherwise, their behaviors are
essentially unchanged. They still store data about the atoms and bonds in
their base object, and they still ignore other non-relevant attributes of
their base object (like the molecules).

Briefly, we address each of the main derivative types in the next sections and
highlight their behaviors with regard to reactions.

12.8.1 Paths and Substructs:

Paths and substructs are collections of atoms and bonds, which all come from
the same base object. With reactions, this behavior remains unchanged. The
atoms and bonds within a path or substructure must come from the same reaction
but they may be from different molecules within a reaction. For example, the
following code creates a path from a reaction object, adds all of the
double-bonds from the reaction to the path, and returns the path.

Note that absolutely no consideration is given to the fact that the bonds may
be in different molecules within the reaction. As long as the atoms and bonds
added to a path or substruct are all part of the correct base object (the
object given in dt_alloc_path(3)) this succeeds.

12.8.2 Pathsets:

A pathset is a collection of paths over the same base object. The base object
may be a reaction. A pathset is returned from the SMARTS matching functions.

In this case, the pathset returned depends on the type of target used for the
match function:

This returns a pathset with "target" as its base object.
"Target" may be either a reaction or molecule. The pathset
will contain one or more paths. The base object (dt_base(3)) of the
pathset and all paths within the pathset will be the target object.
Note that this behavior holds regardless of the type
of pattern used in the query (reaction or molecule query).

This returns a pathset with "target" as its base object.
"Target" may be either a reaction or molecule. The pathset
will contain one or more paths whose base object will be the same
target object.

There is one important exception for vector-matching: It is only legal
to use a molecule pattern for dt_vmatch(3). One may match the
molecule pattern against either a reaction or molecule target, but it
is not possible to use a reaction pattern for vector matching on any
target (reaction or molecule).

12.8.3 Depictions:

The main distinction between a reaction depiction and a molecule depiction is
the presence of a reaction arrow, and the potential desire to lay out the
various reaction parts (reactant, agent, product) in different regions. These
two functions are handled with dt_depict(3) and dt_calcxy(3); all other
depiction-related functions remain unchanged.

Sets the coordinates for the atoms of the given depiction. In the case
of a reaction depiction, it lays out the reactants, agents and products
in a left-to-right orientation, with the reactants and products centered
vertically and the agents shifted above the center.

If atom map classes are available for the atoms in the depiction, the
toolkit will attempt to orient the reactant and product sides of the
depictions the same way. The toolkit attempts to minimize the RMS
distance between mapped atom pairs by reorienting the product part of
the reaction depiction before laying out the parts of the reaction.
This orientation first applies to ring atoms within the depiction. If
no mapped ring atoms are found, non-ring atoms are used.

Generates the depiction, using the Daylight drawing library. For a
reaction object, automatically includes a scaled arrow in the drawing.
The toolkit provides no access to the arrow itself; it is drawn by the
toolkit using the framega set for the depiction object.

The arrow is positioned as follows: a horizontal vector is laid out
between the midpoints of the reactant and product parts of the
depiction. The vector is clipped so that it doesn't overlap any
parts of the reaction. Finally, the clipped vector with an arrowhead
is drawn. If it is not possible to clip the vector so it doesn't
overlay any part of the reaction, the toolkit will then draw a short
arrow between the midpoints of the reactants and products, ignoring any
overlap.

12.8.4 Conformations:

The conformation object allows the storage of (x, y, z) coordinate data for
the atoms in a molecule or reaction. A conformation object makes no
distinction between the roles of atoms in the reaction object. With the
exception of allowing a conformation to be created from a reaction, all
conformation-oriented functions remain unchanged.

12.8.5 Fingerprints:

The fingerprint object does behave differently for a reaction object versus a
molecule object. The differences are seen when creating a fingerprint object;
all other fingerprint toolkit functions remain unchanged. In addition, there
is a new fingerprint-creation function, dt_fp_differencefp(3), which is
designed primarily for reaction processing.

Generates a fingerprint object from the given molecule, reaction,
substruct, or path. For reaction objects or reaction-derived paths and
substructs, the resulting fingerprint object is equivalent to the
bitwise-OR of the following fingerprints:

the fingerprint of the reactant part,

the fingerprint of the product part,

a bit-shifted fingerprint of the product part.

This behavior allows the fingerprint to serve as a structural screen for
all superstructure-matching and allows the fingerprint to provide some
discrimination power between reactant and product parts.
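The combination described above can be sketched in stand-alone C. This is
only a model: the toolkit's fingerprint size, the shift distance, and whether
the shift wraps around are not stated in this section, so the 256-bit size
and the rotate-left-by-one-bit used here are assumptions, and the fp_*
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FP_WORDS 4   /* toy size: 4 x 64 = 256 bits (assumption) */

/* Rotates the whole fingerprint left by one bit. Bit k of word i is
 * fingerprint bit 64*i + k. The one-bit rotate is an assumption. */
void fp_rotate1(const uint64_t in[FP_WORDS], uint64_t out[FP_WORDS])
{
    size_t i;
    for (i = 0; i < FP_WORDS; i++) {
        uint64_t carry = in[(i + FP_WORDS - 1) % FP_WORDS] >> 63;
        out[i] = (in[i] << 1) | carry;
    }
}

/* out = reactant | product | shifted(product) */
void fp_reaction(const uint64_t reactant[FP_WORDS],
                 const uint64_t product[FP_WORDS],
                 uint64_t out[FP_WORDS])
{
    uint64_t shifted[FP_WORDS];
    size_t i;
    assert(out != NULL);
    fp_rotate1(product, shifted);
    for (i = 0; i < FP_WORDS; i++)
        out[i] = reactant[i] | product[i] | shifted[i];
}

/* Screening test: returns 1 if every bit of query is set in target. */
int fp_is_subset(const uint64_t query[FP_WORDS],
                 const uint64_t target[FP_WORDS])
{
    size_t i;
    for (i = 0; i < FP_WORDS; i++)
        if ((query[i] & ~target[i]) != 0)
            return 0;
    return 1;
}
```

Because the unshifted reactant and product bits are both present in the
result, a fingerprint computed for either part alone always passes
fp_is_subset() against the combined fingerprint, which is the property a
superstructure screen needs. The shifted copy adds bits that only the product
part contributes.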

For reactions, the fingerprints tend to be quite dense, and are somewhat
less efficient as structural screens than for molecules. The main
advantage of this scheme is the full compatibility of these reaction
fingerprints with molecule fingerprints in the Daylight system. Note
also that this fingerprint scheme doesn't provide the most appropriate
measure of similarity for reactions.

Generates a difference fingerprint object from the given molecule,
reaction, path, or substruct object. This function is oriented towards
reaction processing, so it isn't very useful for molecules and
molecule-derived paths or substructs.

For a molecule or molecule-derived object, returns the normal
fingerprint, (identical to dt_fp_generatefp(3)).

For a reaction or reaction-derived object, returns the difference in
fingerprint between the reactant and product parts of the object as
follows:

Generates the count of each path in the reactant part.

Generates the count of each path in the product part.

For any paths whose count changes from reactant to product part,
sets a bit in the final fingerprint.

The net result of these operations is a fingerprint of the connectivity
change for a reaction. This is an extremely useful way to analyze and
cluster reactions.

There is one important caveat for difference fingerprints: to work
optimally, the reaction must have unit stoichiometry. If not, missing
atoms on either side of the reaction will result in extraneous bits
being set in the difference fingerprint.
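The three steps listed above can be sketched in stand-alone C. Path
enumeration and hashing are outside the scope of the sketch: each path is
assumed to have already been reduced to a small integer id, and the 32-bit
fingerprint size, the id-to-bit rule, and the name difference_fp() are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_PATH_IDS 64   /* toy universe of distinct paths (assumption) */
#define FP_BITS    32   /* toy fingerprint width (assumption) */

/* Sets a bit for every path whose count differs between the two sides. */
uint32_t difference_fp(const int *reactant_paths, size_t nr,
                       const int *product_paths, size_t np)
{
    int rcount[N_PATH_IDS] = { 0 };
    int pcount[N_PATH_IDS] = { 0 };
    uint32_t fp = 0;
    size_t i;
    int id;

    for (i = 0; i < nr; i++) {              /* count paths in reactant part */
        assert(reactant_paths[i] >= 0 && reactant_paths[i] < N_PATH_IDS);
        rcount[reactant_paths[i]]++;
    }
    for (i = 0; i < np; i++) {              /* count paths in product part */
        assert(product_paths[i] >= 0 && product_paths[i] < N_PATH_IDS);
        pcount[product_paths[i]]++;
    }
    for (id = 0; id < N_PATH_IDS; id++)     /* bit for each changed count */
        if (rcount[id] != pcount[id])
            fp |= (uint32_t)1 << (id % FP_BITS);
    return fp;
}
```

Paths that occur equally often on both sides cancel and set no bit. A path
that exists on only one side, as happens when atoms are missing from an
unbalanced reaction, always sets a bit, which is the source of the extraneous
bits mentioned in the caveat above.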

12.9 Transforms:

Transforms are very similar in behavior to patterns. Essentially the
transform language (SMIRKS) is a subset of SMARTS, with some additional
specific requirements. These requirements are validated on input of the
transform. It follows that any valid SMIRKS is also a valid SMARTS, and that
a SMIRKS can be optimized by dt_smarts_opt(3). A more extensive discussion of
the relationship of SMILES, SMARTS, and SMIRKS can be found in the Daylight
Theory Manual.

Returns an optimized SMIRKS string. Remember, SMIRKS are a subset of
SMARTS. "vmatch" must be FALSE for transform SMIRKS.
Optimizing a SMIRKS is useful because the first step in application of a
transform object is a SMARTS-match on either the reactant or product
side of the transform. Hence, the optimizations performed by
dt_smarts_opt(3) are also relevant to transforms.

Returns a molecule pattern object from the "role" part of the
transform.

Transforms can be applied to molecule objects. The result of these
operations is the creation of new reaction objects which contain both the
starting molecules and a set of newly-created molecules. Transforms are
bidirectional; they can be applied in either the forward or reverse
directions. In effect, transforms represent generic reactions. Specific
instances of these generic reactions can be created from the combination of a
transform and a set of molecules, which act as reactants or products in
the specific reaction.

Applies the given transform to the molecule or sequence of molecules
"som". Note that the molecule or sequence is not altered by
the function. The result is a sequence of newly-allocated reaction
objects, which represent specific instances of the reaction. The
parameter "limit" controls whether only the first reaction
found is returned or all of the possible answers are returned. The
"limit" parameter has the same semantics as in
dt_match(3).

The "direction" may be one of DX_FORWARD or DX_REVERSE. When
direction is DX_FORWARD, the given molecules are treated as reactants
and the transform is applied in the forward direction to the molecules.
When "direction" is DX_REVERSE, the given molecules are
treated as products and the transform is applied in the reverse
direction.

The application of a transform logically occurs in two steps. In
the forward direction, the reactant side of the transform is
matched, as SMARTS, against the set of molecules given. Each place
where the SMARTS matches is marked. In the second step, the atom and
bond changes in the transform are applied to the matched molecules.

The only difference between dt_transform(3) and dt_utransform(3) is the
function which is used to match the SMARTS expression (dt_match(3) and
dt_umatch(3) respectively). The net result is that with
dt_utransform(3), the resulting answers are generated from the unique
set of matches, while with dt_transform(3), the complete set of answers
results.

Similarly, dt_xtransform(3) uses dt_xmatch(3) for the initial SMARTS
match. The net result is that dt_xtransform(3) always returns exactly
one new reaction. This new reaction may have more than one application
of the transform within it.

A transform (at least in one direction) can be thought of as a
SMARTS expression plus a set of atom and bond changes.

The resulting sequence of reaction objects is owned by the user. Both
the sequence and the reactions must be deallocated by the calling
program when done with them. The given molecules or sequence of
molecules are not modified by the function.

The transform processing functions set atomic properties for the newly-created
reaction atoms. These properties are set in order to allow the user
to correlate the SMIRKS with the resulting reaction. For example, given
the amide formation SMIRKS:

[C:1](=[O:2])Cl.[H][N:4][C:5]>>[C:1](=[O:2])[N:4][C:5]

and the reacting molecules:

CC(=O)Cl.NCCC

The result of this transformation will be a reaction, with the following
atomic properties set:

The "tmap" property is the map class for the transform atom which matched
this node in the reaction. For example, the amine Nitrogens are map class "4"
in the transform, hence the tmap properties for the Nitrogens in the resulting
reaction are set to "4".

The "torder" property is the cardinal ordering of the reaction atoms, based on
the match order of the transform. Were one to reorder the reaction atoms
based on this numbering, the order would correspond to the ordering of the
expressions in the SMIRKS. In the example, the original SMIRKS has 10 atomic
expressions total, and the "torder" properties go from 1 - 10. The value of
4 is missing because the hydrogen is suppressed in the unique SMILES.
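The arithmetic in this example can be checked with a short stand-alone C
routine that counts atomic expressions in the SMIRKS text. It is a deliberately
simplified scanner, not a SMIRKS parser: count_atomic_expressions() is a
hypothetical name, each bracketed expression counts as one atom, "Cl" and "Br"
count as one atom, and any other letter outside brackets counts as one atom.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns the number of atomic expressions in s, or -1 on an unclosed
 * bracket. If first_h is not NULL, it receives the 1-based position of
 * the first "[H]" expression, or 0 if there is none. */
int count_atomic_expressions(const char *s, int *first_h)
{
    int n = 0;

    assert(s != NULL);
    if (first_h != NULL)
        *first_h = 0;
    while (*s != '\0') {
        if (*s == '[') {
            const char *close = strchr(s, ']');
            if (close == NULL)
                return -1;
            n++;                                 /* one bracket atom */
            if (first_h != NULL && *first_h == 0 &&
                close - s == 2 && s[1] == 'H')
                *first_h = n;
            s = close + 1;
        } else if ((s[0] == 'C' && s[1] == 'l') ||
                   (s[0] == 'B' && s[1] == 'r')) {
            n++;                                 /* two-letter symbol */
            s += 2;
        } else if (isalpha((unsigned char)*s)) {
            n++;                                 /* one-letter symbol */
            s++;
        } else {
            s++;              /* bonds, branches, digits, '.', '>' */
        }
    }
    return n;
}
```

Applied to the amide formation SMIRKS above, the scanner finds 10 atomic
expressions, with "[H]" as the fourth, which agrees with the "torder" values
described in the text.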