Synthetically Accessible Virtual Inventory (SAVI) Database

Complete First Beta File Series - November 2016

283,194,312 SAVI proposed reactions generated in the first complete enumeration of the beta phase of the SAVI project

The SAVI project is an international collaboration to computationally generate a very large database
of screening sample structures that can be synthesized reliably and inexpensively and that have properties desirable for the drug development process.

It utilizes:
(a) a set of transforms with rich chemical context annotation including functional group reactivity data (LHASA, LLC, U.S.; and Lhasa Limited, UK)
(b) a set of highly annotated building blocks (Sigma-Aldrich, Global Strategic Services)
(c) the chemoinformatics toolkit CACTVS with custom development (Xemistry GmbH, Germany)

The transforms are a set of more than 2,300 rules written in the CHMTRN/PATRAN language, which encodes chemical transformations
together with their chemical context and quality criteria. The rules are based ultimately on the pioneering work of E. J. Corey.

These rules, in contrast to simple SMIRKS transforms, provide:
- Computation of whether a reaction will work at all, given the overall structural features of the target.
- Scoring: if the reaction works, how robust it is, taking overall structural features into account.
- Determination of whether interfering groups need protection; if so, this can be built directly into the
final starting material queries so that pre-protected starting materials are prioritized.
- Proposal of suitable context-dependent reaction conditions.
- Textual warnings in specific circumstances, such as potential of multiple products, borderline conditions, etc.

The rules are supplemented by a set of functional group reactivity data, i.e. a table describing
whether each of the standard functional groups in the rule set is unstable under any of the standard reaction conditions.

The building blocks are a set of several hundred thousand compounds that are reliably available in gram quantities
from, or through, Sigma-Aldrich (now MilliporeSigma). This set has been annotated with pricing information and other business-intelligence-type
data useful for this project.

The chemoinformatics toolkit CACTVS has been extended in various ways, e.g. with the capability to read CHMTRN/PATRAN
transforms. One important feature that had to be implemented was reversing the original LHASA transform direction,
without rewriting the rules, for the strictly forward-synthetic SAVI project. Another important capability
was the handling of initial and final starting material (SM) queries, which comprises four steps:
1. extraction of the initial SM queries from the 2D patterns in the rules;
2. forward reaction from the 2D patterns;
3. scoring (the only one of these steps that is original LHASA functionality);
4. expansion of the final SM queries (R-groups, protecting groups, etc.).

To filter out structures with attributes that are undesirable in the drug development context,
several additional computed properties regarded as important in current drug design have been implemented,
such as demerit scores based on the 275 rules for identifying potentially reactive or promiscuous compounds
published by Bruns and Watson (J. Med. Chem. 2012, 55, 9763-9772; dx.doi.org/10.1021/jm301008n).

For generation of this first beta set of SAVI products, 14 transforms were applied to approx. 377,000 building blocks in single-step reactions.
The resulting products have been annotated with, but not yet filtered by, any of the computed or associated molecular properties.
A set of very schematic graphical representations of the transforms implemented so far (two of which were not used for product generation) can be downloaded here.

We are ultimately aiming at creating a database of one billion high-quality screening samples that should be easily and
cheaply synthesizable. These novel molecules will all be annotated with a proposed simple and high-yield synthetic route,
and will have been filtered by all the molecular properties generally recognized as important in cutting-edge drug design
that we will have implemented by then. A web GUI is planned that will give users free access to this database via searches
by various criteria, including substructure searches. It will also link to pages where users can request synthesis
of the molecule(s) by commercial entities.

The following individuals have contributed to this project so far:

Downloadable Files

The files contain the following information about each SAVI reaction (in the order that they appear in the TAB files or the properties block of the SD files):

Identifiers
NAME - Unique SAVI identifier in the form <hashcode of product>_<hashcode of reactants>_<transform id>
SMILES - SMILES of the product
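A minimal Python sketch for splitting NAME into its three parts. It assumes the components are joined by single underscores and that none of them contains an underscore itself (the description above does not state this explicitly). The class and function names are hypothetical, and the identifier in the demo is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaviName:
    product_hash: str
    reactants_hash: str
    transform_id: str


def parse_savi_name(name: str) -> SaviName:
    """Split a SAVI NAME into product hashcode, reactants hashcode and transform id.

    Assumption: none of the three components contains an underscore.
    """
    parts = name.strip().split("_")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected SAVI NAME format: {name!r}")
    return SaviName(*parts)


# Hypothetical identifier, for illustration only.
n = parse_savi_name("8F3A1C2B7D9E0F11_5B6C7D8E9F0A1B2C_1234")
print(n.transform_id)  # 1234
```

Keying on `transform_id` makes it easy to group or count products by the transform that generated them.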

Properties referring to starting materials
SAVI_BUILDING_BLOCK_A_SIGMA_STRID - Sigma-Aldrich catalog ID of building block A
SAVI_BUILDING_BLOCK_A_SMILES - SMILES of building block A
SAVI_BUILDING_BLOCK_A_INCHI - InChI of building block A
SAVI_BUILDING_BLOCK_A_INCHIKEY - InChIKey of building block A
SAVI_BUILDING_BLOCK_A_ORDER_LINK - URL of the Sigma-Aldrich catalog web page for building block A
SAVI_BUILDING_BLOCK_B_SIGMA_STRID - Sigma-Aldrich catalog ID of building block B
SAVI_BUILDING_BLOCK_B_SMILES - SMILES of building block B
SAVI_BUILDING_BLOCK_B_INCHI - InChI of building block B
SAVI_BUILDING_BLOCK_B_INCHIKEY - InChIKey of building block B
SAVI_BUILDING_BLOCK_B_ORDER_LINK - URL of the Sigma-Aldrich catalog web page for building block B
SAVI_BUILDING_BLOCK_A_PROTECTION_NEEDED - Indicates whether protection of reagent A is required in this reaction
SAVI_BUILDING_BLOCK_A_PROTECTED - Indicates whether building block A used in this reaction is already a protected version of the required reagent
E_SAVI_BUILDING_BLOCK_B_PROTECTION_NEEDED - Indicates whether protection of reagent B is required in this reaction
SAVI_BUILDING_BLOCK_B_PROTECTED - Indicates whether building block B used in this reaction is already a protected version of the required reagent
E_SAVI_PROPOSED_REACTION - Name of the reaction
SAVI_PROPOSED_REACTION_ID - ID of the reaction (LHASA ID of the transform that describes this reaction)
SAVI_REACTION_HASHCODE - Hashcode of the reaction
SAVI_REACTION_CONDITIONS - Reaction conditions according to LHASA transform rules
SAVI_REACTION_WARNINGS - Reaction warnings according to LHASA transform rules
SAVI_BUILDING_BLOCK_A_COST_GRAM - Cost per gram of building block A
SAVI_BUILDING_BLOCK_B_COST_GRAM - Cost per gram of building block B
SAVI_ESTIMATED_BB_COST_GRAM - Total cost per gram of building blocks A and B
SAVI_BUILDING_BLOCK_A_COST_MOL - Cost per mole of building block A
SAVI_BUILDING_BLOCK_B_COST_MOL - Cost per mole of building block B
SAVI_ESTIMATED_BB_COST_MOL - Total cost per mole of building blocks A and B
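The per-gram and per-mole cost columns are related by simple arithmetic: a price per mole is the price per gram multiplied by the molecular weight in g/mol. The sketch below additionally assumes that the two SAVI_ESTIMATED_BB_* values are plain sums of the A and B values; the description says "total" but does not spell out the formula, so treat this as an assumption. The function names, prices and molecular weights are hypothetical (molecular weights are not among the listed fields).

```python
from typing import Tuple


def cost_per_mol(cost_per_gram: float, mol_weight: float) -> float:
    """Price per mole = price per gram * molecular weight (g/mol)."""
    return cost_per_gram * mol_weight


def estimated_bb_costs(
    a_cost_gram: float, a_mw: float, b_cost_gram: float, b_mw: float
) -> Tuple[float, float]:
    """Return (total per gram, total per mole), assuming a simple A + B sum."""
    total_gram = a_cost_gram + b_cost_gram
    total_mol = cost_per_mol(a_cost_gram, a_mw) + cost_per_mol(b_cost_gram, b_mw)
    return total_gram, total_mol


# Hypothetical prices (per gram) and molecular weights (g/mol).
print(estimated_bb_costs(2.5, 120.0, 4.0, 88.0))  # (6.5, 652.0)
```

The per-mole total is the more meaningful comparison between reactions, since building blocks combine in molar rather than mass ratios.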

Properties referring to the reaction
SAVI_LHASA_SCORE - "Quality" score of this reaction according to LHASA transform scoring scheme
SAVI_PREDICTED_YIELD - Qualitative description of estimated yield of this reaction according to LHASA transform (if available)
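Because the fields appear in the TAB files in the order listed above, a TAB line can be turned into a record by zipping its columns with that list. A minimal Python sketch, assuming one record per line, tab-separated columns with no quoting, empty fields kept as empty columns, and an optional header line whose first cell is `NAME` (none of these layout details is stated above). The function name is hypothetical; the field names are copied verbatim from the list.

```python
import csv
import io
from typing import Dict, Iterator, TextIO

# Column names copied verbatim, in the documented order, from the list above.
FIELDS = [
    "NAME", "SMILES",
    "SAVI_BUILDING_BLOCK_A_SIGMA_STRID", "SAVI_BUILDING_BLOCK_A_SMILES",
    "SAVI_BUILDING_BLOCK_A_INCHI", "SAVI_BUILDING_BLOCK_A_INCHIKEY",
    "SAVI_BUILDING_BLOCK_A_ORDER_LINK",
    "SAVI_BUILDING_BLOCK_B_SIGMA_STRID", "SAVI_BUILDING_BLOCK_B_SMILES",
    "SAVI_BUILDING_BLOCK_B_INCHI", "SAVI_BUILDING_BLOCK_B_INCHIKEY",
    "SAVI_BUILDING_BLOCK_B_ORDER_LINK",
    "SAVI_BUILDING_BLOCK_A_PROTECTION_NEEDED", "SAVI_BUILDING_BLOCK_A_PROTECTED",
    "E_SAVI_BUILDING_BLOCK_B_PROTECTION_NEEDED", "SAVI_BUILDING_BLOCK_B_PROTECTED",
    "E_SAVI_PROPOSED_REACTION", "SAVI_PROPOSED_REACTION_ID",
    "SAVI_REACTION_HASHCODE", "SAVI_REACTION_CONDITIONS", "SAVI_REACTION_WARNINGS",
    "SAVI_BUILDING_BLOCK_A_COST_GRAM", "SAVI_BUILDING_BLOCK_B_COST_GRAM",
    "SAVI_ESTIMATED_BB_COST_GRAM",
    "SAVI_BUILDING_BLOCK_A_COST_MOL", "SAVI_BUILDING_BLOCK_B_COST_MOL",
    "SAVI_ESTIMATED_BB_COST_MOL",
    "SAVI_LHASA_SCORE", "SAVI_PREDICTED_YIELD",
]


def read_savi_tab(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per TAB line, keyed by the documented field names."""
    # QUOTE_NONE: treat quote characters (e.g. in reaction conditions) literally.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if row[0] == "NAME":
            continue  # skip a header line, if the file has one (assumption)
        if len(row) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(row)}")
        yield dict(zip(FIELDS, row))


# Synthetic one-line demo with placeholder values (not real SAVI data).
demo_line = "\t".join(f"v{i}" for i in range(len(FIELDS))) + "\n"
tab_records = list(read_savi_tab(io.StringIO(demo_line)))
print(tab_records[0]["SAVI_LHASA_SCORE"])  # v27
```

Since the reader is a generator, it can stream a large file opened with `open(path, newline="")` and filter records (e.g. on `SAVI_PREDICTED_YIELD`) without loading everything into memory.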

Notes:
1. All hashcodes used in the file are tautomer-invariant CACTVS cheminformatics toolkit hashcodes
2. Prices reflect Sigma-Aldrich catalog prices at the time the building block set was compiled and may differ from current prices. Please use the Sigma-Aldrich catalog link provided for each building block to get the most up-to-date price.
3. Please note that the InChI and InChIKey values are in general not Standard InChI/InChIKey identifiers, as they contain the FixedH layer (representing a specific tautomer). This will be corrected in subsequent runs.
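The situation described in Note 3 can be detected with plain string checks: a Standard InChI begins with `InChI=1S/`, whereas a non-standard one begins with `InChI=1/`, and a fixed-H layer is introduced by a `/f` layer. In an InChIKey, the second block ends with a flag character (`S` for standard, `N` for non-standard) followed by the version letter. The following is a minimal sketch; the function names are hypothetical, and the FixedH InChI in the demo is illustrative rather than taken from the SAVI files.

```python
def is_standard_inchi(inchi: str) -> bool:
    """Standard InChIs start with 'InChI=1S/'; non-standard ones with 'InChI=1/'."""
    return inchi.startswith("InChI=1S/")


def has_fixed_h_layer(inchi: str) -> bool:
    """True if the InChI contains a fixed-H layer, i.e. a '/f...' layer."""
    # Layers are '/'-separated; skip the 'InChI=1...' prefix. The formula layer
    # starts with an uppercase element symbol, so only the fixed-H layer starts with 'f'.
    return any(layer.startswith("f") for layer in inchi.split("/")[1:])


def is_standard_inchikey(key: str) -> bool:
    """InChIKey layout: 14 chars, '-', 8 chars + flag ('S'/'N') + version, '-', 1 char."""
    parts = key.split("-")
    return (
        len(parts) == 3
        and len(parts[0]) == 14
        and len(parts[1]) == 10
        and parts[1][8] == "S"
    )


std = "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)"          # acetic acid, Standard InChI
fixed_h = "InChI=1/C2H4O2/c1-2(3)4/h1H3,(H,3,4)/f/h3H"  # illustrative FixedH variant
print(is_standard_inchi(std), has_fixed_h_layer(fixed_h))  # True True
print(is_standard_inchikey("QTBSBXVTEAMEQO-UHFFFAOYSA-N"))  # True
```

For a rigorous conversion to Standard InChI, the structures (SMILES) should be re-processed with the InChI software rather than by editing the strings.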

Older Releases

Alpha 1 File Series - July 2015
At this very early alpha stage of the project, only 11 transforms were used to generate the file downloadable below.
They were applied to approx. 230,000 building blocks in one-step reactions only, and the ~610,000 resulting products have been annotated
with, but not yet filtered by, any of the computed or associated molecular properties.
To limit the file size, only on the order of one percent of the theoretically possible products (of one-step reactions) were sampled.
A set of very schematic graphical representations of the transforms implemented so far (two of which were not used for product generation)
can be downloaded here.

610,492 SAVI-generated products in SD format. This is a 374 MB .gz file that uncompresses to 4.4 GB.

The downloadable SD file is a very early alpha version of the set of generated products. The structures in this file
may or may not be part of the final SAVI database. They are meant to be looked at, and commented on, by early users.
Any feedback about individual structures or the entire set, and the data associated with them, is welcome.
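In SD files, each record's fields are stored as data items in the properties block: a header line such as `> <NAME>`, the value on the following line(s), then a blank line, with records separated by `$$$$`. Because the uncompressed file is several gigabytes, a streaming reader that never holds the whole file in memory is a sensible approach. The following is a minimal pure-Python sketch that skips the connection table and returns only the properties; a cheminformatics toolkit such as CACTVS would be needed to parse the structures themselves. The function names, the sample record and the file name in the usage comment are hypothetical.

```python
import gzip
import io
import re
from typing import Dict, Iterable, Iterator, List, Optional

# Data-item header line, e.g. "> <SMILES>"; captures the field name.
_TAG = re.compile(r"^>[^<]*<([^>]+)>")


def iter_sd_properties(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield the data items of each SD record as a dict (structures are skipped)."""
    props: Dict[str, str] = {}
    current: Optional[str] = None
    values: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == "$$$$":  # end of record
            if current is not None:
                props[current] = "\n".join(values)
            yield props
            props, current, values = {}, None, []
            continue
        match = _TAG.match(line)
        if match:  # start of a new data item
            if current is not None:
                props[current] = "\n".join(values)
            current, values = match.group(1), []
        elif current is not None:
            if line == "":  # a blank line terminates the value
                props[current] = "\n".join(values)
                current, values = None, []
            else:
                values.append(line)
    if current is not None:  # tolerate a missing final "$$$$"
        props[current] = "\n".join(values)
    if props:
        yield props


def iter_sd_gz(path: str) -> Iterator[Dict[str, str]]:
    """Stream records from a gzip-compressed SD file without decompressing it to disk."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from iter_sd_properties(handle)


# Usage: for rec in iter_sd_gz("savi_products.sdf.gz"): ...  (hypothetical file name)
SAMPLE = "\n".join([
    "demo",
    "  hypothetical header",
    "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    "> <NAME>",
    "ABC_DEF_1",
    "",
    "> <SMILES>",
    "CCO",
    "",
    "$$$$",
]) + "\n"
sd_records = list(iter_sd_properties(io.StringIO(SAMPLE)))
print(sd_records[0]["SMILES"])  # CCO
```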

If you have any questions regarding the potential availability of the generated molecules, including access to the
synthetic starting materials, please contact Bret Daniel.

Disclaimer

All structures ("SAVI Products") and associated information downloadable from here are placed in the public domain. They may be freely used
for any purpose without restrictions by any individual or organization. At the same time, we, i.e. the U.S. Government, NIH,
NCI, and their employees and contractors do not make any warranty, express or implied, including the warranties of
merchantability and fitness for a particular purpose with respect to any of the SAVI Products and associated information,
nor assume any legal liability for the accuracy, completeness, or usefulness of any information disclosed herein and do not represent that use of such
information would not infringe on privately owned rights. See also our general Disclaimer.