Rapid diversity analysis in combinatorial libraries using Markush structure techniques. John M. Barnard, Geoff M. Downs, David B. Turner, Simon M. Tyrrell, Peter Willett, Barnard Chemical Information Ltd., 46 Uppergate Road, Stannington, Sheffield S6 6BX, UK, and Department of Information Studies, University of Sheffield, Sheffield S10 2TN, United Kingdom
A number of techniques and algorithms have been developed to handle Markush structures from chemical patents without enumerating the (often extremely large) sets of compounds described. This paper describes the application of such techniques to diversity analysis in combinatorial libraries. We present a data structure which shows both the chemical nature of the monomer units in a combinatorial library, and the logical relationships between them. We describe the use of algorithms to generate structural fingerprints for the compounds in the library directly from this data structure, and we also discuss its use for rapid calculation of numerical measures of library diversity.
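
The idea of working on the library description rather than on enumerated products can be illustrated with a minimal sketch. It is not the authors' data structure or algorithm: it assumes a hypothetical representation in which the scaffold and each monomer are plain sets of fragment labels, and in which a product's fingerprint is simply the union of those sets (fragments spanning an attachment bond are ignored). Under those assumptions, the fraction of products containing a given fragment follows from per-position counts alone.

```python
from math import prod

def library_size(r_groups):
    """Number of products: one monomer is chosen at each R-group position."""
    return prod(len(monomers) for monomers in r_groups)

def fragment_frequency(fragment, scaffold, r_groups):
    """Fraction of products containing `fragment`, computed without enumeration.

    Assumes a product's fragment set is the union of the scaffold set and the
    chosen monomer sets, and that every position has at least one monomer.
    """
    if fragment in scaffold:
        return 1.0
    missing = 1.0  # probability that no position contributes the fragment
    for monomers in r_groups:
        with_fragment = sum(1 for m in monomers if fragment in m)
        missing *= 1.0 - with_fragment / len(monomers)
    return 1.0 - missing

# Hypothetical two-position library with invented fragment labels.
scaffold = {"amide"}
r_groups = [
    [{"phenyl"}, {"methyl"}],
    [{"phenyl", "hydroxyl"}, {"ethyl"}, {"methyl"}],
]
print(library_size(r_groups))
print(fragment_frequency("phenyl", scaffold, r_groups))
```

The cost grows with the total number of monomers rather than with the number of products, which is the point of avoiding enumeration.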

A comparison of dissimilarity methodologies in constructing representative compound libraries. Michael S. Lajiness, Computer-Aided Drug Discovery, Pharmacia & Upjohn, Inc., Kalamazoo, Michigan 49007-4940.
There has been considerable interest in the topic of structural diversity and how it relates to pharmaceutical lead finding and development. Several different approaches have been proposed and utilized for the selection of structurally diverse subsets of compounds. An important question that is often not addressed, however, is how these approaches compare. Are they equally effective in terms of distinguishing a steroid from a prostaglandin? Is one method better at locating biologically active compounds? This paper will compare several different methods for selecting structurally diverse subsets of compounds. The effectiveness of these methods in locating active compounds will be assessed using results from several different biological assays conducted at Pharmacia & Upjohn. The methods currently under study are from Tripos, Arris Pharmaceuticals, and Pharmacia & Upjohn.

The well tailored library: Beyond mere diversity. Eric J. Martin, Roger E. Critchlow, Chiron Corp., Emeryville, CA 94608
Combinatorial library design attempts to choose the best substituent set for a combinatorial synthetic scheme to maximize the chances of finding useful compounds such as drug leads. Initial efforts focused primarily on maximizing diversity, perhaps allowing bias through the inclusion of a small, fixed, set of pharmacophoric groups. However, many factors besides diversity impact good library design. A library can be better "tailored" by creating categories such as polar, pharmacophoric, rigid, low molecular weight, inexpensive, etc. The most diverse designs matching desired profiles of these characteristics are generated. Comparing the diversity scores among design profiles reveals tradeoffs between high diversity and physical property distributions, synthetic difficulty, expense, pharmacophoric bias, etc. Tailored library design requires close interactive effort between computational and medicinal chemists, so specialized programs were developed to integrate substructure searching, display, and statistics to facilitate the design of well tailored libraries.

The dimensions of chemical similarity space. Robin W. Spencer, Pfizer Central Research, Groton, CT, 06333.
The Tanimoto distance function is an efficient measure for database searching and diversity analysis, and may be taken to define a "chemistry space." Because this space is based on a comparison of hundreds of molecular fragments, it has been presumed to have high information content and high dimensionality. Yet a generalized measure of dimension shows that it is mostly less than 10 dimensional. Analysis of the space surrounding single compounds as well as a simulation of an ideal space shows how chemistry space depends on the presence of true analogs, the size of the fragment library, and the size of the compound collection.
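
For readers unfamiliar with the measure, the Tanimoto distance between two fragment-based fingerprints can be sketched as follows. The representation of a fingerprint as a set of fragment identifiers and the example values are assumptions made for illustration, not details taken from the abstract.

```python
def tanimoto_distance(a, b):
    """Tanimoto distance: 1 - |A and B| / |A or B| for two fragment sets."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here: two empty fingerprints are identical
    return 1.0 - len(a & b) / union

# Hypothetical fingerprints: each integer stands for one fragment that is present.
fp_x = {1, 4, 7, 9}
fp_y = {1, 4, 8}
print(tanimoto_distance(fp_x, fp_y))  # 2 shared fragments out of 5 distinct ones
```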

Validating metrics and methods for selecting diverse chemical subsets. Robert D. Clark, Richard D. Cramer, Jon T. Swanson; Tripos, Inc., 1699 South Hanley Road, St. Louis, Missouri 63144
Combinatorial synthesis for lead discovery has sharply increased the need to systematically select diverse representative subsets from virtual databases of compounds, because making all possible combinations of all possible reagents is usually neither practical nor desirable. The task is complicated by the fact that relevant diversity is in terms of physiological response, but only descriptors derived from structural information are generally available. Descriptors which perform well in the context of QSAR analysis or similarity searching are not necessarily well-behaved for selecting diverse subsets. Moreover, how well a particular descriptor performs can depend upon the (dis)similarity measure (e.g., Euclidean vs. Tanimoto) used as well as on the selection algorithm employed. General procedures for quantitatively assessing the usefulness of methods for finding diverse subsets will be discussed, along with results for some particular combinations - some common and some novel - of descriptors, measures, and selection algorithms.

A rigorous evaluation of the neighborhood properties of diversity metrics. J. B. Kinney, C. J. Eyermann, DuPont and DuPont/Merck Pharmaceuticals, Stine-Haskell Research Center, Newark, Delaware 19714-0030.
Assessing the validity of a diversity metric is an important step in the design and analysis of combinatorial libraries. One of the important properties of a good diversity metric is the ability to predict the properties of a compound based on its neighbors' properties. This paper will present a detailed statistical analysis of the quantitative performance of a variety of common metrics using data from several scouting and lead optimization programs. The discussion will focus on practical aspects of the neighborhood properties using a variety of statistical and graphical performance assessments.

Modal fingerprints and topological diversity. C. J. Blankley, Department of Chemistry, Parke-Davis Pharmaceutical Research Division, Warner-Lambert Company, 2800 Plymouth Road, Ann Arbor, MI 48105.
A new method has recently become available (Stigmata; Shemetulskis, et al., J. Chem. Inf. Comput. Sci. (1996), 36, 862-871) for extracting the common element, termed the modal fingerprint, from a set of molecular fingerprints. Molecular fingerprints based on a topological description of the molecule capture atom and bond path information in binary form which is readily amenable to comparison. The modal fingerprints for a set of molecules, extracted at maximum, median or minimum stringencies, offer a profile of the degree of topological similarity (or dissimilarity) within a given collection of compounds. Approaches to using this tool to measure chemical diversity within and between chemical datasets and relating the derived metrics to qualitative chemical notions of diversity will be illustrated by considering data collections of various origins typical of those encountered by medicinal chemists. Some comparisons with other proposed diversity measures will also be offered.
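
A minimal sketch of the modal fingerprint idea follows. It assumes, for illustration only, that the stringency is expressed as the minimum fraction of molecules in which a bit must be set for it to appear in the modal fingerprint; the function name and the example bit strings are hypothetical and not taken from Stigmata.

```python
def modal_fingerprint(fingerprints, threshold):
    """Return the bits set in at least `threshold` (0..1) of the fingerprints.

    `fingerprints` is a non-empty list of equal-length lists of 0/1 values.
    """
    if not fingerprints:
        raise ValueError("at least one fingerprint is required")
    n = len(fingerprints)
    length = len(fingerprints[0])
    counts = [sum(fp[i] for fp in fingerprints) for i in range(length)]
    return [1 if count / n >= threshold else 0 for count in counts]

# Three hypothetical 4-bit fingerprints.
fps = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
]
print(modal_fingerprint(fps, 1.0))  # bits common to every molecule
print(modal_fingerprint(fps, 0.5))  # bits present in at least half of them
```

Comparing the number of bits that survive at high and low thresholds gives a rough profile of how homogeneous the set is.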

Designing combinatorial libraries using automated docking methods. M. G. Bures, Abbott Laboratories, D47E AP 10-2, 100 Abbott Park Road, Abbott Park IL 60064-3500.
Increasing emphasis is being placed on using structural information, along with diversity analysis, to help design focused combinatorial libraries. Using an experimental or modeled structure of a representative library member or analog, we use docking programs such as DOCK and LUDI to orient and score a set of possible substituents in their proposed binding region. The score, or estimated binding energy, of the substituent is used as an indication of the predicted potency of the resulting library compound. In addition, when appropriate, we use CoMFA to generate a 3D-QSAR model for the compounds actually synthesized and tested and use the model to forecast the potency of new sets of substituents. The mechanics of this approach and results of validation studies will be discussed.

2D versus 3D similarity: Use of molecular shape-based 3D searching techniques for identifying novel compounds. Osman F. Güner, Matthew Hahn, Hong Li, and Moises Hassan, Molecular Simulations, Inc., San Diego, CA 92121.
Steric shape plays a crucial role in receptor-ligand binding, and a new drug candidate must first fit inside a receptor active site before it has a chance to bind to it and exert a biological effect. Therefore, shape-based 3D searching techniques complement the traditional pharmacophore-based 3D searching techniques well. Since shape-based 3D search retrieves compounds that are "similar" in shape, a comparison with an established 2D similarity method reveals interesting differences. The comparative study was accomplished by performing 2D similarity and 3D shape searches on the same database using the same query molecule. The most similar 20 hits from both searches are compared and analyzed. While the 2D similarity method retrieved compounds with similar topology but different size, the 3D shape search retrieved compounds similar in shape but with diverse topology. The advantages and limitations of each method are presented.

Designing pharmacophorically diverse libraries. D. Pickett, Rhône-Poulenc Rorer S.A., Centre de Recherche de Vitry-Alfortville, 13 Quai Jules Guesde, BP14 94403 Vitry-sur-Seine, France.
In recent years, the pharmaceutical industry has become increasingly concerned with methods for diversity analysis, driven by the needs of combinatorial chemistry. The results will depend critically on the measure of diversity selected. Methods have been developed which utilize the pharmacophores presented by a molecule as a descriptor (S.D. Pickett, J.S. Mason, I.M. McLay, J. Chem. Inf. Comput. Sci. in press). As the pharmacophoric properties of an individual molecule within a library will depend upon the interaction between different R-groups making up the molecule, reagent selection should be performed as far as possible on the properties of the final products. The difficulty lies in the combinatorial nature of the problem - selection of one reagent immediately specifies a number of products. Strategies have been developed to aid in reagent selection which address this problem. The interdependency of the groups means that it is not possible to select one set of reagents suitable for all situations. Rather, the selection process should be repeated for each library of interest.

Diversity selection of reagents for combinatorial chemistry by a 3D docking approach. H. Briem, Boehringer Ingelheim KG, Med. Chem. Dept., D-55216 Ingelheim, Germany.
Selection of reagents is a crucial step in the generation of a compound library by combinatorial chemistry. Ideally, the selection should retain as much molecular diversity in 3D space as possible. In this paper a new diversity metric for chemical reagents will be described. The approach includes docking of an assembly of aligned compounds into different receptor binding pockets. The common substructures of the assemblies are held fixed at different grid points within the pocket. Each reagent at each position is scored by interaction energy with the protein. After data reduction by principal components analysis, different selection and clustering algorithms may be applied in order to generate a diverse subset of reagents. The paper will describe the procedure and some examples will be given.

Asymmetric similarity and molecular diversity. G.M. Maggiora, J. Mestres*, T.R. Hagadone, and M.S. Lajiness, Computer-Aided Drug Discovery, Pharmacia & Upjohn, 301 Henrietta Street, Kalamazoo, MI 49007.
[*Permanent address: Institute of Computational Chemistry, University of Girona, 17071 Girona, Catalonia, Spain].
A measure of similarity of the molecules within a given set is essential to any method for evaluating the molecular diversity of the set. Most similarity measures in use today are symmetric; that is, X is as similar to Y as Y is to X. A new class of asymmetric similarity measures will be presented, and how these measures provide molecular information not provided by symmetric measures will be described. An assessment of molecular diversity based upon a Shannon-like entropy function will also be presented, along with a comparative analysis of the performance of symmetric and asymmetric similarity measures based upon the Shannon diversity function.
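
The abstract does not give the functional form of the asymmetric measures. As a stand-in, the sketch below uses a simple directed overlap, the fraction of X's features that also occur in Y, purely to show how a similarity can differ by direction. The function name and the feature sets are hypothetical.

```python
def directed_similarity(x, y):
    """Fraction of the features of `x` that are also present in `y`.

    Not symmetric in general: a small molecule can be fully contained in a
    large one while the large one is only partly covered by the small one.
    """
    if not x:
        return 0.0  # convention chosen here for an empty feature set
    return len(x & y) / len(x)

# Hypothetical feature sets: `small` is a substructure-like subset of `large`.
large = {1, 2, 3, 4}
small = {1, 2}
print(directed_similarity(small, large))  # all of small's features are in large
print(directed_similarity(large, small))  # only half of large's features are in small
```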

Fast ligand docking into receptor cavities. Akbar Nayeem, Tad Hurst, Joe Leonard, Tripos, Inc., 1699 So. Hanley Rd., St. Louis, MO 63144.
When the 3D structure of an important biological receptor is known, researchers would like to use that information to find novel potential drug candidates. This has fueled a high level of interest in computational methods of ligand-receptor docking. At the same time, combinatorial chemistry has expanded the number of structures which can be produced by medicinal research groups by orders of magnitude. Thus, there is a desire to use computational techniques to screen the extremely large libraries of compounds which could be made for structures which are maximally likely to bind to the receptor. This process is called Virtual High Throughput Screening (VHTS), and requires ligand-receptor docking tools which are extremely fast. In this presentation we will discuss the efforts to produce a high-quality flexible docking system which is appropriate for screening databases of one million structures or more.

Reduced dimensional representations of molecules and molecular similarity. W. Graham Richards, Daniel D. Robinson, Physical and Theoretical Chemistry Laboratory, Oxford University, South Parks Road, Oxford OX1 3QZ, United Kingdom.
Experienced molecular modelers are accustomed to displaying molecular structures from databases as three-dimensional representations on graphics terminals. These displays are however frequently not as easily recognized by non-specialists who think of their chemistry in terms of molecular structures drawn on a flat page. Using the technique of non-linear mapping we have developed a way of displaying three-dimensional structures in two dimensions whilst retaining the information contained in the three-dimensional distance matrix. At the same time the figures are recognizable in classical terms. Once derived, the two-dimensional representation has major advantages in searching for structural similarities. For two-dimensional diagrams we can take advantage of methods developed for pattern recognition. This holds out the promise of calculating mutual similarities between all pairs of molecules in large data sets derived from high throughput synthesis or combinatorial chemistry and hence quantifying diversity.

9:35

15

Simulated Annealing Guided Evaluation (SAGE) of diversity: A novel computational tool for diverse chemical library design and database mining. Alexander Tropsha1, Weifan Zheng1, Sung Jin Cho1, Ceris L. Waller2
[1Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599-7360, 2Oncogene Science, Inc., Uniondale, N.Y. 11553.]
We have developed a program for Simulated Annealing Guided Evaluation (SAGE) of molecular diversity. SAGE has been implemented for both the rational design of diverse chemical libraries and database mining. Several large simulated data sets were generated and used to evaluate the effectiveness of the method. Two different diversity functions were designed and compared in terms of maximizing the diversity while maintaining the uniformity of distribution of selected objects in the descriptor space. The best diversity function was analogous to Coulomb's law. A Kohonen self-organizing map was used for both preprocessing the datasets and visualizing the results. We propose SAGE as a general tool for diversity analysis and database mining in the context of new drug discovery.
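
The general approach can be sketched as simulated annealing over subsets with a Coulomb-like repulsion energy. This is an illustrative sketch, not the SAGE program: the energy function (sum of 1/d over selected pairs), the geometric cooling schedule, the single-swap move, and all parameter values are assumptions.

```python
import math
import random

def coulomb_energy(points, subset, eps=1e-9):
    """Sum of 1/distance over all selected pairs; lower means more spread out."""
    total = 0.0
    for i in range(len(subset)):
        for j in range(i + 1, len(subset)):
            d = math.dist(points[subset[i]], points[subset[j]])
            total += 1.0 / (d + eps)  # eps guards against coincident points
    return total

def anneal_select(points, k, steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Pick k of the points by simulated annealing; returns (indices, energy)."""
    rng = random.Random(seed)
    current = rng.sample(range(len(points)), k)
    e_cur = coulomb_energy(points, current)
    best, e_best = list(current), e_cur
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        outside = [i for i in range(len(points)) if i not in current]
        if not outside:
            break  # every point is already selected
        candidate = list(current)
        candidate[rng.randrange(k)] = rng.choice(outside)  # swap one member
        e_new = coulomb_energy(points, candidate)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if e_new <= e_cur or rng.random() < math.exp((e_cur - e_new) / t):
            current, e_cur = candidate, e_new
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return sorted(best), e_best

# Hypothetical one-dimensional descriptor space with one outlying compound.
points = [(0.0,), (0.1,), (0.2,), (10.0,)]
subset, energy = anneal_select(points, k=2)
print(subset, energy)
```

Because the inverse-distance term penalizes close pairs heavily, minimizing it tends to spread the selection across the descriptor space rather than only pushing points to the extremes.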

10:00

16

Stochastic algorithms for exploring molecular diversity. D.K. Agrafiotis, E.P. Jaeger, 3-Dimensional Pharmaceuticals, Inc., 665 Stockton Drive, Suite 104, Exton, Pennsylvania 19341.
A common problem in the emerging field of combinatorial drug design is the selection of an appropriate subset of compounds for chemical synthesis and biological evaluation. In this paper, we introduce a new family of algorithms that combine a stochastic search engine with a user-defined objective function that encodes any desirable selection criterion. The method is applied to the problem of maximizing molecular diversity using a novel diversity metric, and the results are visualized using self-organizing neural networks and Sammon's nonlinear mapping algorithm. Because the search method and the performance metric are treated as independent entities, the method can be easily extended to perform multi-objective selections in advanced experimental design systems.

10:30

17

The use of subtemplates and supertemplates in drug discovery. Charles J. Eyermann, John M. Geremia, The DuPont Merck Pharmaceutical Company, Chemical and Physical Sciences, Experimental Station, Wilmington, Delaware 19880-0500
Significant research on which metrics are useful in analyzing molecular diversity has recently been reported. The use of 2D fingerprints based on chemical fragments or atom types and bond paths has emerged as a metric which is computationally fast and has reasonable neighborhood properties. Clustering based on these 2D fingerprints has therefore been used to help select reagents for combinatorial chemistry as well as compounds to acquire from external sources. While clusters based on 2D fingerprints are useful for grouping compounds in a database, they do have some limitations. Here we present an alternative method for analyzing a molecular graph based on ring templates as well as user-defined templates. Results and examples of how these templates can be used in molecular diversity analysis and as a synthetic feasibility filter will be presented.

11:00

18

Diversity measures and their integration with company databases. Colin Edge, Stephen H. Calvert, Darren G. Jones, SmithKline Beecham Pharmaceuticals, New Frontiers Science Park, Third Avenue, Harlow, Essex, CM19 5AW, United Kingdom.
Various measures of chemical diversity have been used in the design of combinatorial arrays. Methods for clustering molecules on their theoretically derived properties are discussed. These clusters have been integrated with corporate databases, using an ISIS/BASE system, allowing the analysis of physicochemical and mass-encoded diversity, the design of new chemical arrays, the ordering of reagents and the automatic registration of the products in the corporate database.

11:30

19

HQSAR - A highly predictive QSAR technique based on molecular holograms. Tad Hurst, Trevor Heritage, Tripos, Inc., 1699 So. Hanley Rd., St. Louis, MO 63144.
CoMFA has proven to be an extremely valuable predictive tool for computational chemists in the medicinal chemistry field. It is most valuable for small sets of similar structures (10-50). The necessity of aligning the structures prior to the use of CoMFA and its large memory requirements make it difficult to use this technique for the larger datasets now being produced by combinatorial chemistry and high-throughput screening. Hologram QSAR (HQSAR) is a new technique which uses specialized fragment "fingerprints" called Molecular Holograms as predictive variables for predicting biological activities. This presentation will detail the results, which in many cases are as good as those of CoMFA or better. Also discussed will be the generation and use of "Chiral Fingerprints" in HQSAR.

Section A
Moscone Convention Center

Information Sources for HIV/AIDS Research

R. Bates, Organizer, Presiding

1:10

Introductory Remarks.

1:15

20

The untidy collection of information by a journalist. Rudy M. Baum, Chemical & Engineering News, 1155 16th St., NW, Washington, DC 20036.
Journalists have different information requirements than other professionals investigating a disease such as HIV/AIDS. For a journalist, a primary source is the scientist who carried out research that is the focus of a story. The primary scientific literature and secondary sources like newspaper accounts and review articles are the background a journalist uses to prepare for interviews. In the course of the HIV/AIDS epidemic, access to sources has evolved as the disease became better known by the public and the general media.

1:45

21

HIV/AIDS information: Meeting diverse needs in a university scientific research library. M. D. O'Rourke, Blommer Science Library, Georgetown University, Washington, DC 20057.
University science libraries encounter multiple challenges trying to satisfy the essential research demands of faculty and students for HIV/AIDS information. Typical queries extend from immunochemical, pharmacological, and biochemical aspects of HIV/AIDS to health education, policy matters, ethics, epidemiological modeling, biostatistics, national and international health care economics, and business. Adding to the complexity is the need to deliver quickly, in coordination with other university libraries and research centers, evaluated, refereed, current life sciences information in many formats, at multiple campus and off-campus sites. Included in the presentation will be practical examples of HIV/AIDS inquiries demonstrating the spectrum of services a fine scientific research library must make available, plus suggestions for keeping abreast of the HIV/AIDS literature and its delivery to researchers.

2:15

22

Trends in patent information on HIV/AIDS. Andrew H. Berks, Wyeth Ayerst Research, Pearl River, NY 10965.
This talk will discuss trends in patenting behavior in the area of HIV and AIDS treatment, diagnosis, and prevention. Also discussed will be patents relevant to diseases common in AIDS patients, such as hepatitis B, Kaposi's sarcoma, and P. carinii infection. This talk will include breakouts by inventors, corporate source, nature of the invention, and regional and national trends. A comparison of patents and other literature as sources of alerting and competitive information will be discussed.

2:45

23

Computerized HIV and OI's information database systems. Mohamed E. Nasr, Division of AIDS, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD 20852.
The Division of AIDS (DAIDS) supports research to identify and develop therapeutic agents for the prevention and treatment of infections with the human immunodeficiency virus (HIV) and associated opportunistic infections (OI's) including Mycobacterium tuberculosis (TB). Computerized databases containing chemical structures and biological data have been established by DAIDS that are designed to be the most up-to-date information source on current research on HIV, OI's and TB experimental therapies. The databases are currently managed using ISISBASE and ISISHOST software of MDL Information Systems, Inc. The databases provide support for: (1) the acquisition and prioritization of compounds for biological evaluation in contracts operated by DAIDS, while avoiding duplicate testing; (2) tracking developments through literature surveillance and abstraction of data on experimental chemotherapies of HIV and OI's; (3) serving as a knowledge base for the NIAID and the scientific community; and (4) preparing reviews on structure-activity relationships.

3:15

24

HIV chemical information: Therapeutic agents, targets, and active sites. Charles E. Gragg, 1649 Glengarry Drive, Cary, NC 27511-5771.
Therapeutic agents for treatment of Human Immunodeficiency Virus (HIV) infection are well known by the acronyms AZT, ddI, ddC, 3TC and d4T. Further information on these Reverse Transcriptase (RT) inhibitors, and inhibitors of Protease, Integrase, and other HIV enzyme targets expressed by the nine HIV genes can be gathered by following the chemical information.

3:45

25

AIDS information: An FDA perspective. Norman R. Schmuff, FDA, Center for Drug Evaluation and Research, Office of Pharmaceutical Science, Division of New Drug Chemistry-III, HFD-530, 5600 Fishers Lane, Rockville, MD 20857
Regulatory sources of AIDS information will be discussed from the viewpoint of the Food and Drug Administration. The range of information available through the web, email, consultants and personal contact will be discussed. A general description of the drug development process from pre-clinical through market approval will be discussed as it relates to FDA requirements for AIDS drugs. A general picture of IND and NDA requirements will be presented with an emphasis on CMC (Chemistry, Manufacturing and Controls) requirements and available guidance.

Section A
Moscone Convention Center

Sci-Mix

G. Grethe, Organizer
C.E. Gragg, Presiding

8:00 - 10:30

26

Making available chemicals available. Phil J. McHale, Rebecca Franke, Gary Marquart, Richard Coad, Bryan Host, MDL Information Systems Inc., 14600 Catalina Street, San Leandro, CA 94577.
Efficient chemical sourcing is becoming increasingly important as companies strive to make their discovery processes more productive. The ability to find suppliers for required compounds and to place orders in a streamlined manner can assist in expediting chemical synthesis and biological screening. A searchable, comprehensive, well-indexed, and detailed database of chemical suppliers' catalogs with an integrated ordering function can offer significant advantages over traditional means of finding suppliers for chemicals and placing orders, and we will discuss how MDL's Available Chemicals Directory is being developed for use as a chemical sourcing tool both on the Internet via WWW and within companies' own corporate intranets.

27

CHEMCATS: Commercially available chemicals from CAS. Roger J. Schenck, CAS, 2540 Olentangy River Road, P.O. Box 3012, Columbus, OH 43210-0012.
To support the chemist's role in finding and synthesizing new substances, CAS began building a database of commercially-available chemicals, called CHEMCATS. CHEMCATS is international in scope and includes peptides, proteins, catalysts, polymers, inorganics, and organometallics, as well as organic chemicals. One of the goals for CHEMCATS has been to provide a quality source of information that is extremely current. Toward this end, CAS is building close relationships with catalog suppliers, accepting input to CHEMCATS in standard formats, and building the capability for weekly updates to the database. CAS is adding CAS Registry Numbers to the substances in CHEMCATS and will continue to do so to ensure seamless integration of CHEMCATS with other products and services. CHEMCATS is currently available via STN and SciFinder. Future plans for CHEMCATS, including new classes of substances to be added and new distribution mechanisms will be discussed.

28

Competitive intelligence value of patents vs. other literature sources for drug compounds. Andrew H. Berks, Wyeth Ayerst Research, Pearl River, NY 10965.
Attempts to quantify the uniqueness of data in patents, compared to other literature sources are difficult. This poster will present anecdotal cases of several drugs and outline the history of their development and public disclosure. The intent is to demonstrate that important molecules are often disclosed in patents months or years before their disclosure elsewhere. If true, this would provide evidence that unique data is present in patents that is not available, or is delayed, in other literature sources. Such unique data has substantial competitive intelligence value.

Section A
Moscone Convention Center

Online pesticide resources at the toxicology and environmental health information program. G.F. Hazard, Jr., V. W. Hudson, National Library of Medicine, Bethesda, Maryland, 20894
The Toxicology and Environmental Health Information Program (TEHIP) of the National Library of Medicine (NLM) provides online access to toxicological and other biomedical data. The databases that deliver these data contain a great deal of pesticide related information. Researchers may access these databases through the NLM ELHILL and TOXNET online systems. Recently, retrieval mechanisms based on the World Wide Web (WWW) have been developed to offer new methods of access. In this presentation, statistics and major data elements of interest to pesticide researchers will be highlighted. The resources discussed will range from a chemical dictionary file (ChemID), to secondary literature files (TOXLINE), to evaluated or peer-reviewed data files (HSDB and IRIS). The TEHIP WWW site (http://sis.nlm.nih.gov) will also be discussed. It contains background information about these online files and also points to other internet resources of potential utility to the pesticide research community.

USDA pesticide use information. V. B. Johnson, Estimates Division, National Agricultural Statistics Service, 1400 Independence Avenue, S.W., Room 5801-S, Washington, DC, 20250-2000.
The National Agricultural Statistics Service (NASS), USDA, is responsible for collecting on-farm chemical use information to support the evaluation of water quality and food safety issues. The information is obtained through annual grower surveys. Published data are available from a series of surveys targeting selected field, vegetable and fruit crops in major producing States. Data are available in printed as well as electronic form, and the text of the published reports can be accessed through the Internet. The Economic Research Service (ERS) provides analytical research on the impact of alternative pesticide regulations, policies and practices. Research reports and databases are available in a variety of formats.

10:30

32

The exposure models library: A selection of fate, transport, and ecological models for exposure assessments. Paul L. Zubkoff, U. S. Environmental Protection Agency OPPTS/OPP/BPPD, Washington DC 20460-0001; Lawrence A. Burns, U. S. Environmental Protection Agency ORD/NERL/ERD, Athens, GA 30605-2720; Richard Walentowicz, U.S. Environmental Protection Agency, ORD/NCERQA, Washington DC 20460-0001.
The Exposure Models Library (EML) and Integrated Model Evaluation System (IMES) CD-ROM demonstrates the use of the CD-ROM technology for distributing exposure and assessment models, their documentation, a model selection system for use in exposure and risk assessment, and other tools. The EML: more than 100 models with source codes, manuals and data are available for determining fate and transport in various media: air, soil, groundwater and surface water. These models were developed primarily for use by various EPA offices and other federal agencies and are in the public domain. Selections of the PIRANHA (Pesticide & Industrial chemical Risk ANalysis & Hazard Assessment) model will be illustrated. The IMES: developed for exposure and risk assessors who use environmental fate models, the Selection Module assists users in choosing appropriate fate models from the user's response to queries of site characteristics and model capabilities. The Validation Module retrieves background information on models and their validation status. The Uncertainty Module compares model predictions with field data sets and presents information from the uncertainty studies using an easily understood graphical relationship. An interface for easy access to the IMES and the model directories indicates the amount of space required for downloading the model files, and allows for viewing text files in the model documentation directories.

11:00

33

Estimating fate and effects with the aquatic ecosystems model, AQUATOX.Richard A. Park1, Jonathan S. Clough2, Marjorie Coombs Wellman2, David A. MaurielIo3.
[1Eco Modeling, 20302 Butterwick Way, Gaithersburg, MD 20879-4358, 2Office of Science & Technology and 3Office of Pollution Prevention & Toxics, U.S. Environmental Protection Agency, Washington DC 20460-0001.]
Toxicity and ecological effects data can be integrated with environmental fate data to estimate trophic level responses (direct and indirect effects) of pollutants on aquatic ecosystems with the user-friendly AQUATOX model. Effects of toxic organics, mercury, nutrients, flow, sediments, and temperature are represented for complex food webs in ponds, streams, reservoirs, and lakes. Differential mortality, loss of prey, release from predation, disruptions in nutrient cycling, anoxic conditions, and changes in turbidity and sedimentation are all considered in this unique model. Steady-state and kinetic responses to single, sporadic, and chronic releases of pollutants over both short and long time periods are simulated with coupled differential equations. The risk assessor can evaluate the possible impacts of a stressor on representative ecosystems, or, because of the object-oriented code, one can easily add or delete compartments and change species and site data to simulate site-specific pollution problems. AQUATOX runs under Windows™ with results presented in tables and graphs or in several database formats for export.

Section B
Moscone Convention Center

The Clearinghouse for Chemical Information Instructional Materials (CCIIM). Gary Wiggins, Chemistry Library, Indiana University, Bloomington, Indiana 47405-4002.
Sponsored by the ACS Division of Chemical Information and the Special Libraries Association Chemistry Division, the CCIIM contains a collection of materials developed to assist in teaching about chemical information sources. Many of the items available from the clearinghouse are on the web (http://www.indiana.edu/~cheminfo/cciimnro.html). In addition to locally developed items, the lists contain references to instructional materials developed by the producers of chemical information tools, many of which are supplied free by the producer. A selection of representative materials and search techniques will be presented in the paper.

35

Designing a Web page for frequently asked reference questions. Ann D. Bolek, Science-Technology Library, The University of Akron, Akron, OH 44325-3907.
As the World Wide Web becomes more and more popular, more opportunities exist for creating personal home pages which can be useful to others. In the chemical information arena, questions often arise about Chemical Abstracts, other chemical databases, patents, and what useful information is available on the Web. The author will provide examples of Web pages which answer these questions in her setting. The Web pages, which can be updated frequently and easily, replace handouts and many verbal explanations of years past.

36

Searching databases to support collection development work: Tips and techniques. Grace Baysinger, Swain Library of Chemistry and Chemical Engineering, Stanford University, Stanford, CA 94305-5080.
This poster will include sources and search strategies to aid collection development and management work. By performing database searches in several key files, chemistry and chemical engineering librarians can better understand the programmatic needs of their departments, identify newly available resources that might be of interest to their users, and find out which journals their faculty and graduate students publish in and cite most frequently. While focusing primarily on printed resources, this poster will also highlight selected resources for identifying electronic resources.

37

Integration of active learning in the chemical information classroom. Nancy J. Butkovich, Physical Science Library, 230 Davey Laboratory, Penn State University, University Park, PA 16802.
At Penn State CHEM 400 (The Chemical Literature) has traditionally been taught using the lecture format. During the last five years different methods of instruction have been attempted with the goal of making the course more intellectually appealing while preserving course content. As a result, several lectures have evolved into active learning modules of different types, thus allowing students to become partners in the learning process. Coinciding with this has been an effort to go beyond the "how to use this source" part of the course. Through the use of collaborative in-class exercises and homework assignments, students are presented with questions which require them to synthesize information rather than merely reciting it. Preliminary assessment of these changes has been sufficiently satisfactory to warrant revision of the rest of the course.

Redefining information access: Toward a new topology of scientific and technical information. Denise A. D. Bedford & Julie Kwan, University Libraries, William P. Weber, Department of Chemistry, University of Southern California, Los Angeles, CA 90089; Clifford Bedford, Naval Air Warfare Center, U. S. Department of the Navy, China Lake, CA 93555
Two core competencies of academic and technical libraries have been collecting information and providing intellectual access to that information. This paper builds upon a project commissioned by the Library of Congress to develop a topology of scientific and technical information systems and a field test of the topology at the University of Southern California. The topology attempts to include all types of scientific and technical information, not just those traditionally collected by libraries. Coupled with the increase of nontraditional information resources available through the Internet, the topology provides a new way to look at collection development and, through an associated interface, redefines information access for the end user. This paper focuses on an initial application of the topology and will illustrate how a World Wide Web interface could be developed using examples pertinent to chemists.

40

An American chemistry librarian becomes an Eastender. Carol Carr, Chemistry Library, University of Pennsylvania, Philadelphia, PA 19104-6323.
A six-month job exchange took a chemistry librarian from the University of Pennsylvania in Philadelphia to Queen Mary and Westfield College in the East End of London. This poster session will outline the differences and similarities of libraries on both sides of the Atlantic.

41

From 300 baud to STN Easy: Familiarizing chemistry students with on-line literature searching from 1980 to 1996 at a Canadian undergraduate university. Brian M. Lynch, Department of Chemistry, St. Francis Xavier University, P.O. Box 5000, Antigonish, Nova Scotia B2G 2W5, Canada.
Over the past 17 years, the Department of Chemistry of St. Francis Xavier University has offered informal and formal courses aimed at developing student skills in accessing primary and secondary chemical literature sources. Many graduate students have referred to such course exposure as very valuable preparation for graduate school, and have acted as quasi-missionaries in spreading the digital word at their chosen doctoral schools. However, only about 10% of Canadian University Chemistry Departments offer similar course exposure taught by chemists, rather than by librarians. My poster will illustrate the current course status and will provide details of the form of problem assignments designed to aid in student research.

42

Academic libraries in transition. Susanne J. Redalje, Chemistry Library, University of Washington, Seattle, WA 98195-1700.
Libraries, as always, stand on the edge of past, present, and future, generally an exciting but dangerous place to be. The future clearly includes electronic sources but will also include paper and plastic and all the other forms information has come in over the years. The University of Washington Libraries is involved in several activities that seek to help its users survive and thrive in this exciting and changing environment: UWired, which helps freshmen and others get a head start in the electronic world of today and tomorrow; WILLOW™, a graphical user interface; article delivery projects; locally mounted and CD-ROM LAN databases; and traditional bibliographic instruction.

43

Crossfire comes to Duke. Kitty Porter, Duke University Chemistry Library, Durham, NC 27708-0355.
In October, Duke Chemistry Library traded in Beilstein Current Facts in Chemistry, the Gmelin Handbook, a few other reference sources, and several journals for Crossfire-Minerva. Although the spread of its use has been hampered by the requirement for 20MB RAM for successful operation, it is gaining satisfied users among organic research groups, physical chemistry lab students, and librarians in quest of answers to questions both virtual and real.

44

Making do: Creating documentation with common tools. Andrea Twiss-Brooks, The John Crerar Library and Chemistry Library, The University of Chicago, Chicago, IL 60637.
Desktop publication packages and graphic design programs are plentiful, but not always inexpensive. Potential authors with no budget for specialized software packages need not despair. It is possible to create useful, professional documentation using a handful of common desktop applications, plus a few shareware or freeware programs. This poster will describe the creation of a series of user's guides at The University of Chicago Library intended for users of the Beilstein CrossFire database searching system. The use of Microsoft Windows, Microsoft Windows Paintbrush, LView for Windows, and FTP applications will be described. (A web version of one of these guides may be found at http://www.lib.uchicago.edu/~atbrooks/beilfact.html.)

Section A
Moscone Convention Center

EXTOXNET: An internet pesticide information resource for non-specialists. Michael A. Kamrin, Michigan State University, East Lansing, MI 48824; Arthur Craigmill, University of California, Davis, CA 95616; Terry Miller and Jeffrey Jenkins, Oregon State University, Corvallis, OR 97331; Donald Rutz, Cornell University, Ithaca, NY 14853.
EXTOXNET is a collaborative multi-university program that is designed to provide information about the toxicology and environmental chemistry of environmental contaminants in lay language. Initial efforts of this consortium focused on pesticides and led to the development of profiles for almost 200 active ingredients and short summaries of important toxicology and environmental chemistry concepts. The profiles contain information about human and wildlife toxicology, environmental fate, physical properties and regulations governing each pesticide. This information has been published in hard copy and on the EXTOXNET WWW site. This site receives over 20,000 hits/month and is linked to a large number of other sites. The division of information into modular units describing individual concepts and chemicals will form the basis of an expansion of EXTOXNET capabilities to other chemicals and new issues.

2:30

46

Team building on the Web: An overview of coupling cost-benefit and environmental risk assessment models. F. R. Hall, Laboratory for Pest Control Application Technology, The Ohio State University, Wooster, OH 44691.
Improved risk information content and clarity to pesticide users and policy makers should enhance planning skills, benefit the policy and decision-making agenda, help defuse the current climate of crisis surrounding the use of pesticides and aid the transition towards more efficient pesticide use patterns. Pesticide use and delivery information from scarce and declining global multidisciplinary sources represents complex and expensive information. New web site linkages of these scarce global resources to promote research collaboration, data merging/sharing and building of research teams using web sites, lists, discussion groups and overall DB sharing are discussed. This overview also discusses the range from simple lists and DBs to the more complex cost-benefit and environmental DS models. Coupling this wide array of interacting data into a meaningful DS format is a critical step for ease of strategy assessment as well as tactical implementation and the building of successful global partnerships to enhance the use efficiency of crop protection agents in agriculture.

3:00

47

Flow of information in and out of a university pesticide information center. Sheila D. Merrigan, Paul B. Baker, Pesticide Information and Training Office, University of Arizona, Tucson, AZ 85719
The Pesticide Information and Training Office (PITO) at the University of Arizona provides information and education on pesticide-related issues to the public, the university community, and government agencies. Information flows into the office from many sources, including PITO staff, faculty in several departments, federal and state government agencies, and chemical companies. Information flows out of the office in written and verbal formats including reports, bulletins, brochures, training manuals, training workshops, fairs, and telephone conversations. The PITO information center was created to help facilitate this flow of information. This paper will discuss where and how the information center obtains information, how the information is managed, and how and to whom information is distributed.

3:30

48

The National Pesticide Information Retrieval System. Victoria J. Cassens, Eileen M. Luke, Peggy J. Hoover, Center for Environmental and Regulatory Information Systems, Purdue University, 1231 Cumberland Ave., Suite A, West Lafayette, IN 47906.
The National Pesticide Information Retrieval System (NPIRS) is a collection of six pesticide-related databases available through subscription to the Center for Environmental and Regulatory Information Systems (CERIS) at Purdue University. The Pesticide Product database contains label information obtained from the U.S. Environmental Protection Agency on over 88,000 active, canceled, transferred, and suspended pesticide products. In addition, state pesticide registration data is available from 39 states. The Pesticide Document Management System (PDMS) contains bibliographic citations of documents submitted to EPA in support of pesticide registration. Other databases include EPA Chemical Fact Sheets, pesticide residue tolerance information, C&P Press Material Safety Data Sheets, and a Federal Register archive of over 115,000 documents.

4:00

49

Using data from the National Pesticide Information Retrieval System (NPIRS) to assist in pesticide research. Susan E. Branchick, Carol A. Duane, Ricerca, Inc., P.O. Box 1000, Painesville, OH 44077-1000; Victoria J. Cassens, Center for Environmental and Regulatory Information Systems, Purdue University, 1231 Cumberland Ave., Suite A, West Lafayette, IN 47906.
The Pesticide Product Database and the Pesticide Document Management System Database (PDMS), available through NPIRS, can be used to assist in agrochemical research and registration. The query methods and various output options will be looked at in detail. Additionally, examples will be presented on how the information can be used in designing studies for pesticide registration, locating suppliers of technical material, identifying competitive products, determining a product's registration status and for monitoring a competitor's activities.

Section A
Moscone Convention Center

A comprehensive software system for managing NMR data. V.L. Shilay, D.F. Mitushev, A. A. Petrauskas, Advanced Chemistry Development Inc., 141 Adelaide St. West, Suite 1501, Toronto, Ontario, M5H 3L5, Canada.
Advanced Chemistry Development, Inc. has developed a comprehensive software system for managing NMR data. It includes the following: 1) importing raw FID NMR data from spectrometers, 2) data processing (FT, phasing, baseline correction, peak picking), 3) converting to tables of fully assigned chemical shifts and coupling constants, 4) updating to a database which can be searched according to substructure, formula, MW, chemical shifts and coupling constants, 5) accurate prediction of new spectra based on the previously accumulated experimental data, 6) Web-based Java applets and plug-ins allowing access to the system via a company intranet. The purpose of this system is to provide a complete corporate solution for collecting, interpreting, searching, predicting and exchanging NMR data, all fully automated and easily customizable.

9:00

51

Spectrum prediction in C-13 NMR spectroscopy: The importance of stereochemical information. W. Robien, Institute of Organic Chemistry, University of Vienna, A-1090, Austria.
Prediction of C-13 NMR spectra is a versatile tool during structure elucidation. A wide range of methods including increment calculation, HOSE-code derived correlation tables and neural network technology has been described in the literature during the last four decades. The basic concepts of these algorithms will be discussed using examples from natural product chemistry. Most of the methods are restricted to a two-dimensional model of structure description, neglecting stereochemical features which contribute substantially to chemical shift values. Our approach of deriving steric interactions from a 2.5-dimensional structure representation (up/down-bonds) and utilizing this information during spectrum prediction will be shown. Some useful applications based on this algorithm and also some statistical evaluations derived from our database holding 116,000 C-13 NMR spectra will be presented.

9:30

52

A 3D approach to structure-infrared spectra simulation and analysis. J. Gasteiger, P. Selzer, L. Steinhauer, V. Steinhauer, Computer-Chemie-Centrum, Universitaet Erlangen-Nuernberg, Naegelsbachstr. 25, D-91052 Erlangen, Germany.
An empirical approach to the modeling of the relationships between structure and infrared spectra is highly attractive, particularly when large molecules or large datasets have to be treated. We will show that powerful neural network techniques such as a counterpropagation network can model the relationships between structure and IR spectra. Central to this approach is a transformation of the 3D structure of molecules to a novel structure code. (1) This approach allows the simulation of IR-spectra over the entire frequency range as shown with a variety of examples. The counterpropagation network can also be used in reverse mode; by input of an infrared spectrum the 3D structure of a molecule can be predicted. The first examples of 3D structures derived directly from the IR spectrum will be given.
(1) J.H. Schuur, P. Selzer, J. Gasteiger, J. Chem. Inf. Comput. Sci. 36, 334 (1996)

10:00

53

Mass spectra interpretation by chemometric methods to support systematic structure elucidation. K. Varmuza, Dept. of Chemometrics, Technical University Vienna, Getreidemarkt 9/152, A-1060 Vienna, Austria.
Computer-assisted structure elucidation of organic compounds is mainly based on NMR data. However, in many analytical problems NMR data cannot be measured because of too low concentrations and complex mixtures. In these cases MS and IR have to be used for the identification of unknowns. Chemometric classifiers have been developed for low resolution mass spectra to recognize presence or absence of substructures in a molecule. Classification is based on numerical transformation of spectral data and multivariate discriminant methods. Classification results are transformed to a good-list and bad-list for direct use by isomer generator programs. Examples demonstrate that mass spectral classifiers often provide complementary structural information to other spectroscopic data. Cluster analysis of the structure candidates gives insight into their structural diversity.
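
The conversion of classifier output into a good-list and a bad-list can be pictured with a minimal Python sketch. It assumes each classifier returns a probability-like score that a substructure is present; the function name `build_lists`, the cutoff values, and the substructure labels are hypothetical and are not taken from the abstract.

```python
def build_lists(scores, present_cutoff=0.9, absent_cutoff=0.1):
    """Turn per-substructure scores into a good-list and a bad-list.

    scores maps a substructure label to a score in [0, 1] (assumed to mean
    "likelihood that the substructure is present"). Substructures with
    intermediate scores are left undecided and appear in neither list.
    """
    good = sorted(s for s, p in scores.items() if p >= present_cutoff)
    bad = sorted(s for s, p in scores.items() if p <= absent_cutoff)
    return good, bad


# Hypothetical classifier output for one unknown spectrum
example_scores = {"phenyl": 0.97, "carbonyl": 0.50, "nitro": 0.03}
good_list, bad_list = build_lists(example_scores)
print("required substructures:", good_list)
print("forbidden substructures:", bad_list)
```

Leaving uncertain substructures out of both lists keeps an unreliable classification from wrongly constraining the isomer generator.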

10:30

54

Analytical information requirements in combinatorial chemistry. William L. Fitch, Affymax Research Institute, 3410 Central Expressway, Santa Clara, CA 95051.
The advent of combinatorial methods of molecular discovery places new demands on analytical data collection and information handling. New high throughput spectroscopic and chromatographic techniques are being developed and only the most automatable analytical measurements will be made in this environment. There are unmet needs for new methods of automated spectral interpretation and information display.

11:00

55

Combinatorial chemistry - A new challenge for the spectroscopic laboratory. Reinhard Neudert, Chemical Concepts, Boschstrasse 12, D-69469, Weinheim, Germany
About two years ago, chemical research groups began to develop a modified version of the classical approach to lead discovery, that is, synthesis-screening-identification-optimization. The new approach, summarized under the name combinatorial chemistry, uses automation techniques to accelerate the synthesis and separation of candidates in lead discovery. Since the structure of the compounds is known, the process is actually reduced to one of verification of structure proposals. The classical way to verify involves human resources to a large extent. The new synthesis techniques result in the generation of tremendous numbers of new compounds. To handle the analytical task at reasonable costs, the following steps in the laboratory need to be rationalized:
- Sample preparation and data acquisition
- Data management
- Archiving and automatic generation of knowledge bases
- Verification

Moscone Convention Center

General Papers

C. E. Gragg, Presiding

11:30

56

Achieving database quality - With special emphasis on the delivery of chemical information. Dorothy M. Blakeslee, John R. Rumble, National Institute of Standards and Technology, 820 West Diamond Avenue, Room 113, Gaithersburg, Maryland 20899.
The Standard Reference Data Program (SRDP) at the National Institute of Standards and Technology (NIST) has long maintained a program of data evaluation, and when computer databases began to be developed, the maintenance of quality received new attention. Today, computer databases are the primary distribution mechanism for NIST standard reference data, but it is a nontrivial task to make sure that what users find when using those databases is what the data evaluator intended. To that end, NIST SRDP has established a careful program of quality control to ensure NIST standard reference databases are of the highest quality. This program will be described in detail. Special emphasis will be placed on the maintenance of quality when delivering chemical information both in PC-based databases and over the Internet.

11:50

57

Data mining for lead identification and explosion. Sheila Ash, Scott Gothe, Tripos, Inc., 1699 So. Hanley Rd., St. Louis, MO 63144
Successful drug design programs need to ensure a continued flow of new lead candidates. Data mining techniques enable drug designers to capitalize on the various data sources available to them. This paper exemplifies the use of these techniques and sources for lead identification and explosion.

12:10

58

Combinatorial chemistry - its structure, relationships, performance and outlook. W. Gregg Wilcove, Wilcove Associates, Inc., 14 Medford Road, Morris Plains, NJ 07950.
The science of combinatorial chemistry can be defined, measured, and characterized to reveal how it is organizing itself. We will present a structural analysis of its research universe that shows (1) its defining characteristics, (2) the relationships between applications, (3) where the momentum is, and (4) the emerging work that will determine its future direction, emphasis, and impact.

Section A
Moscone Convention Center

Combinatorial synthetic design. Paul A. Bartlett, Matthew A. Marx, Anne-Laure Grillot, Samuel J. Gillett, Mark R. Spaller, Eric D. Turtle, Department of Chemistry, University of California, Berkeley, CA 94720-1460.
The simultaneous synthesis of a library of compounds must be carried out within a different set of constraints than a synthesis directed to a single target. Restrictions on isolation or purification steps, and the differing strategies for identification of individual structures after screening a library, obviate many conventional approaches to synthesis. Sequences appropriate for parallel or combinatorial syntheses begin with starting materials that are available with diverse functionality; they are relatively short, and, in many instances, they are carried out on solid support. It is also generally the case that a single variable is introduced in any step. In addition to these criteria, it is our own prejudice that cyclic or conformationally constrained molecules offer the most interesting targets for development of library syntheses and as screening leads. Synthetic sequences devised on the basis of these principles will be described.

2:00

60

Information requirements for planning a compound library. Guenter Grethe, Maurizio Bronzetti, MDL Information Systems, Inc., 14600 Catalina Street, San Leandro, CA 94577.
Careful planning is the most critical step in the process of generating libraries of small organic molecules. After identifying a biological target and generating a potential scaffold for the desired library, the possible metabolic fate of compounds to be synthesized has to be considered and viable synthetic pathways amenable to combinatorial synthesis have to be developed. Facile access to relevant information about synthetic methodologies in solution as well as on solid support is essential. The planned synthesis and the desired diversity of the library influence the selection of starting materials. Effective selection from available sources is critical. We will demonstrate the planning process and the efficient use of information resources for the generation of a library of small organic compounds.

2:30

61

Mining a reaction database to support combinatorial synthesis. Glenn J. Myatt, Paul E. Blower, Jr., Mike Petras, Chemical Abstracts Service, 2540 Olentangy River Road, P.O. Box 3012, Columbus, OH 43210-0012.
CAS has analyzed the CASREACT reaction database in terms of functional group transformations. The results are presented in tabular form which a chemist can browse to find promising reactions for combinatorial synthesis of small organics. This paper will give details of the analysis and describe a computer interface that provides tools to navigate the analysis tables and reaction sets.

3:00

62

Information management for automated parallel synthesis. David Nickell, S.H. DeWitt, E.M. Hogan, Diversomer Technologies, Inc., and Parke-Davis Pharmaceutical Research, Division of Warner-Lambert Co., Ann Arbor, MI 48105.
The ability to rapidly screen many thousands of chemical entities in high throughput biological assays has raised the issue as to how the necessary number of compounds will be obtained for testing. The problem has been approached from a number of directions by different organizations. Managing the information necessary to automate these systems at the enterprise, laboratory, or desktop levels will require creative solutions which may involve paradigm shifts in the way chemical synthesis is performed in the laboratory. The development of fully integrated systems to address the growing demand for automated organic synthesis is an area which is receiving much attention. The first generation information systems will support individual automated synthetic workstations. Later systems will include interfaces to pick-and-place robots, proprietary reaction equipment, purification and analysis tools as well as existing databases. Modularly designed information systems will have the capability to grow with the expanding needs of the automated synthesizers.

3:30

63

Data management for combinatorial technologies at Selectide/HMR Inc. R.F.D. Stansfield, J.D. Heddles, C.V. Summers and K.F. Wertman, Selectide Corporation, Hoechst Marion Roussel, Inc., 1580 East Hanley Blvd., Tucson, AZ 85737-9525.
Combinatorial technologies at Selectide are applied to synthesis, analysis and screening for biological activity. Automation in synthesis and in screening is key. The data management requirements cover the gamut of a traditional pharmaceutical research organization - product management, test and results management, and decision support - but with an additional, combinatorial slant. Our approach is based on an analysis of the lead generation and optimization processes and a judicious selection of commercial software tools. We are currently building applications and databases which provide integrated views of information for chemists and biologists. The integration is done across different products and technologies for managing, respectively, relational (alphanumeric) data, discrete structures and combinatorial libraries.

4:00

64

Reaction-centered informatics for combinatorial chemistry. David Chapman, Afferent Systems, Inc., 442A Collingwood St., San Francisco CA 94114.
Combinatorial libraries may be best represented as a series of steps, each of which either distributes a set of reactants or performs a reaction. This representation is used advantageously both to generate product structure databases, and to drive synthetic instrumentation, in a tightly integrated system: Myriad. Database generation proceeds by "virtual chemistry" simulation of the actual synthesis. Virtual chemistry generates all and only the expected products of chemistries such as cycloadditions, which the familiar "generic structure" approach cannot. Because virtual reaction vessels correspond with physical ones, including physical product in the database is easy. Myriad includes a high-throughput, instrument-independent synthesis controller, which transforms a graphical library definition (consisting simply of sets of reactants and a series of reactions) into the thousands of robot actions needed to actually make the library. It can increase instrument throughput several-fold by interleaving multiple product batches.
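
As a rough illustration of the step-based representation described above, the following Python sketch treats a library definition as an ordered list of steps, each of which either distributes a set of reactants or performs a reaction, and simulates the steps to enumerate one record per virtual vessel. The step keywords, the function name `enumerate_library`, and the reagent and reaction labels are hypothetical; this is not Myriad's data model, and no real chemistry is performed.

```python
from itertools import product


def enumerate_library(steps):
    """Simulate a library definition and return its virtual vessels.

    steps is a list of ("distribute", [reactant, ...]) or ("react", name).
    Each returned vessel is (reactants_added, reactions_applied), both tuples.
    """
    vessels = [((), ())]  # start from a single empty virtual vessel
    for kind, payload in steps:
        if kind == "distribute":
            # Split every vessel so that each reactant gets its own copy.
            vessels = [(r + (x,), rx) for (r, rx), x in product(vessels, payload)]
        elif kind == "react":
            # A reaction is recorded on every existing vessel.
            vessels = [(r, rx + (payload,)) for r, rx in vessels]
        else:
            raise ValueError(f"unknown step type: {kind!r}")
    return vessels


# Hypothetical 2 x 3 library with one reaction step
definition = [
    ("distribute", ["acid_1", "acid_2"]),
    ("distribute", ["amine_1", "amine_2", "amine_3"]),
    ("react", "amide_coupling"),
]
for reactants, reactions in enumerate_library(definition):
    print(reactants, reactions)
```

Because each record corresponds to one virtual vessel, the same list could in principle be walked to schedule physical operations, which is the correspondence the abstract relies on.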

4:30

65

Searching for patents in Derwent's World Patents Index on combinatorial chemistry processes and products. Donald W. Walter, Derwent Information, Suite 525, McLean, VA 22102.
How can you search for patents on a particular combinatorial chemistry product or process when the target patent may involve a mixture of millions of amino acid, nucleotide or other sequences? Derwent's indexing of chemicals provides a way to focus on the particular combinatorial products of interest, and the flexibility to search narrow questions or broad ones. This paper will present Derwent's indexing practice and philosophy on the subject, and illustrate some techniques for searching for patents involving combinatorial chemistry.

Section A
Moscone Convention Center

Experience with ChemSpace (TM): Finding one compound among a billion. Richard D. Cramer, David E. Patterson, Robert C. Glen, Allan M. Ferguson, Michael Lawless, Peter Hecht, Tripos, Inc., 1699 So. Hanley Rd., St. Louis, MO 63144.
New informatics paradigms are required to exploit the orders of magnitude increase in the accumulation of qualitative SAR data. Our focus on molecular similarity, using several metrics "validated" as predictive of similar biological properties, has yielded, among other things, techniques for searching large combinatorial "virtual libraries" at rates over 100,000,000 structures/hour. This capability suggested the possibility of a "universal database," containing "all" structures available in a few steps from commercially available reagents. Structures identified within this database using validated similarity metrics would be the most logical candidates for following up a newly discovered hit from a random screening program. Some results from the first nine months of experience with such a database will be presented and discussed.
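
The abstract does not name its similarity metrics. As one widely used example, chosen here as an assumption and not as the authors' method, the Tanimoto coefficient on fingerprint bit sets can rank library members against a hit. The fingerprints below are made-up sets of bit positions, and the function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(query, library, top_n=3):
    """Return the top_n (name, score) pairs most similar to the query."""
    scored = [(tanimoto(query, fp), name) for name, fp in library.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))  # best score first, ties by name
    return [(name, score) for score, name in scored[:top_n]]


# Hypothetical fingerprints for a screening hit and three library members
hit = {1, 2, 3}
virtual_library = {"cpd_a": {1, 2, 3}, "cpd_b": {2, 3, 4}, "cpd_c": {9}}
print(rank_by_similarity(hit, virtual_library, top_n=2))
```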

9:30

67

New leads by selective screening of compounds from large databases. Alberto Gobbi, Dieter Poppinger, Bernhard Rohde, Ciba Geigy AG, R-1045.1.20, Postfach, CH-4002 Basel, Switzerland.
At Ciba, a large database with over 500,000 commercially available compounds was built. Several methods to select compounds for screening from this database have been compared using an existing dataset including biological activity. Using a genetic algorithm, many of the most active compounds were found screening only 1,200 out of 76,000 compounds.

10:00

68

SCAM: Statistical Classification of Activities of Molecules using recursive partitioning. Andrew Rusinko III, Mark W. Farmen, Christophe G. Lambert, and S. Stanley Young, Research Information Resources, Glaxo Wellcome Inc., Research Triangle Park, NC 27709
Combinatorial chemistry and high-throughput screening have revolutionized the drug discovery process in the pharmaceutical industry. Large numbers of structures and vast quantities of biological assay data are rapidly being accumulated which overwhelm traditional chemical/biological analysis technologies. Recursive partitioning is a method for statistically determining the rules that classify objects into similar categories or, in this case, structures into groups of active or inactive molecules. SCAM is a computer program designed to make use of this methodology in an extremely efficient manner. Rules explaining biological data for thousands of compounds can be determined in a matter of a few CPU minutes. A dataset of 1,650 monoamine oxidase inhibitors was used in this investigation. Substructural rules that lead to a general classification of structures were obtained and compared to clustering of structures via their aggregate chemical descriptors alone. Advantages and disadvantages of this methodology are presented.
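
As a rough illustration of recursive partitioning on substructural features, the following minimal Python sketch splits a small set of compounds on the presence or absence of a substructure and recurses on each half. It is not the SCAM program: the Gini impurity criterion, the function names (`gini`, `best_split`, `partition`), the feature names, and the toy activity labels are all assumptions made for the example.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 activity labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels, features):
    """Return (weighted impurity, feature) for the best presence/absence split."""
    best = None
    n = len(rows)
    for f in features:
        with_f = [y for x, y in zip(rows, labels) if f in x]
        without_f = [y for x, y in zip(rows, labels) if f not in x]
        if not with_f or not without_f:
            continue  # this feature does not divide the set
        score = (len(with_f) * gini(with_f) + len(without_f) * gini(without_f)) / n
        if best is None or score < best[0]:
            best = (score, f)
    return best


def partition(rows, labels, features, depth=0, max_depth=3):
    """Recursively split compounds on substructure presence (hypothetical sketch)."""
    split = best_split(rows, labels, features) if depth < max_depth else None
    # Stop when no split exists or the best split does not reduce impurity.
    if split is None or split[0] >= gini(labels):
        return {"n": len(labels), "active_fraction": sum(labels) / len(labels)}
    _, f = split
    has = [(x, y) for x, y in zip(rows, labels) if f in x]
    lacks = [(x, y) for x, y in zip(rows, labels) if f not in x]
    return {
        "feature": f,
        "present": partition([x for x, _ in has], [y for _, y in has],
                             features, depth + 1, max_depth),
        "absent": partition([x for x, _ in lacks], [y for _, y in lacks],
                            features, depth + 1, max_depth),
    }


# Toy data: each compound is a set of (made-up) substructure keys; 1 = active.
compounds = [{"amine", "ring"}, {"amine"}, {"ring"},
             {"halide"}, {"amine", "halide"}, {"ring", "halide"}]
activity = [1, 1, 0, 0, 1, 0]
tree = partition(compounds, activity, ["amine", "halide", "ring"])
```

Each internal node of the resulting tree reads as a rule of the form "substructure present / absent", which is the sense in which a partitioning yields substructural rules for a classification.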

10:30

69

Data mining using probabilistic structure analysis. James A. Morrell, Monsanto Co., GG3K, 700 Chesterfield Village Pkwy, St. Louis, MO 63198
The presentation describes a technique we are developing for utilizing data in either commercial or proprietary chemical information databases to determine a probabilistic measure of how various functional groups impact biological activity and specificity. A precursor of the technique, which has traditionally been used for capturing an organization's collective knowledge of toxicological activity, has been extended to provide an additional tool for combinatorial library design. With regard to combinatorial library building, the technique has potential as either a pre-processing (building block selection) or post-processing (library selection) step.

11:00

70

Experimental techniques for the datamining of CAS data for substance-use relationships. W. Fisanick, T. E. Bangert, Research Unit, Chemical Abstracts Service, 2540 Olentangy River Road, P.O. Box 3012, Columbus, OH 43210
CAS is experimenting with a variety of techniques for the data mining of Registry and CA File data for substance-use relationships. Included are enhanced similarity and clustering techniques for structure data and techniques that extract, summarize, and infer substance-use information from text data. The structure similarity capabilities incorporate a composite or class similarity search for a set of substances. The structure clustering techniques include a Jarvis-Patrick method that has been enhanced with a screening mechanism to improve computational performance, and two cluster relationship techniques that allow for related cluster/substance navigation and cluster overlaps. A partitioning technique based on common substructures is also used. The common substructures are identified in a series of similarity and substructure searches. Of significance in the text handling is an initial version of an inference technique that establishes the substance-to-use correlation. This paper will discuss and illustrate selected techniques and their experimental use for a particular type of use, such as a class of bioactivity.
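
For readers unfamiliar with Jarvis-Patrick clustering, the basic (unenhanced) method can be sketched as follows: two items join the same cluster when each appears in the other's list of k nearest neighbours and the two lists share at least a minimum number of members. The sketch below does not include the screening mechanism mentioned in the abstract, and the use of Tanimoto similarity over feature sets, the function names, and the toy fingerprints are assumptions made for illustration.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity between two sets of feature identifiers."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def jarvis_patrick(fingerprints, k, k_min):
    """Basic Jarvis-Patrick clustering; returns sorted lists of item indices."""
    n = len(fingerprints)

    # Step 1: the k nearest neighbours of every item, by descending similarity.
    neighbours = []
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: tanimoto(fingerprints[i], fingerprints[j]),
            reverse=True,
        )
        neighbours.append(set(ranked[:k]))

    # Step 2: union-find structure to merge items that satisfy the criterion.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        mutual = j in neighbours[i] and i in neighbours[j]
        if mutual and len(neighbours[i] & neighbours[j]) >= k_min:
            parent[find(i)] = find(j)

    # Step 3: collect the members of each cluster.
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy fingerprints: sets of made-up fragment identifiers.
fps = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2, 4},
       {10, 11, 12}, {10, 11, 13}, {10, 12, 13}]
clusters = jarvis_patrick(fps, k=2, k_min=1)
```

The all-pairs neighbour search in step 1 scales quadratically with the number of structures, which is the kind of cost a pre-screening step would be expected to reduce.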

11:30

71

MineSet: An interactive data analysis and exploration toolset. Mario Schkolnick, Silicon Graphics Computer Systems, Mountain View, CA 94043-1389
MineSet is a new data mining and visualization product from Silicon Graphics. By integrating the functions of data access, data transformation, data analysis and presentation of results, the task of mining data is supported in a very interactive way. This talk will discuss the organization of the product and will demonstrate its main features.