2
Linear Notations Represent the atoms, bonds, and connectivity as a linear text string SMILES –Concise –Originally designed for manual command-line entry into text-only systems –Now widely used Can be input to a spreadsheet cell, on one line of a text file, or in an Oracle database text field System to generate canonical form of SMILES

4
SMILES Review (cont’d) Can make Hydrogens explicit Non-organic atoms are put in square brackets, e.g., [Xe] Charged species also in square brackets with a + or -, e.g., [Na+] or [O-] Unknown atoms indicated by a * Stereochemistry represented by

5
SMILES for Tyrosine NC(Cc1ccc(O)cc1)C(=O)O
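To make the notation concrete, the string can be split into its atoms, bonds, branch parentheses, and ring-closure digits. The following is a minimal sketch only: `SMILES_TOKEN` and `tokenize` are hypothetical names, not part of any SMILES toolkit, and the pattern covers just the common organic-subset symbols and bracket atoms, with no validation of the chemistry.

```python
import re

# Hypothetical helper, not from any toolkit: one regex alternative per token kind.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"      # bracket atom, e.g. [Na+], [O-], [nH], [Xe]
    r"|Br|Cl"          # two-letter organic-subset atoms (tried before B and C)
    r"|[BCNOPSFI]"     # one-letter aliphatic atoms
    r"|[bcnops]"       # aromatic atoms
    r"|\*"             # unknown atom
    r"|[-=#:/\\.]"     # bond symbols and the '.' disconnection
    r"|[()]"           # branch open / close
    r"|%\d{2}|\d"      # ring-closure labels
)

def tokenize(smiles):
    """Split a SMILES string into tokens; raise if any character is skipped."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in {smiles!r}")
    return tokens

tyrosine = tokenize("NC(Cc1ccc(O)cc1)C(=O)O")
heavy_atoms = [t for t in tyrosine if t[0].isalpha() or t[0] == "["]
print(len(heavy_atoms))  # 13 heavy atoms: 9 C, 1 N, 3 O
```

Hydrogens are implicit here, which is why only the 13 heavy atoms of tyrosine appear as tokens.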

6
SMILES for Acetaminophen (Tylenol) CC(=O)Nc1ccc(O)cc1

7
SMILES for Isatin O=c2[nH]c1ccccc1c2=O

8
Canonicalizing SMILES – Morgan Algorithm Each atom starts with a connectivity value: the number of atoms it is connected to That value is then replaced by the sum of the connectivity values of its neighbors This continues iteratively until the number of different values is maximized (i.e., no longer increases) Atoms are numbered in decreasing order of connectivity value –In case of a tie, other properties are used (e.g., atomic number, bond order, etc.).
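The iteration described on this slide can be sketched in a few lines. This is an illustrative sketch, not Daylight's implementation: the molecule is assumed to be given as a plain adjacency dictionary, `morgan_values` and `morgan_order` are hypothetical names, and ties are broken by input atom index purely as a placeholder for the atomic-number and bond-order tie-breaks mentioned above.

```python
def morgan_values(adjacency):
    """Extended-connectivity values; adjacency maps atom index -> neighbour indices."""
    # Start with the plain connectivity (number of neighbours) of each atom.
    values = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    n_classes = len(set(values.values()))
    while True:
        # Replace each value by the sum of the neighbours' current values.
        new_values = {atom: sum(values[n] for n in nbrs)
                      for atom, nbrs in adjacency.items()}
        new_classes = len(set(new_values.values()))
        # Stop once the number of distinct values no longer increases,
        # keeping the values from the previous round.
        if new_classes <= n_classes:
            return values
        values, n_classes = new_values, new_classes

def morgan_order(adjacency):
    """Atoms in decreasing order of value; ties fall back to atom index (placeholder)."""
    values = morgan_values(adjacency)
    return sorted(values, key=lambda atom: (-values[atom], atom))

# Carbon skeleton of n-pentane: 0-1-2-3-4
pentane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(morgan_values(pentane))  # {0: 2, 1: 3, 2: 4, 3: 3, 4: 2}
print(morgan_order(pentane))   # [2, 1, 3, 0, 4]
```

For pentane the initial values (1, 2, 2, 2, 1) give two classes, one round of summation gives three, and a second round would drop back to two, so the procedure stops after the first round with the central carbon ranked first.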

9
Canonicalizing SMILES – CANGEN Two-stage procedure used by Daylight First stage, CANON, generates a canonical connection table using a modified version of the Morgan Algorithm that produces a tree structure Second stage, GENES, creates a unique SMILES using a depth-first search of the molecular graph tree output by CANON More information – JCICS 1989, 29, 97-101

11
Simple Reaction SMILES Each reagent and product is represented as SMILES Reagents go on the left of a “>>”; products on the right Individual reagents and products are separated by a “.” CH4 + 2 O2 → CO2 + 2 H2O Reaction SMILES: C.O=O.O=O>>O=C=O.O.O
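The two separators can be illustrated with plain string handling. This is a sketch under stated assumptions: `split_reaction` is a hypothetical helper, it handles only the simple “>>” form shown on this slide, and it does not check that each component is valid SMILES. The example string is methane combustion, with O2 written as O=O and CO2 as O=C=O.

```python
def split_reaction(reaction_smiles):
    """Split a simple 'left>>right' reaction SMILES into two lists of component SMILES."""
    left, separator, right = reaction_smiles.partition(">>")
    if not separator:
        raise ValueError("no '>>' separator found")
    # Components on each side are separated by '.'; an empty side yields an empty list.
    reagents = left.split(".") if left else []
    products = right.split(".") if right else []
    return reagents, products

reagents, products = split_reaction("C.O=O.O=O>>O=C=O.O.O")
print(reagents)  # ['C', 'O=O', 'O=O']
print(products)  # ['O=C=O', 'O', 'O']
```

Repeating a component, as with the two O=O entries, is one way to carry the stoichiometry, since the notation has no coefficient syntax.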

16
Representing generic structures A generic structure is one which, by ambiguity, represents a (possibly infinite) set of possible structures Ambiguity usually takes the form of “R” groups Originally used for representing patents Now used for representing combinatorial libraries too Also known as Markush Structures

17
Specifying a substructure query with SMARTS SMARTS: a superset of SMILES extended to allow partial structures (substructures) and optional parts of molecules to be represented Simple example *C(=O)O where the * represents an attachment point (it matches any single atom) More information: –http://www.daylight.com/meetings/summerschool01/course/basics/smarts.html –http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

19
SMARTS examples
[!C;R]    Any atom in a ring that is not aliphatic carbon
[O;H1]    Hydroxyl group (-OH)
c:c       Two carbons joined by an aromatic bond
C~N       Carbon and nitrogen attached by any bond
*C(=O)O   Carboxyl group

20
Try out a SMARTS search DepictMatch: –http://www.daylight.com/cgi-bin/contrib/depictmatch.cgi Enter a set of SMILES and a SMARTS, and any part of each SMILES structure that matches the SMARTS is highlighted As an example, we’ll use the sample dataset described on the following two slides, with *C(=O)O (carboxyl group) and [R]C(=O)O (carboxyl attached to a ring atom) as our SMARTS

26
Measuring similarity between molecules Similar Property Principle: “Molecules with similar structure are likely to have similar biological activity” Generally the Tanimoto Coefficient or Euclidean Distance between fingerprints is used
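Both measures are simple to compute once a fingerprint is treated as the set of bit positions that are switched on. The sketch below assumes that representation; `tanimoto` and `euclidean` are hypothetical helper names, the example bit positions are made up for illustration, and returning 1.0 for two empty fingerprints is a convention chosen here, not something the slide specifies.

```python
import math

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by all on-bits in either fingerprint."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 1.0  # assumption: two empty fingerprints are treated as identical
    return len(a & b) / union

def euclidean(fp_a, fp_b):
    """Euclidean distance between binary fingerprints: sqrt of the number of differing bits."""
    return math.sqrt(len(set(fp_a) ^ set(fp_b)))

# Made-up on-bit positions for two molecules
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8, 9, 12}
print(tanimoto(fp1, fp2))   # 0.5  (3 shared bits / 6 bits set in either)
print(euclidean(fp1, fp2))  # sqrt(3), about 1.732
```

Tanimoto is a similarity (1.0 means identical bit sets), whereas Euclidean distance is a dissimilarity (0.0 means identical), so the two rank molecules in opposite directions.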