1. Introduction

The CLOGP Reference Manual will help the user understand why the CLOGP
program calculates logP(ow) the way it does. Although the procedure cannot
be derived from first principles, we have tried to make the rules consistent
with solvation theory, if for no other reason than that they are more easily
remembered. The method simply adds together values for structural parts
of a solute molecule and correction factors dependent upon the particular
way the parts are put together.

The CLOGP EXAMPLES section contains example CLOGP calculations for
a variety of chemicals and is designed as a companion to the CLOGP Reference
Manual. An asterisk (*) appears in the CLOGP Reference Manual when one
or more examples are provided in CLOGP Example Calculations to
illustrate aspects of the CLOGP computation. Examples also demonstrate
DEPICT (chemical depiction).

Funding for the development of CLOGP was provided by the U.S. Environmental
Protection Agency through Cooperative Agreement No. 809295, and we wish
to acknowledge the encouragement and support of the project officer, Dr.
Gilman Veith of ERL-Duluth.

1.1 Measurement and Past Uses of Partition Coefficients

The partition coefficient is the equilibrium concentration of solute in
a non-polar solvent divided by the concentration of the same species in
a polar solvent. In this and most other applications, the polar solvent
is water. The logarithm of the partition coefficient, log P, has been successfully
used as a hydrophobic parameter in 'extrathermodynamic' Hammett methodology.
1-octanol has much to recommend it as the choice for the non-polar phase
(1) and logP(ow) has been used successfully in Quantitative Structure Activity
Relationships (QSAR) in the following special fields: drug and pesticide
design (2,3); pharmacokinetics (4); anaesthesiology (5); environmental
transport and soil binding (6,7); toxicology (8); bioaccumulation (9);
protein folding (10); enzyme binding (11,12); enzymic reactions in non-aqueous
solvents (13); and host-guest complexation (14a,b).

In principle, the measurement of the equilibrium concentration of solute
in the octanol and water phases, after shaking in a separatory funnel,
is very simple, and since good measured values are always to be preferred
over calculated ones, it would seem that there should be little need for
a procedure to calculate them. As it turns out, reliable shake-flask measurements
are time-consuming and often difficult to make. The criteria for high reliability
are: measurements over a 10-fold concentration range (with upper concentration
no more than 75% of solubility or CMC, or no more than 5% of the aqueous
phase, whichever is lower) and standard deviation of 0.03 or less in log
terms. This often requires working at sub-micromolar concentrations, and
so, with either UV spectrophotometry or gas chromatography, it means that
the standard curves must be established with utmost care. Radiotracer methods
seem well-suited for analyses at these low concentrations, but impurities
as well as adsorption at phase boundaries (including container walls) can
introduce significant errors.

HPLC procedures provide a way around this bottleneck (15,16) and can
save time if there is a limited variety of structural types, and the log
P values fall in the range of 0.5 to 4.0. Most HPLC procedures which are
used to develop log P(ow) values do not use octanol and thus have to be
referred to that system by standard curves which can be different depending
on whether the solutes do or do not contain certain basic fragments, such
as pyridine nitrogen. If the solutes do not absorb well in the UV, difficulties
in detecting the elution time eliminate any advantage HPLC may have over
the shake-flask method.

Procedures which employ filter probes (17) or solubility columns (18)
speed up partition coefficient measurements by eliminating centrifugation
as the means of phase separation. However, each has its own set of disadvantages
which limit its acceptance as a method for establishing the standard values
for a calculation procedure.

More efficient methods of measurement of octanol/water partition coefficients
are certain to be developed in the future, but no conceivable 'breakthrough'
is likely to eliminate the need for logP calculation. To put the problem
in proper perspective, one need only imagine some dedicated synthetic chemist
making all possible tri-substituted benzoic acids with the methods commonly
available today. When finished, there would be five million analogs for
which partition coefficients could be determined. And of course only by
calculation is one going to have an estimate of hydrophobicity before synthesis.

The Pomona MedChem Project saw these and other arguments as reason enough
to develop a method to calculate log P(ow) from structure by an additive-constitutive
procedure. As it turns out, the 'constitutive' portion of the procedure
was, by the very nature of the two competing solvation equilibria, very
complex, and the manual method required considerable effort before it could
be applied with confidence. It is the aim of the second-generation program,
CLOGP, to take most of the routine calculation burden from the user but
still encourage him to study the interplay of hydrophobic and polar solvation
forces which can be so crucial to the design of bioactive chemicals.

1.2 How to Understand CLOGP Calculations

The first published method for calculating log P(ow) from structure (19)
was based on a 'substitution' procedure and was developed with substituent
pi constants for aromatic rings in mind. Of course this method was limited
to deriving a new log P from a 'parent' structure whose log P was already
known. Rekker(20) was the first to publish a procedure which was more general
in that it assigned 'fragmental constants' to a variety of structural pieces,
and the calculated log P was the sum of the values appropriate for the
molecule in question. The original pi system can be expressed as:

while the expression for Rekker's fragment system is:
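In conventional notation, the two expressions can be sketched as follows. This is a hedged reconstruction of the general forms only: f and F are the fragment values and interaction factors named in the next paragraph, while the occurrence counts a and b and the indices n and m are symbols assumed here for illustration.

```latex
% Substituent (pi) system: a new log P is derived from a parent of known log P
\[
\log P_{\mathrm{R\text{-}X}} = \log P_{\mathrm{R\text{-}H}} + \pi_{\mathrm{X}}
\]

% Fragment system: sum of fragment values plus correction (interaction) factors
\[
\log P = \sum_{n} a_n f_n + \sum_{m} b_m F_m
\]
```

The first form needs a measured parent value; the second needs only the structure and a table of constants, which is what makes it suitable as the basis of a program.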

The method developed by Pomona MedChem (21) follows Rekker's general
formulation, but there are some important differences in the approach used
to derive the actual working constants. Rekker used a 'reductionist' approach
- deriving the constants for carbon and hydrogen as well as those for polar
fragments from a statistical treatment of a large body of log P data which
contained numerous interaction factors. Both the fragment values (f) and
interaction factors (F) had to be identified and evaluated concurrently.
Also, Rekker neglected to clearly define just what constitutes a fragment.
Instead he provides a table in which the known constants can be found (see
footnote). Rekker also treats all correction factors as some multiple of
a 'Magic Number' (+0.28), but the selection of multiples was not made clear
in his published work. Although his method gained some acceptance for manual
calculation, we considered it too seriously flawed to serve as the basis
for a computer method.

In order to construct a dependable, verifiable algorithm suitable for
log P calculations used in developing QSAR at Pomona College, we first
elected to clearly define what constitutes a fragment. Next we chose a
'constructionist' approach to evaluate them; that is, we accepted as axiomatic
that the hydrophobic portions of solutes were those most 'hydrocarbon-like',
and defined these carbon and hydrogen fragment values as being truly constant.
We gave very heavy emphasis to the carefully-measured values for three
solutes: molecular hydrogen, methane and ethane, because from these we
could derive fragment constants for carbon and hydrogen, which would be
free of obscuring interactions. For all hydrocarbon structures more complex
than these, whose measured values were NOT the sum of fragment values,
we attempted to define the difference in terms of universally-applicable
correction factors. It appears that this approach has led not only to a
workable algorithm, but has highlighted the importance of certain types
of polar solvation forces which have received insufficient attention in
the past.

The first attempt to reduce the 'Pomona Method' of log P calculation
to a computer algorithm was made in collaboration with Dr. Jack Chou and
Dr. Peter Jurs of Pennsylvania State University (22). It was called CLOGP.
A great deal was learned in the process of developing this first version,
and it certainly established the real need for a 'stand-alone' program
to make these calculations. Nevertheless, this first version was difficult to install
and modify, and many well-known correction factors could not be implemented
due to programming difficulties. In light of this experience, we deemed
it essential to incorporate, in the second generation program, design features
which would encourage its continuing evolution. To achieve that objective,
the program had to be conceived as a 'modeling system' which could operate
from one or more easily-revised 'value files'. As an example of the ease
of updating, the largest fragment encountered to date is:

It took less than two minutes to enter it into the database and begin
to use it in calculations.

A number of significant improvements have been made to CLOGP over the past few years. The most significant improvement is the ability to estimate a polar fragment value which has not appeared in a solute having a measured log P(oct). This type of estimation is designated as 'calculated'. If the fragment in question has appeared in a measured solute, but in a different bonding environment (e.g., aromatic attached when aliphatic attached is needed), the new 4.0 version allows for all extrapolations and designates the value as 'derived'. The methodology for the 'No Missing Fragments' algorithm is explained in more detail in ref. 23.

Earlier versions of CLOGP were able to assign corrections to polar fragments interacting over two Isolating Carbons (see Section 2, following). In the latest version, this distance has been extended to three I.C.s. A significant improvement in steroid calculations recognizes the unique contribution of polar groups at the 11 position as well as some long-range intramolecular hydrogen bonds. These and a few other minor improvements will be noted in the changed values in the CLOGP Example Calculations, 6.3.

2. Fundamental Fragments

In view of the decision to make alkane carbons and hydrogens the most fundamental
fragments in the system, it is necessary to define these very carefully
before defining the polar, more hydrophilic fragments.

2.1. Isolating Carbons

An 'Isolating Carbon' (I.C.) atom is a carbon which is NOT doubly- or triply-bonded
to a hetero atom. An I.C. may be bonded to a hetero atom by a single
or an aromatic bond. This definition can be made clearer from the following
two examples:

In an earlier version of the manual calculation procedure (Ref. 21, p.
34) the Kekule structures for pyrimidine were considered; the earlier rule
is now superseded. In coumarin, both rings are designated as aromatic,
and the only carbon which is not isolating is the one in the carbonyl group,
because it is doubly bonded to a hetero atom outside the ring.
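The Isolating Carbon rule can be sketched as a small predicate. This is a minimal illustration, not CLOGP's own code: the bond-list data layout and the function name are assumptions made here, and bond types are given as plain strings.

```python
# A carbon is described by its bonds: a list of (neighbour_element, bond_type)
# pairs. "aromatic" is kept distinct from "double" because an I.C. may carry
# an aromatic (or single) bond to a hetero atom. Layout is illustrative only.
BLOCKING_BOND_TYPES = {"double", "triple"}


def is_isolating_carbon(bonds):
    """Return True unless the carbon is doubly or triply bonded to a hetero atom."""
    for element, bond_type in bonds:
        is_hetero = element not in ("C", "H")
        if is_hetero and bond_type in BLOCKING_BOND_TYPES:
            return False
    return True


# Carbonyl carbon of coumarin: doubly bonded to an oxygen outside the ring.
coumarin_carbonyl = [("O", "double"), ("O", "aromatic"), ("C", "aromatic")]

# C2 of pyrimidine: aromatic bonds to both ring nitrogens, so still isolating.
pyrimidine_c2 = [("N", "aromatic"), ("N", "aromatic"), ("H", "single")]

print(is_isolating_carbon(coumarin_carbonyl))
print(is_isolating_carbon(pyrimidine_c2))
```

Treating the aromatic bond as its own type is what lets the pyrimidine carbons remain isolating without reference to Kekule structures.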

Although the hydrophobic value of an I.C. is constant, several types
must still be identified; the degree to which they delocalize electrons
in any polar fragments attached to them has a great influence on overall
log P. The types of I.C.s presently identified are listed below with appropriate
symbols:

Symbol    Type
A         Aliphatic
Z         Benzyl
V         Vinyl
Y         Styryl
a         Aromatic

To be completely characterized, a polar fragment must have each of its
'valence bonds' designated with one of the above symbols (see section "Fragment
Valence Types"). The numerical value of the fragment will increase roughly
in the order 'A' to 'a', but must be experimentally determined for high
reliability.

All hydrogens bonded to I.C.s are fragments. These two kinds of fundamental
fragments are the most important members of the non-polar class. A comparison
of their relative values (C = 0.2; H = 0.225) is a reminder that the measure
of effective cavity size may not be as simple as using van der Waals radii
or CPK models.
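As a small worked sketch, the two fundamental values quoted above can be summed for the three reference solutes (molecular hydrogen, methane and ethane), which by construction need no correction factors. The function name is an assumption made here for illustration.

```python
F_CARBON = 0.20     # aliphatic Isolating Carbon, as quoted in the text
F_HYDROGEN = 0.225  # hydrogen fragment, as quoted in the text


def nonpolar_fragment_sum(n_carbon, n_hydrogen):
    """Sum of fundamental fragment values, with no correction factors applied."""
    return n_carbon * F_CARBON + n_hydrogen * F_HYDROGEN


for name, n_c, n_h in [("hydrogen", 0, 2), ("methane", 1, 4), ("ethane", 2, 6)]:
    print(f"{name}: {nonpolar_fragment_sum(n_c, n_h):.3f}")
```

For any larger hydrocarbon this bare sum is only a starting point; the bond and branching corrections of Section 3 must then be applied.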

2.2. Polar Fragments

A fragment is any atom or group of atoms bounded by Isolating Carbon atoms,
and all except hydrogen are considered polar. A fragment may have many
internal bonds but those connecting it to I.C.s are called 'valence bonds'.
Valence bonds are most often single, but can be aromatic, as in the case
of the N fragments in pyrimidine shown above. Each hydrogen in methane
is a fragment, but the hydrogens in formaldehyde are not because the carbon
to which they are bonded is not isolating. This is very important to remember,
for one frequently sees published calculations in which one fragment value
is obtained from another by the replacement of a fragment hydrogen with
another fragment of known value. At the present time a good rule to follow
is: "Never break up a Fragment; estimations can be made from values measured
for different bond environments (see below) but a Fragment cannot be constructed
from parts." Examples of fragments which cannot be "broken down" further
are:

Monovalent:    -Cl; -CN
Divalent:      -OC(=O)NH-
Trivalent:     -OC(=O)N<
Tetravalent:   >NC(=O)N<

As will become evident in the following sections, polar fragments can interact
in various ways. To quantitate this interaction it is necessary to define
several types of polar fragments:

(A) X = any halogen, but for one type of interaction fluorine must be
assigned to a special subclass, 'F'.
(B) Y = all non-X fragments; these are further subdivided according
to:
    (1) sensitivity to halogen interaction, as 'Y-1', 'Y-2', and 'Y-3';
    (2) whether or not they contain '-OH'.

2.3. Intrinsic Values

Here the term, 'intrinsic' fragment values, means those which would, if
summed, yield the correct log P without any correction factors. It is worthwhile
to examine some of the accepted hypotheses as to what solvation forces
or other phenomena determine these intrinsic values.

It takes more energy to form a cavity in water than in octanol. One
would predict, therefore, that increasing the size of a solute would increase
its log P. Other factors being equal, this appears to be the case. However,
other features of the solute can partly or completely override the effect
of its size.

Water is much more capable than is octanol of accommodating localized
dipoles, and it contains, on a molar basis, more hydrogen bond accepting
and donating groups. So it is these three factors - size, localized dipole
strength, and H-bonding ability - which largely determine the sign and
magnitude of any fragment value.

2.3.1 Halogens

Halogens form an intense localized dipole when bonded to an aliphatic carbon
atom.* This intensity is somewhat lessened if the I.C. is benzyl and greatly
lessened if it is vinyl, styryl or aromatic.* Fluorine has a negative fragment
value when attached to an aliphatic carbon, because the dipole effect outweighs
the effect of size. Size is of greater importance with chlorine and bromine,
but even bromine is less hydrophobic than a hydrogen in an aliphatic setting.
As will become evident in the following section, much of this hydrophilic
polar effect can be lost through 'shielding' by other halogens, or by electronic
interaction with 'Y' type polar groups.

2.3.2 H-Polar Fragments

H-Polar Fragments ('Y') almost universally form some sort of hydrogen bonds
with the donor (H) or acceptor (O) of the aqueous phase. This is thought
to interrupt the peculiar 'ice-like' water shell which forms around the
non-polar, hydrocarbon-like portions of each solute molecule, and thereby
effectively reduces cavity size. As noted above, this should reduce log
P.

2.3.3 Ions

Octanol can accept some larger solutes containing a full formal charge
in sufficient concentration for measurement. However, one must be careful
that the species measured is the same, because water easily supports complete
ionization while ion-pairing is the usual condition in octanol except at
the very lowest concentrations. Consistent values can be obtained if 'standard
conditions' are adhered to: 0.1 M small counter-ion (Na+ or Cl-) and extrapolation
to infinite dilution. Measured in this way a carboxylate ion is about 4.1
log units lower than the undissociated acid. No single value can be given
to the positive charge on a protonated amine or quaternary ammonium, because
the charge is delocalized along the hydrocarbon chain and thus the effect
is dependent on chain-length. It should be emphasized at this point that,
except for zwitterions, CLOGP calculates values for the neutral solute
only.*

2.3.4 Unsaturations

Double bonds in isolation have a slightly negative effect on log P.* This
effect may arise from the polarity of the pi electrons or else it may be
due to the shorter bond length reducing cavity size. At any rate, it disappears
if the double bonds are conjugated.* Triple bonds are decidedly hydrophilic
and require a large negative correction factor.

3. Correction Factors

3.1 Structural Factors

3.1.1 Bonds
To properly perform its calculations, CLOGP needs to know the number and
types of certain bonds in the solute structure. There is some reason to
believe that, for the bonds in question, factors other than bond length
affect the size of the solvent cavity needed to contain the hydrophobic
portions of the solute molecule.

The effect of all bonds within any fragment is taken care of by the
fragment value, and so it is NOT necessary to keep track of them, NOR of
any bonds to hydrogen. And, as explained below, it is convenient to allow
for the bond effect in aromatic rings by including it in a special aromatic
I.C. type, 'aromatic carbon'; therefore, aromatic bonds also are NOT given
special attention.

Bonds which DO need to be identified are the following:

3.1.1.1 Chain bonds

Chain bonds are non-ring bonds between I.C.s plus any valence bonds to
fragments*.

3.1.1.3 Branch Bonds

A separate count of each of these bond types must be made, and a negative
correction applied. For chain bonds only, this correction applies to bonds
AFTER the first in each chain. For example, there is no net bond correction
for ethane but there is one for propane. This suggests that the correction
accounts for flexing of the chain which is not possible in methane or ethane.
1,2-diethylbenzene gets only a net of two bond corrections, because each
chain is counted separately. Also compatible with a 'flexing' hypothesis
is the fact that the correction is greater for chains than for aliphatic
rings.
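The chain-bond counting rule can be sketched as follows. This is an illustration under stated assumptions: the function name and the list-of-chains input are hypothetical, and only the net count is returned because the size of the per-bond correction is not stated in this passage.

```python
def net_chain_bond_corrections(chain_bond_counts):
    """Number of chain-bond corrections for a solute.

    chain_bond_counts holds one integer per separate chain: the number of
    chain bonds in that chain. Only bonds AFTER the first in each chain
    are corrected, and each chain is counted separately.
    """
    return sum(max(n_bonds - 1, 0) for n_bonds in chain_bond_counts)


print(net_chain_bond_corrections([1]))     # ethane: one C-C bond
print(net_chain_bond_corrections([2]))     # propane: two C-C bonds
print(net_chain_bond_corrections([2, 2]))  # 1,2-diethylbenzene: two ethyl chains
```

Counting each chain on its own is what gives 1,2-diethylbenzene two corrections rather than three, since its four chain bonds fall in two separate chains.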

As noted above, an isolated double bond is assigned a negative correction
factor (-0.09). This factor actually becomes slightly positive if the double
bonds are conjugated in a ring such as benzene. Since it is much more convenient
to assign all bonds in large fused ring systems as aromatic type, rather
than using the Kekule system of alternating doubles and singles, it is
worthwhile to assign a special fragment value to an aromatic carbon and
include all the necessary bonding effects therein. The value of aliphatic
carbon is +0.20; the value for aromatic carbon which includes all bond effects
associated with the aromatic ring system is +0.13.*
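As a worked illustration of why the special aromatic carbon value is convenient, the fragment sum for benzene needs no separate bond terms. The arithmetic below assumes only the aromatic carbon value given here and the hydrogen value (0.225) from Section 2.1, with no other corrections.

```latex
\[
\sum f_{\text{benzene}} = 6\,(0.13) + 6\,(0.225) = 0.78 + 1.35 = 2.13
\]
```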

3.1.2 Branching at Isolating Carbons

3.1.2.1 Chain Branch

It is well-known that iso-alkanes are more water-soluble than their n-isomers.*
This branching evidently does not produce a corresponding solubility increase
in the octanol phase in the partitioning process, because the correction
required in CLOGP is negative in sign.

In CLOGP, the concept of branching was expanded, and now replaces the
earlier use of the 'ring cluster' correction (21). Fusion carbons in non-aromatic
rings are considered as branched and given the same correction factor as
chains; i.e., -0.13.* They are designated cluster branches.

3.1.2.2 Group Branch

If an H-Polar group branches from an I.C. the increase in water solubility,
compared to the n-isomer, is even greater than with chain branching. Again
this carries over to partitioning equilibrium; H-Polar group branching
requires a larger correction than does chain branching. For this reason
isopropyl alcohol is given one group branch correction and no chain branch
is considered.* Tertiary butyl alcohol gets one of each type.*

If a fragment has more than two external (valence) bonds, it could be
considered a branching point. However, in all cases except the 'Branched
Fragments' noted above (t-amines and phosphate esters), the entire negative
branching effect is included in the fragment value itself. Only in the
case of the 'Branched Fragments' is the effect chain-length dependent.

3.2 Interaction Factors

3.2.1 Aliphatic Proximity (Measured Topologically)

3.2.1.1 Halogen vs. Halogen (X vs. X)

The positive correction to log P for this interaction is thought to result
from dipole shielding and is limited to halogens on the same (geminal)
or adjacent (vicinal) I.C.s. The geminal interaction is designated 'X-C-X',
and the corrections can be thought to arise as follows: adding a second
halogen to an I.C. which already has one creates the first X-C-X pair,
and the correction required is +0.60.* Adding the third halogen to the
same I.C. creates two more such pairings, each of which requires a correction
of +0.50.* If the fourth halogen is added, the dipole is almost completely
shielded, and the three additional pairings require corrections of +0.40
each.* For carbon tetrachloride the total geminal halogen correction would
be:

    0.60 + 2(0.50) + 3(0.40) = 2.80

For the vicinal halogen correction, X-C-C-X, the bond between carbons
must not be double.* The correction is evaluated by subtracting one from
the number of halogens meeting the structural requirement and multiplying
by the factor 0.28. (Again Rekker's Magic Constant pops up!)
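The geminal and vicinal halogen rules can be sketched in a few lines. This is a minimal illustration using only the constants quoted in this section; the function names and the dictionary layout are assumptions made here, not CLOGP internals.

```python
# Correction per new X-C-X pairing, keyed by how many halogens are on the
# Isolating Carbon once the latest one has been added (values from the text).
GEMINAL_PER_PAIRING = {2: 0.60, 3: 0.50, 4: 0.40}
VICINAL_FACTOR = 0.28


def geminal_halogen_correction(n_halogens):
    """Total X-C-X correction for n halogens on a single Isolating Carbon."""
    if not 0 <= n_halogens <= 4:
        raise ValueError("a carbon can carry between 0 and 4 halogens")
    total = 0.0
    for k in range(2, n_halogens + 1):
        # The k-th halogen pairs with each of the (k - 1) already present.
        total += (k - 1) * GEMINAL_PER_PAIRING[k]
    return total


def vicinal_halogen_correction(n_qualifying_halogens):
    """X-C-C-X correction: (n - 1) * 0.28 for n qualifying halogens."""
    return max(n_qualifying_halogens - 1, 0) * VICINAL_FACTOR


print(round(geminal_halogen_correction(4), 2))  # carbon tetrachloride
print(round(vicinal_halogen_correction(2), 2))  # e.g. a 1,2-dihaloethane
```

The geminal total accumulates as each halogen is added, so the function also covers the two- and three-halogen cases.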

3.2.1.2 H-Polar vs. H-Polar (Y vs. Y)

As noted in the section on 'Intrinsic Values', the negative sign on the
'Y' fragments is thought to result from their 'structure-breaking' (and
thus cavity-reducing) ability in the water phase. 'Y' fragments appear
to eliminate the cavity requirement for two or more I.C.s to which they
are attached. Obviously, if two 'Y' fragments are located on the same or
adjacent I.C.s, some of this cavity reduction
is going to be counted twice. Thus a positive correction factor is called
for when the topological separation is less than three I.C.s. The CLOGP
algorithm is an improvement over the original Rekker procedure(20) in that,
in place of the same correction for every Y-C-Y or Y-C-C-Y, it makes the
correction proportional to how much hydrophilic character (negative fragment
value) is involved. This proportionality appears to apply even if one
of the fragments is charged and has a highly negative value, but as previously
noted, the CLOGP algorithm currently does not treat ions.

If one of the 'Y' fragments in a Y-C-Y interaction contains an -OH moiety
(e.g. -NHOH, -COOH, or -OH itself), a greater proportion of the hydrophilic
character of the pair is lost. The coefficient by which the fragment sum
is multiplied increases from 0.32 to 0.42.*

If both the 'Y' fragments and the carbons of Y-C-C-Y are in a ring,
the hydrophilicity loss is not as great as if they are all in a chain (coefficient
0.20 in a ring vs. 0.26 in a chain).* If one 'Y' is a substituent on the ring while the other
is in the ring, the correction coefficients are averaged (0.23).* If one
of the carbons has two 'Y' fragments, the geminal correction is applied
first; then, for the (Y-C-C,Y'Y'') correction, both pairings are calculated
and averaged.* If both I.C.s have geminal 'Y' fragments, then the vicinal
correction is not applicable.* (See penicillin in EXAMPLES).
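The proportional Y vs. Y correction can be sketched as below. Several points are assumptions made for this illustration: the function name and arguments are hypothetical; the correction is taken as minus the coefficient times the sum of the two (negative) fragment values, so that it comes out positive; and for Y-C-C-Y the smaller coefficient (0.20) is read as the in-ring case, since the text says the loss is smaller in a ring. The averaging rules for carbons bearing two 'Y' fragments are not covered.

```python
GEMINAL_COEFF = 0.32        # Y-C-Y
GEMINAL_COEFF_WITH_OH = 0.42  # Y-C-Y where one fragment contains -OH
VICINAL_COEFFS = {          # Y-C-C-Y
    "chain": 0.26,          # fragments and carbons all in a chain
    "ring": 0.20,           # fragments and carbons all in a ring
    "mixed": 0.23,          # one Y in the ring, the other a ring substituent
}


def y_y_proximity_correction(f1, f2, separation, *, has_oh=False, placement="chain"):
    """Positive correction for two H-polar fragments 1 or 2 I.C.s apart.

    f1, f2     fragment values (normally negative)
    separation 1 for Y-C-Y, 2 for Y-C-C-Y
    """
    if separation == 1:
        coeff = GEMINAL_COEFF_WITH_OH if has_oh else GEMINAL_COEFF
    elif separation == 2:
        coeff = VICINAL_COEFFS[placement]
    else:
        raise ValueError("only separations of 1 or 2 I.C.s are sketched here")
    return -coeff * (f1 + f2)


# Fragment values of -1.0 are placeholders, not database values.
print(round(y_y_proximity_correction(-1.0, -1.0, 1), 2))
print(round(y_y_proximity_correction(-1.0, -1.0, 1, has_oh=True), 2))
print(round(y_y_proximity_correction(-1.0, -1.0, 2, placement="ring"), 2))
```

Because the correction scales with the fragment sum, strongly hydrophilic pairs recover more log P than weakly hydrophilic ones.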

3.2.1.3 Halogen vs. H-Polar (X vs. Y)

The interaction being considered at this point is limited to that which
takes place across single bonds. It is, therefore, probably due to an inductive
or field effect. (The electronic interaction between fragments on or in
aromatic rings is discussed in the following section.) In evaluating the
X-C-Y correction factor, all halogens can be treated alike. However, there
are at least three levels of sensitivity shown by 'Y' type fragments. In
CLOGP the most sensitive class, 'Y-3', is restricted to the structural
type: -SO2-R.* 'Y-2' consists of the types: -CONH-R, -O-R, -S-R, and -NH-R;
and 'Y-1' of all other H-polar fragments.* The correction factor for the
first alpha-halogen (i.e. X-C-Y) is the same for all three Y-types (+0.9).
For 'Y-3' fragments, this correction factor is doubled when there are two
alpha halogens (X{2}-C-Y3) and tripled when there are three (X{3}-C-Y3).
For 'Y-2' fragments, the second and third alpha-halogens need much less
correction, and for 'Y-1', virtually none. In the case of multiple halogenation,
the X-C-X and the X-C-Y corrections are additive.

The CLOGP algorithm makes no separation of 'Y' types to make the X-C-C-Y
correction, but needs to distinguish fluorine from the other 'X' halogens.*

3.2.2 Electronic (through Pi-bonds)

3.2.2.1 Fragment Valence Type

As previously noted, all fragments (X or Y type) are assigned the most
negative values when bonded to aliphatic I.C.s (designated as 'A'). This
can be considered as the 'base' or 'intrinsic' level. If the fragment value
when attached to a vinyl I.C. (V) has not been measured it can be estimated
as the average of the base and aromatic-bonded (a) values.* Likewise the
value for the styryl-attached fragment can be estimated as two-thirds the
way from the base to the aromatic value.* CLOGP will only make these estimations
when measured values have not been entered in the database.
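The estimation rules for vinyl- and styryl-attached fragments can be sketched as a simple interpolation. The function name, the dictionary layout and the numbers in the example are assumptions made for illustration; only the one-half and two-thirds rules, and the preference for measured values, come from the text.

```python
def fragment_value(attachment, known_values):
    """Fragment value for an attachment type, estimated only when not measured.

    attachment    'A' (aliphatic), 'V' (vinyl), 'Y' (styryl) or 'a' (aromatic)
    known_values  dict of measured values keyed by attachment symbol
    """
    if attachment in known_values:
        return known_values[attachment]  # a measured value always wins
    base, aromatic = known_values["A"], known_values["a"]
    if attachment == "V":
        # Vinyl: average of the base (aliphatic) and aromatic values.
        return (base + aromatic) / 2
    if attachment == "Y":
        # Styryl: two-thirds of the way from base to aromatic.
        return base + (2 / 3) * (aromatic - base)
    raise KeyError(f"no estimation rule sketched for attachment {attachment!r}")


# Placeholder values for a hypothetical fragment, not taken from the database.
measured = {"A": -1.0, "a": -0.4}
print(round(fragment_value("V", measured), 2))
print(round(fragment_value("Y", measured), 2))
```

No rule is given in this passage for the benzyl (Z) type, so the sketch raises an error rather than guess.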

3.2.2.2 Extension of Aromaticity

The extension of the aromatic ring system through fusion (as in naphthalene)
or direct substitution (as in biphenyl) appears to increase log P, especially
if a heteroaromatic atom is next to the juncture.* If the ring-joining
carbons are attached only to other aromatic carbons, electron delocalization
is minimal and so is the correction: +0.10 for each I.C. If the I.C.s are
also attached to a polar (fused-in) fragment, such as in quinoline or 2-phenylpyrimidine,
the correction is greater, +0.31.*

3.2.2.3 Sigma/Rho Fragment Interaction

When two or more X and/or Y type fragments are attached to an aromatic
ring system, the correction factors can be calculated by a method very
similar to that used by Hammett (23) to calculate the electronic effects
in other equilibria, such as acid ionization.* This requires the assignment
of a measure of electronic 'strength' (sigma) and 'susceptibility' (rho).
In dealing with electronic effects on partitioning equilibria, a few fragments
appear to act 'bidirectionally' and require both sigma and rho values,
although they cannot, of course, act upon themselves. Most of the details
of sigma and rho assignment to X and Y type fragments can be found elsewhere
(24), but it should be pointed out that the latest version of the program
(CLOGP) follows a newer procedure for 'fused-in' fragments. Fragments
fused in aromatic rings (e.g. -N= or >C=O) may also be assigned both rho
and sigma constants and treated together with 'on-ring' fragments instead
of requiring a separate treatment.*

Fragments on different rings in an aromatic ring system interact with
one another but the effect is attenuated.* If the two rings in the system
are fused, as in 5-acetyl-1-naphthylamine, the 'intrinsic' effect is only
half the sigma rho product just as it is in the biphenyl system such as
in 4-(m-chlorophenyl)aniline.

One frequently encounters aromatic ring systems containing several fragments
with rho and sigma values assigned, and the potential correction from all
cross products of sigma/rho could be very large. Since these multiple effects
are NOT additive, some scaling down procedure was indicated. The one chosen
for CLOGP takes the following steps:

a) The full potential sigma/rho product for each possible interaction
is calculated and placed in descending value order, AFTER considering if
the fragment pair are on the same or separate rings.

b) Except for the sigma for a pyridine type nitrogen, each use of sigma
or rho causes it to 'age'. The first interaction at the top of the list
is entered at full potential because the current age of its sigma and rho
components is 'zero' for each. Each use reduces the effective sigma or
rho value to 1/2 its previous value, and so if each were at 'age 1', the
increment to the correction would only be 1/4 as much as a 'fresh' interaction.
The mechanics of this computation are best understood by looking at the
detailed output of some complex structures. Two such examples are provided
in the example section.*
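Steps (a) and (b) can be sketched as follows. This is one reading of the procedure rather than the program's actual code: the fragment records, field names and sigma/rho numbers are hypothetical, and the halving for fragments on different rings is applied when the potential products are formed, before sorting.

```python
from itertools import permutations


def sigma_rho_correction(fragments):
    """Scaled-down sum of sigma/rho products for one aromatic ring system.

    Each fragment is a dict with keys 'sigma' and 'rho' (float or None),
    'ring' (any ring identifier) and optionally 'pyridine_n' (bool).
    """
    # Step (a): full potential product for every ordered pair, halved when
    # the pair sits on different rings, then sorted in descending order.
    candidates = []
    for i, j in permutations(range(len(fragments)), 2):
        source, target = fragments[i], fragments[j]
        if source.get("sigma") is None or target.get("rho") is None:
            continue
        potential = source["sigma"] * target["rho"]
        if source["ring"] != target["ring"]:
            potential *= 0.5
        candidates.append((potential, i, j))
    candidates.sort(key=lambda item: item[0], reverse=True)

    # Step (b): each use of a sigma or rho 'ages' it, halving its effect,
    # except that the sigma of a pyridine-type nitrogen does not age.
    sigma_age = [0] * len(fragments)
    rho_age = [0] * len(fragments)
    total = 0.0
    for potential, i, j in candidates:
        total += potential * 0.5 ** sigma_age[i] * 0.5 ** rho_age[j]
        if not fragments[i].get("pyridine_n", False):
            sigma_age[i] += 1
        rho_age[j] += 1
    return total


# Hypothetical values: one sigma fragment acting on two rho fragments.
example = [
    {"sigma": 0.6, "rho": None, "ring": 1},
    {"sigma": None, "rho": 1.0, "ring": 1},
    {"sigma": None, "rho": 0.5, "ring": 1},
]
print(round(sigma_rho_correction(example), 3))
```

In the example the second interaction reuses the same sigma at 'age 1', so it contributes half of its potential; marking the first fragment as a pyridine-type nitrogen would leave it at full strength.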

3.2.3 Special Ortho

As noted in the previous section, aromatic substituent (fragment) pairs,
if they have sigma and rho values assigned to them, are given the same
correction factor regardless of their relative position on the ring. It
is important to keep in mind that if the fragment pair are on adjacent
positions (i.e. ortho), an additional correction may be required.

3.2.3.1 Crowding

'Crowding' of certain fragment types can effectively lower their aromatic-attached
values. This is most apparent in the case of fragments attached to the
aromatic ring through a hetero atom which possesses an electron pair, such
as -NHCOCH3.* A reasonable explanation of this observation is that the
lone pair can no longer remain in the plane of the ring, making the fragment
attachment resemble aliphatic (A) rather than true aromatic (a). The magnitude
of the correction appears to depend on both steric and electronic (field)
effects(25).

If this explanation is valid, one would expect the correction to vary
continuously up to a maximum characteristic of each fragment type. It was
surprising, therefore, to find that the rather large data set used in the
original evaluation of the 'negative ortho' effect (24) seemed to fit multiples
of Rekker's Magic Constant(20). This is handled in CLOGP by assigning
integers to a matrix which has generalized fragment types for coordinates.

More recent data provides many examples which do not support this 'quantized'
correction. Nevertheless, it is being retained for the present because
of simplicity and because its maximum is only 0.14.

3.2.3.2 Intra-Molecular Hydrogen Bonding

Hydrogen bonding is known to occur intramolecularly between two ortho substituents
if one is a donor and the other an acceptor. A classical example of such
an H-bond is that in o-nitrophenol. As might be expected, an intramolecular
H-bond reduces water's ability to accommodate that solute, and the log
P of o-nitrophenol is over two log units higher than the m- and p-isomers
in the heptane and carbon tetrachloride solvent systems. One must always
keep in mind, however, that the octanol phase possesses both H-donor and
H-acceptor capability, not only because it is an alcohol, but because of
the 2M water present at saturation. In actuality, the presence of the intramolecular
H-bond in o-nitrophenol penalizes solvation in octanol slightly more than
it does solvation in water, and its log P is 0.09 log units lower than
the m- and p-isomers.

In terms of intramolecular H-bonding between aromatic ortho substituents,
the octanol/water system appears to be sensitive to a very restricted class.
The only clear-cut cases seem to result from a carbonyl group directly
attached to the ring acting as acceptor, and a directly-attached -OH or
-NH- acting as donor.* In all of the cases observed so far, the correction
is very close to +0.63, and is stored in the same matrix used for the 'negative
ortho' corrections. Thus the 'crowding' and H-bonding ortho effects never
are applied simultaneously, but a sigma/rho correction can be added
to either.

4. Summary

The fragment method of calculating log P(ow) has proved valuable in
many fields, including drug design and hazard assessment. However, manual
calculations require a great deal of instruction and become very lengthy
for complex structures and thus are error-prone. The computer program,
CLOGP, enables the method to be applied by non-experts and includes an
estimate of error, which is not possible with manual calculation. Regular users can
avail themselves of an annual update which will bring them current with
all newly measured fragment values and improved correction factors. Versions
with the Unified Driver compare the calculation from structure with a measured
value of log P(ow) for neutral solutes. Starlist is also included in
the annual updating service. We plan to make available in the near future
a searching program, GENIE, which will extend the search of Starlist to
close analogs.

The current literature contains many examples of QSAR accompanied by
calculations of hydrophobicity which have not been made according to a
consistent application of the rules they purport to follow. This has caused
some confusion and cast doubt upon the entire approach. Perhaps, if the
use of CLOGP becomes more widespread, published calculations will become
more comparable, especially if reference is made to the program version.

Any prediction of the future is risky, but, judging from the recent
past, we can expect an increasing demand for logP(ow) values. It is inconceivable
that CLOGP will be perfected to such an extent that it supplants partition
coefficient measurement. The two methods should remain as they are now:
mutually complementary.

With the STARLIST module in CLOGP, the user will be able to check the
calculated value against an acceptable measured value if the solute structure
entered is one of over 4,000 contained in that special file which is limited
to non-tautomeric structures measured at a pH where the neutral form predominates.

6. Example CLOGP Calculations

The calculations shown in the following section illustrate how CLOGP treats
the basic fragment types and correction factors discussed in the reference
section. No attempt was made to illustrate every fragment in the database,
of course. Additional output was generated to illustrate the unusually
complex sigma/rho electronic correction factor for atrazine and adenine.

The examples are arranged according to the section number of the CLOGP
Reference Manual text which deals with the main feature illustrated. Since
some structures illustrate more than one feature, some of these "secondary
features" are also indicated by section number in the output.

Please note that there is an upper limit of 255 characters on the
input SMILES string.
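
A caller preparing input for CLOGP may want to check this limit before
submitting a structure. The following is a minimal sketch in Python; the
function name and the choice to raise an error are assumptions, and CLOGP
itself may report an over-long string differently.

```python
MAX_SMILES_LENGTH = 255  # upper limit stated in the text


def check_smiles_length(smiles: str) -> str:
    """Return the SMILES unchanged, or raise if it exceeds the limit.

    Only the length is checked; the string is not parsed or validated.
    """
    if len(smiles) > MAX_SMILES_LENGTH:
        raise ValueError(
            f"SMILES has {len(smiles)} characters; "
            f"the limit is {MAX_SMILES_LENGTH}"
        )
    return smiles
```

For example, check_smiles_length("Oc1ccccc1[N+](=O)[O-]") returns the
string unchanged, while a 256-character string raises ValueError.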

6.1 Interpretation of Output

6.1.1 Maps

"Verbose" CLOGP output includes a section which shows the input SMILES
and assignments of fragments, rings, hydrogen counts, and isolating carbon
type. These "maps" are not shown in the example section, but are discussed
here. The first line in the "Map Box" gives the SMILES as entered by the
user. Fragment ordering, therefore, is NOT unique, and so to completely
understand the tabular results, one should become familiar with the map.
The second line in the map indicates Isolating Carbon type under their
respective SMILES notations. The third line numbers the polar fragments
(i.e., all those NOT I.C. or hydrogen) in the order entered. The fourth
line shows the locations of hydrogen atoms, including those contained in
fragments. The last lines indicate the location of atoms in rings. Even
in simple structures these maps can be of help in interpreting the calculations.
For example, in the four calculations in section 2.3.1, the value of the
-Br fragment varies from 0.20 to 1.09. The reason is apparent from the
maps which show the variation in type of I.C. attachment.

6.1.2 Picture

Each example is accompanied by a picture of the chemical structure as generated
by the DEPICT algorithm. DEPICT indicates aromaticity by drawing circles
inside aromatic rings and suppressing all aromatic carbon symbols. More
information about how aromaticity is defined and how the picture is generated
may be found in the SMILES and DEPICT sections of the Daylight Software
manual.

6.1.3 Tabular Results

Most of the nomenclature in the "calculation details" is understandable
after reading the main body of the CLOGP Reference Manual, but a few terms
could use further explanation:
Under Class one may find "SCREEN". This is further documented under the
Description section of the CLOGP Reference Manual. For instance, it may
note "possibly anomalous steroid".
Under Type one may find, for example, the ring number for an Ortho interaction,
because more than one ring may be eligible for such a correction.
Under Description one finds (ZW-) or (ZW+) after the name of each fragment
which can participate in zwitterion formation. However, only when strong
enough pairs are present is the correction actually entered (near the bottom
of the table). A sulfonic acid is strong enough even if both it and the
amine are aromatic; for a carboxylic acid, both it and the amine must be
aliphatic.
Under the "Comment" column, "CLOGP=#" gives the version number of
the Biobyte algorithm used in the calculation, e.g. CLOGP=3.05.
The calculated CLOGP appears in the "Value" column.
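
The zwitterion rule given under Description can be written as a small
decision function. This is a sketch in Python of one reading of that
rule, not the CLOGP code: the function name, the string labels for the
acid type, and the treatment of other acid types (no correction) are
assumptions. A sulfonic acid is taken to qualify in every case, since
the text says it is strong enough even in the least favorable
(all-aromatic) arrangement.

```python
def zwitterion_correction_applies(
    acid_kind: str,
    acid_is_aromatic: bool,
    amine_is_aromatic: bool,
) -> bool:
    """Decide whether an acid/amine pair is strong enough for the
    zwitterion correction, following the rule stated in the text.

    acid_kind is a hypothetical label: "sulfonic" or "carboxylic".
    """
    if acid_kind == "sulfonic":
        # Strong enough even if both acid and amine are aromatic.
        return True
    if acid_kind == "carboxylic":
        # Both the acid and the amine must be aliphatic.
        return not acid_is_aromatic and not amine_is_aromatic
    # Other acid types are not covered by the text; assume no correction.
    return False
```

Under this reading, an aliphatic carboxylic acid paired with an aromatic
amine carries the (ZW-)/(ZW+) labels in the table but receives no
zwitterion correction.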

6.2 Examples of Anomalies

The final set of calculations illustrates some of the present shortcomings
of CLOGP. For the first five of these, some rational explanation can be
given for the discrepancy, and even an estimation of the amount.
Adenosine, like 84 other measured purine nucleosides, is underpredicted by CLOGP. There is good evidence that this is due to a long-range intramolecular hydrogen bond between the 5'OH and N3 of the purine. A paper documenting this effect has been submitted for publication, and it is planned to include this correction in upcoming versions of CLOGP.
Cortisone acetate is now well predicted with the special steroid correction factors.
Chain overlap is a very difficult situation to identify. The earlier attempt to account for it as a 'Fragment Branch' correction has been abandoned.