Re: [Cdk-devel] How many atoms of quinine belong to two rings?

Yep,
Although traditionally not the case, SSSR now usually means Minimum Cycle Basis, and one would say there are three Minimum Cycle Bases for that structure. This is where essential/relevant comes in. If we name the bases MCB1, MCB2 and MCB3, then we can define:
essential cycles = MCB1 ∩ MCB2 ∩ MCB3
relevant cycles = MCB1 ∪ MCB2 ∪ MCB3
The current SMARTS matcher uses the essential cycles (the intersection of all the bases), and as no simple cycle appears in all three bases, these atoms belong to no cycles (rings). The trouble with relevant cycles is that they can be exponential in number (e.g. cyclophane). However, in practice they can be handled easily, and in the case of SMARTS all you need for each atom is how many rings it belongs to and the size of each ring. These can easily be set in polynomial time for the relevant cycles.
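To make the definitions above concrete, here is a minimal brute-force sketch for the bicyclo[2.2.2]octane skeleton. It is not how the CDK computes these sets; the atom numbering, the edge list and all function names are assumptions made up for illustration, and the exhaustive enumeration only suits tiny graphs.

```python
from itertools import combinations

# Assumed numbering: bridgeheads 0 and 7, joined by three two-atom bridges.
EDGES = [(0, 1), (1, 2), (2, 7), (0, 3), (3, 4), (4, 7), (0, 5), (5, 6), (6, 7)]
N_ATOMS = 8


def simple_cycles(edges, n_atoms):
    """Enumerate every simple cycle as a frozenset of edge indices (brute force)."""
    adjacency = {v: [] for v in range(n_atoms)}
    for index, (a, b) in enumerate(edges):
        adjacency[a].append((b, index))
        adjacency[b].append((a, index))
    found = set()

    def walk(start, current, visited, used):
        for neighbour, edge in adjacency[current]:
            if edge in used:
                continue
            if neighbour == start and len(used) >= 2:
                found.add(frozenset(used | {edge}))
            elif neighbour not in visited and neighbour > start:
                walk(start, neighbour, visited | {neighbour}, used | {edge})

    for start in range(n_atoms):
        walk(start, start, {start}, frozenset())
    return list(found)


def independent(cycles):
    """Check linear independence over GF(2) using edge bitmasks."""
    basis = []
    for cycle in cycles:
        mask = sum(1 << edge for edge in cycle)
        for pivot in basis:
            mask = min(mask, mask ^ pivot)
        if mask == 0:
            return False
        basis.append(mask)
    return True


def minimum_cycle_bases(edges, n_atoms):
    """Return every minimum cycle basis of a connected graph as a set of cycles."""
    cycles = simple_cycles(edges, n_atoms)
    rank = len(edges) - n_atoms + 1  # cycle rank of a connected graph
    bases = [set(c) for c in combinations(cycles, rank) if independent(c)]
    smallest = min(sum(len(c) for c in basis) for basis in bases)
    return [b for b in bases if sum(len(c) for c in b) == smallest]


def ring_counts(cycles, edges, n_atoms):
    """For each atom, count how many of the given cycles contain it."""
    counts = [0] * n_atoms
    for cycle in cycles:
        for atom in {atom for edge in cycle for atom in edges[edge]}:
            counts[atom] += 1
    return counts


bases = minimum_cycle_bases(EDGES, N_ATOMS)
essential = set.intersection(*bases)  # cycles present in every basis
relevant = set.union(*bases)          # cycles present in at least one basis

print(len(bases), len(essential), len(relevant))
print(ring_counts(essential, EDGES, N_ATOMS))
print(ring_counts(relevant, EDGES, N_ATOMS))
```

Under these assumptions the skeleton has three minimum cycle bases, no essential cycles and three relevant cycles, so with the relevant cycles the bridgeheads sit in three rings and every other atom in two.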
J
On 3 Jun 2013, at 10:35, Egon Willighagen <egon.willighagen@...> wrote:
> On Mon, Jun 3, 2013 at 11:22 AM, John May <john.wilkinsonmay@...> wrote:
>> I should add that the naming bicyclo suggests two rings, and indeed you can cut two bonds and have a non-cyclic molecule. However, there are three possible choices of two rings, which introduces uncertainty when matching.
>
> Yeah, and this is the reason why SSSRs are undefined... for this cage
> structure, there are 3(?) possible SSSRs, not?
>
> Egon
>
>
> --
> Dr E.L. Willighagen
> Postdoctoral Researcher
> Department of Bioinformatics - BiGCaT
> Maastricht University (http://www.bigcat.unimaas.nl/)
> Homepage: http://egonw.github.com/
> LinkedIn: http://se.linkedin.com/in/egonw
> Blog: http://chem-bla-ics.blogspot.com/
> PubList: http://www.citeulike.org/user/egonw/tag/papers
>
> ------------------------------------------------------------------------------
> Get 100% visibility into Java/.NET code with AppDynamics Lite
> It's a free troubleshooting tool designed for production
> Get down to code-level detail for bottlenecks, with <2% overhead.
> Download for free and get started troubleshooting in minutes.
> http://p.sf.net/sfu/appdyn_d2d_ap2
> _______________________________________________
> Cdk-devel mailing list
> Cdk-devel@...
> https://lists.sourceforge.net/lists/listinfo/cdk-devel

Thread view

Hi all,
Just patched a related bug and spotted this curiosity, and could do with some opinions. The test specifies that when using the SMARTS pattern [R2] one should find 6 matches in the molecule quinine. This is one of the examples from the tutorial, and Daylight's DepictMatch can highlight these for you. However, the answer of course depends on how you define a ring.
Now, before you read on, how many would you count?
In Daylight's case it looks as though they are using the non-unique SSSR. As the set is non-unique, certain patterns match only some of the time, and the result will usually depend on atom order (see example below). Currently the CDK uses the essential rings (a unique subset of the SSSR), so this random behaviour is not seen, but in this case it would say there are only 2 atoms (on the naphthalene) belonging to two rings. In my mind the correct answer is 8, which we could easily reach using a different unique ring set.
However, if the Daylight implementation uses the SSSR, perhaps it is what the CDK should use (as it did previously). It would not be intuitive in some cases but it would match the official SMARTS usage. Thoughts?
For the random matches, try these on Depict Match:
C1NC(CC2)CCC12 order1
C1CC2CCC1NC2 order2
pattern:
N([R2])[R2]
quinine
C123C5C(O)C=CC2C(N(C)CC1)Cc(ccc4O)c3c4O5
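A small sketch of why the bicyclic example above can match or not depending on which two rings end up in the SSSR. It assumes [R2] means "in exactly two rings of the chosen ring set" and hard-codes the three six-membered rings of the order1 SMILES; the atom indices and helper names are mine, not taken from any toolkit.

```python
from itertools import combinations

# Atom indices follow C1NC(CC2)CCC12 (order1): atom 1 is N, atoms 2 and 7
# are the bridgeheads, and the three bridges are listed below.
BRIDGEHEADS = {2, 7}
BRIDGES = [{0, 1}, {3, 4}, {5, 6}]  # the first bridge contains the nitrogen
N_NEIGHBOURS = (0, 2)

# Each pair of bridges plus the bridgeheads closes one six-membered ring.
RINGS = [BRIDGEHEADS | a | b for a, b in combinations(BRIDGES, 2)]


def ring_count(atom, ring_set):
    """Number of rings in ring_set that contain atom."""
    return sum(atom in ring for ring in ring_set)


def matches_pattern(ring_set):
    """True if both neighbours of N are in exactly two rings, as N([R2])[R2] requires."""
    return all(ring_count(atom, ring_set) == 2 for atom in N_NEIGHBOURS)


# Any two of the three rings form a valid SSSR for this skeleton.
sssr_choices = list(combinations(RINGS, 2))
matches = [matches_pattern(choice) for choice in sssr_choices]
print(matches)

# Counting all three rings puts the bridgehead in three rings, so no match.
print(matches_pattern(RINGS))
```

Only the SSSR choice in which both rings pass through the nitrogen bridge gives a match, which is consistent with the result changing when the atom order (and hence the rings picked) changes.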

I should add that the naming bicyclo suggests two rings, and indeed you can cut two bonds and have a non-cyclic molecule. However, there are three possible choices of two rings, which introduces uncertainty when matching.
J
On 3 Jun 2013, at 10:12, John May <john.wilkinsonmay@...> wrote:
> The bridgehead atoms belong to three rings.
>
> J
>
> On 3 Jun 2013, at 06:05, Egon Willighagen <egon.willighagen@...> wrote:
>
>> On Sun, Jun 2, 2013 at 7:18 PM, John May <johnmay@...> wrote:
>>> ... it would say there are only 2 atoms (on the naphthalene) belonging to two rings. In my mind the correct
>>> answer is 8 which we could easily reach using a different unique ring set.
>>
>> Why not 10? 2 from the naphthalene and 8 from the other ring system?
>> all atoms in that cage-like structure participate in two rings, not?
>>
>> Egon
>
