Those of you who watch this blog know that many of the discussions are about chemical structures, accurate representations in databases and how to “correctly” communicate chemical structures/compounds to users. So, this is an OPINION question…it’s not an “I have an answer” blog post.

Tegaserod can be envisaged as having a trans-orientation, but the name on DailyMed doesn’t indicate trans: “3-(5-methoxy-1H-indol-3-ylmethylene)-N-pentylcarbazimidamide”

On Wikipedia here we see the structure below and a systematic name supporting a trans-orientation.

Now, there are actually a number of ways to represent Tegaserod. Since there are no stereocenters to complicate the molecule, and we are interested in the skeleton per se, we can search on the first block of the InChIKey on a database like ChemSpider. A search on IKBKZGMPCYNSLU as the first block of the InChIKey for the structure gives 3 hits. Take a look. I don’t see any real reason to show the crossbonds for the NH, but so be it.
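Why a first-block search pulls back several records: the first 14 characters of a standard InChIKey hash the main (connectivity) layer, while double-bond and tetrahedral stereo end up in the second block. The Python sketch below groups records on that first block. Only IKBKZGMPCYNSLU comes from this post; the second blocks of the three Tegaserod entries are made-up placeholders, not the real keys, and benzene is included purely for contrast.

```python
# Illustrative InChIKeys. Only the first block IKBKZGMPCYNSLU is from the post;
# the Tegaserod second blocks are hypothetical placeholders.
KEYS = {
    "tegaserod, E (placeholder)": "IKBKZGMPCYNSLU-AAAAAAAASA-N",
    "tegaserod, Z (placeholder)": "IKBKZGMPCYNSLU-BBBBBBBBSA-N",
    "tegaserod, crossed (placeholder)": "IKBKZGMPCYNSLU-CCCCCCCCSA-N",
    "benzene, for contrast": "UHOVQNZJYSORNB-UHFFFAOYSA-N",
}


def skeleton_block(inchikey: str) -> str:
    """Return the first 14-character block (connectivity hash) of an InChIKey."""
    first, _, _ = inchikey.partition("-")
    if len(first) != 14:
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return first


def search_by_skeleton(query_block: str, keys: dict) -> list:
    """Names of all records whose InChIKey shares the query's first block."""
    return sorted(name for name, key in keys.items()
                  if skeleton_block(key) == query_block)


hits = search_by_skeleton("IKBKZGMPCYNSLU", KEYS)
print(len(hits))  # 3
```

All three geometric forms collapse onto the same skeleton hit list, which is why the E-, Z- and crossbond records appear together.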

Now, given that the three hits are the E-, Z- and crossbond orientations, with InChIKeys as shown below, the result set is indeed as expected. My question: based on the structures that you see for Tegaserod, how would you prefer to see the compound drawn, and how would you expect it to be held in the database? Think about what you would expect to happen in terms of a search. If you drew a cis-form, should it retrieve cis and crossed? If you drew crossed, should it retrieve cis and trans? etc. Remember, it’s an opinion so no answer is wrong…
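To make the question concrete, here is a minimal Python sketch of just one possible answer, not ChemSpider’s actual behavior: a crossed (undefined) query retrieves every geometry, and a defined E or Z query retrieves the same geometry plus any crossed record, since a crossed record might be either. The `Geometry` enum and `matches` function are hypothetical names used only for illustration.

```python
from enum import Enum


class Geometry(Enum):
    E = "E"                # trans about the double bond
    Z = "Z"                # cis about the double bond
    UNDEFINED = "crossed"  # crossed bond: geometry not specified


def matches(query: Geometry, record: Geometry) -> bool:
    """One hypothetical policy: undefined on either side matches anything;
    otherwise the geometries must agree exactly."""
    if query is Geometry.UNDEFINED or record is Geometry.UNDEFINED:
        return True
    return query is record


records = list(Geometry)
for q in Geometry:
    print(q.value, "->", [r.value for r in records if matches(q, r)])
# E -> ['E', 'crossed']
# Z -> ['Z', 'crossed']
# crossed -> ['E', 'Z', 'crossed']
```

A stricter alternative would drop the "record is UNDEFINED" clause so that a drawn cis-form returns only cis records; which behavior is preferable is exactly the opinion being asked for.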


Tonight I was amazed for the first time in a long time. What amazed me was how quickly the post I made to Twitter got indexed. I said:

What I meant is that the structure image on Wikipedia has no stereochemistry. See here.

About 2 minutes later I did a search on Google to see whether I could find Goserelin and compare the stereochemistry for what I believe is the structure (from ChemSpider here). What I found was a short list of hits but also this:

This was literally within a couple of minutes of me posting the Tweet.

OK… we live in an amazing world. Our networks are so interlinked at this point that the scope of what we are achieving, and will achieve as the semantic web comes to life, is, simply put, amazing. This observation impressed me. Maybe it shouldn’t have, but it did…is there something obvious going on here that I am missing? Should I not be so impressed?


Over on the Academic Productivity blog, Dario has discussed “Why do scientists (not) contribute to Wikipedia?”. The post points to a survey that any user of Wikipedia, especially a scientist, should fill in.

“A survey has been launched by the Wikimedia Research Committee to understand why scientists, academics and other experts do (or do not) contribute to Wikipedia, and whether individual motivation aligns with shared perceptions of Wikipedia within different communities of experts. The survey is anonymous and takes about 20 min to complete. Whether you are an active Wikipedia contributor or not, you can take the survey and help Wikipedia think of ways around barriers to expert participation.”


I am pleased to announce that I am participating as a presenter at the Tenth Annual Bio-IT World Conference & Expo 2011 taking place April 12-14, 2011 in Boston, MA. I will be a part of the “(W1) Current Methods for Computational Toxicology and Chemogenomics” pre-conference workshop and discussions and I think you would enjoy this workshop and set of topics. It’s a great gathering every year and I encourage you to attend.


Recently I spat in a tube and sent it off to 23andme for my genetic testing. I am still digesting the results…no pun intended since it does suggest that genetically I have a higher probability than normal for ulcerative colitis! I LOVE the report I got from 23andme but I am reading it through slowly and educating myself. I now know my “paternal haplogroup” and I swear, I did NOT know before.

I am happy to see a 2X decreased risk for prostate cancer. It doesn’t make the annual “What doctor, no flowers?” wince any less painful however. But it is possible that I will gain weight, lose my hair and gain some liver spots….all of which I have happily emulated with the iPhone apps known as FatBooth, BaldBooth and OldBooth. I’d divorce me now…I am not going to age well…and please, for my friends…don’t say that I don’t look any different!


Yes, that’s quite a title for a blog post. But it covers the nature of the exchanges that Egon Willighagen and I have been having recently (among other things, we are co-authoring a book chapter on Computational Toxicology). Egon asked a question on the Blue Obelisk Discussion group about tautomers and I answered it on this blog. Egon has posted a follow-up blog post here. His most recent post makes a series of valid comments…all good and well worth discussing.

“Anyway. Tautomerism was a curation issue in the first(!!!) entry I was curating. The sixth had the more well-known problem, I think. I may be blind, but I would say this drug has a stereocenter:

But none of the databases I checked so far (including ChemSpider) defines the stereochemistry! I thought we settled that some decades ago? Stereochemistry of drugs matter. What is going on here?”.

The drug shown is Aminoglutethimide. It’s on Wikipedia here without specific stereochemistry. But as we know, Wikipedia does have errors (see slide 47/126 here). So what gives? It’s on KEGG in the same way. Also on ChEMBL here. But it IS on ChemSpider as both stereoforms….R and S. I would suggest that the drug is likely a racemic mixture of the stereoforms, and as represented on all of the databases it’s probably okay not to draw the stereobonds, as there is only one stereocenter to worry about. Checking DailyMed supports this in this record. A search on “Aminoglutethimide” gives 36000 hits on Google…I did not wade through them! I think the drug is therefore supplied as a racemate, can be separated (see top Google hits), but is okay on ChemSpider as is.


Today. Today is the day I noticed that CAS SciFinder now supports InChI! Wow. Now, that may not be big news for many of you, but for those of us who have supported InChI, both vocally AND in action, this is big news. InChI is not perfect and has areas to develop in (some later posts will cover this) but it is ALREADY extremely enabling. (For an example of InChI issues, focused primarily on how people DRAW the structures that they feed to InChI algorithms, see slides 75-84 of this presentation http://tinyurl.com/4hhgqbd)

It is helping to link databases, enable web searching and improve communication between cheminformatics applications. ACS and CAS have been quite late in providing support for InChI and today I noticed it has arrived. This is great news for InChI and very much a blessing for InChI as a standard for interchange. Now, if we can get StdInChIKeys layered onto all ACS publications then the need for an InChI Resolver will be increased….


Assume that you were hosting a public domain database (say ChemSpider). Assume that you had to represent racemates in the database. Assume that with an abundant community willing to provide input you get a lot of feedback about how that should be done. Assume that you have the benefit of hosting a blog and can get more input….thus this question.

Which of the three representations below would you use to represent a racemate?


I recently posted about the project that will become known as NMRCAVES, NMR Computer-Assisted Verification and Elucidation Systems. This will be a workshop to be held at SMASH. There will be no workshop without two essential ingredients: participants and data.

The participants will need to be willing to work with us, applying their software, algorithms and approaches to test their systems on data. The data will be supplied by the community and provided to the participants in a blind study to test their systems.

Populating the workshop is the first challenge. If we cannot get enough participants, then even with an abundance of data there will be no workshop to hold. There are a limited number of groups/individuals working in the areas of computer-assisted structure verification and elucidation by NMR. I have listed them below. No offense meant if I have accidentally missed anyone out. Also, they are listed in alphabetical order, so no favoritism either…

Can anyone point me to groups or software solutions that I am missing and other potential solutions out in the community that I should approach? I will be approaching the listed groups with an invite to participate in NMRCAVES and then will be asking the community if you are willing to provide data for the project!

I am honored to have been invited to lead a workshop at the SMASH NMR conference later this year. I will be co-hosting with Michael Bernstein, someone I have known for many years and with whom I have spent many hours (if not days!) discussing the ins and outs of NMR prediction and structure verification by NMR.

The workshop will provide an environment for developers of software packages and associated algorithms allowing for structure verification and elucidation to engage with interested members of the NMR community attending the SMASH NMR meeting. Presenters may include both commercial and non-commercial software packages and the workshop will allow the participants to report on their respective approaches as well as report on the performance of their algorithms against a large set of data provided by the community.

The one-day workshop will be separated into Structure Verification and Structure Elucidation segments, with participants who have chosen to take part in the project. We are hoping for participants from both the academic and commercial sectors.

I’ve called the workshop NMRCAVES: NMR Computer Assisted Verification and Elucidation Systems. Below is an outline to initiate a conversation with interested parties. It is a suggested outline for the project and I welcome feedback.

The data analysis components of the workshop are outlined below.

CASV (computer-assisted structure verification): Four sets of data will be made available to the participants.
(1) HNMR only, minimum of 25 spectra and 25 suggested structures (random distribution of correct/incorrect with at least 50% correct)
(2) HNMR and 2D HSQC, minimum of 25 spectra and 25 suggested structures (random distribution of correct/incorrect with at least 50% correct)
(3) HNMR only, minimum of 25 spectra and 25 sets of 3 structures (exactly 1 of the 3 is always the correct structure)
(4) HNMR and 2D HSQC (preferably multiplicity-edited HSQC), minimum of 25 sets of spectra and 25 sets of 3 structures (exactly 1 of the 3 is always the correct structure)

The participants will receive the data via download from an FTP site, with each folder numbered in a way that does not reveal its contents. All structures will be known to only two parties: the laboratories acquiring the data and the host of the workshop (AJW). The participants will be responsible for providing a report identifying the correct/incorrect structures in test sets (1) and (2), and identifying the correct structure out of the 3 provided in sets (3) and (4). When all reports have been submitted, each participant will receive a report identifying the correct structures, so that they can review them, report on their successes and further review and report on the data during the workshop.
The overall performance statistics comparing the results of the various participants will be reviewed and presented at the workshop by the workshop host.
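As a rough illustration of the kind of statistic that could be compared across participants, here is a minimal Python sketch of per-set accuracy for CASV. The data layout (folder id mapped to True/False for sets (1) and (2), or to the index 0, 1 or 2 of the chosen candidate for sets (3) and (4)) and the `accuracy` helper are hypothetical, and the folder ids and answers are toy values, not workshop data. It also shows why chance baselines matter: because at least 50% of the suggested structures in sets (1) and (2) are correct, a participant who marks everything “correct” already scores at least 50%, and random picking in sets (3) and (4) would be expected to score about 1/3.

```python
from fractions import Fraction


def accuracy(answer_key: dict, report: dict) -> Fraction:
    """Fraction of folders where the participant's answer equals the key.

    Hypothetical layout covering both CASV formats:
      sets (1)/(2): folder id -> True (structure correct) or False
      sets (3)/(4): folder id -> index 0, 1 or 2 of the correct candidate
    """
    if set(report) != set(answer_key):
        raise ValueError("report must cover exactly the folders in the key")
    hits = sum(report[folder] == answer_key[folder] for folder in answer_key)
    return Fraction(hits, len(answer_key))


# Toy verification set: a participant who calls every structure correct.
verify_key = {"001": True, "002": False, "003": True, "004": True}
always_correct = {folder: True for folder in verify_key}
print(accuracy(verify_key, always_correct))  # 3/4

# Toy choose-one-of-three set.
choose_key = {"101": 0, "102": 2, "103": 1}
choices = {"101": 0, "102": 1, "103": 1}
print(accuracy(choose_key, choices))  # 2/3
```

Exact fractions are used so that small test sets are reported without rounding noise.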

CASE (computer-assisted structure elucidation): The objective should be to test the ability of algorithms to correctly elucidate the skeletons of unknowns when provided with “high-quality datasets” for which sensitivity is deemed not to be a limitation. While sensitivity is acknowledged to be an issue in CASE approaches, this particular hurdle should be removed from the challenge posed to the algorithms. Data will be requested from a series of laboratories. The minimum dataset should include “High-resolution MS”, 1H, COSY, HSQC/HMBC. Additional data can include TOCSY, DEPT-HSQC, HSQC-TOCSY, 1H-15N direct and long-range correlation, and NOESY/ROESY.
The participants will receive the data via download from an FTP site, with each folder numbered in a way that does not reveal its contents. All structures will be known to only two parties: the laboratories acquiring the data and the host of the workshop (AJW). All elucidations will be done blind, and the participants will be responsible for providing a report including a table of the top 3 structures for each dataset, rank-ordered if possible from most likely to least likely. When all reports have been submitted, each participant will receive a report containing the correct structures, so that they can review them, report on their successes and further review and report on the data during the workshop.
The overall performance statistics comparing the results of the various participants will be reviewed and presented at the workshop by the workshop host.
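Because CASE reports are ranked top-3 lists, one plausible way to summarize them is a top-1 rate, a top-3 rate and a mean reciprocal rank (MRR) per participant. The Python sketch below assumes exactly that; the choice of metrics is ours, not the workshop’s, and the structure identifiers (“S1”, “X” and so on) are hypothetical placeholders. In practice candidates could be compared on something like the first block of the InChIKey, which would match the stated focus on skeletons, but that too is an assumption.

```python
from fractions import Fraction


def rank_of_correct(correct: str, ranked: list):
    """1-based position of the correct structure in a ranked list, or None."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == correct:
            return position
    return None


def case_scores(answer_key: dict, report: dict, k: int = 3):
    """Top-1 rate, top-k rate and mean reciprocal rank over all datasets.

    answer_key: dataset id -> identifier of the correct structure
    report:     dataset id -> participant's ranked list of identifiers
    A missing dataset in the report counts as a miss.
    """
    n = len(answer_key)
    ranks = [rank_of_correct(answer_key[d], report.get(d, [])[:k])
             for d in answer_key]
    top1 = Fraction(sum(r == 1 for r in ranks), n)
    topk = Fraction(sum(r is not None for r in ranks), n)
    mrr = sum((Fraction(1, r) for r in ranks if r is not None), Fraction(0)) / n
    return top1, topk, mrr


# Toy example with placeholder identifiers, not workshop data.
key = {"A": "S1", "B": "S2", "C": "S3"}
report = {"A": ["S1", "X", "Y"], "B": ["X", "S2", "Y"], "C": ["X", "Y", "Z"]}
top1, top3, mrr = case_scores(key, report)
print(top1, top3, mrr)  # 1/3 2/3 1/2
```

MRR rewards a participant who puts the correct skeleton second more than one who misses it entirely, which a plain top-1 rate would not capture.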

Outcome of Project
1) A review of the state of contemporary computer-based structure verification and elucidation
2) All data to be publicly shared and made available as Open Data for download and to become a gold standard reference set of data for the community to utilize for further testing and development
3) All processed spectra to be uploaded and available on a public domain database (e.g. ChemSpider) and associated with the correct chemical structure
4) A minimum of one co-authored publication reviewing the results of the workshop and associated studies

Your feedback, comments and questions are welcomed. We are especially looking for laboratories who are willing to provide sets of data for analysis during the project as well as software groups who develop algorithms for structure verification and elucidation and who wish to participate in the project.