Posts Tagged ChemConnector

I think the Google Scholar Citations resource is excellent. I was one of the fortunate ones who got onto the system early; I signed on immediately and used it to aggregate my papers quickly and easily, as represented here. One of my favorite aspects of the system is how it keeps me informed, by emails direct to my inbox, when other papers reference papers for which I am an author. Today the email that hit me listed four such papers. A free service, regular updates and, as best as I can tell, working as advertised, for me at least.

Abstract Soil organic matter (OM) contains vast stores of carbon, and directly supports microbial, plant, and animal life by retaining essential nutrients and water in the soil. Soil OM plays important roles in biological, chemical, and physical processes within the soil, and …

Abstract Marketed drugs frequently perform worse in clinical practice than in the clinical trials on which their approval is based. Many therapeutic compounds are ineffective for a large subpopulation of patients to whom they are prescribed; worse, a significant fraction of …

Chemists working in biomolecular application projects are usually looking at many related molecules (e.g. results of a virtual screening run, lead series development or library design). For a convenient visual analysis of this data it is essential that differences between …

Despite my affection for Wikipedia this week I am annoyed about what’s going on for me on Wikipedia. I’ve read The Wikipedia Revolution and understand the editorial activities and I’ve had many discussions about how authors of Wikipedia articles have been “beaten up” in a friendly way. I’ve been warned about Conflict of Interest policies and yet, because I think it’s important, have tried to navigate the complexities of contributing articles. At present however my contributions on Wikipedia regarding scientists and projects I know about have all been flagged, either for deletion or for “notability”.

Gary Martin and Sean Ekins are personal friends so YES, I have close connections with the subject. And I believe I can objectively write a good article about them. Just like I wrote about the village I grew up in…Afonwen. I only spent 12 years of my life there….so have a close connection with that too. I have known Gerhard Ecker for about three years, and know about his work from reading his articles and hearing him speak, and feel it’s valid to contribute an article as I JUDGE he’s a notable scientist. Gary Martin has almost 300 publications, and an h-index of 27. In the domain of NMR anyone who is doing small molecule structure elucidation is almost certainly using technology he has contributed to. He is notable. Sean Ekins is also notable, in my opinion. And surely Wikipedia is about collective opinions.

I have tried to follow notability guidelines for academics but have clearly failed so encourage anyone reading this post to help clean up the articles. If any of you out there happen to know Gerhard, Gary or Sean DON’T contribute though…you might get flagged as being a contributor who has a close connection. It’s much better to write about people you don’t know. Clearly I understand the possible bias …

If I look at the number of chemists on Wikipedia I find the following list of about 480 chemists. That article is a list of world-famous chemists. There is also a smaller list of Russian Chemists. The end of the list looks like this:

These are likely all NOTABLE chemists as I couldn’t find a single article in the list with a challenge on it…but I confess to not looking at each one one at a time. But that’s what we have for chemists….a list of world-famous chemists, biochemists and Russian chemists.

Many of us have heard about how “open” Wikipedia is, including many of the exchanges regarding pornography on Wikipedia. In many cases I have to simply caution “welcome to the internet”. We all know it’s out there…how could we not. There is material on Wikipedia that is shocking, but at the same time educational. But where I take issue, just for comparison purposes, is that top-notch scientists, in my opinion (and I judge that of many others), can be flagged as not notable, yet pages like those listed below for pornstars can exist without question, without flagging and, I have to assume, are considered both encyclopedic and notable.

Similar to the list of chemists a search on pornstars gives a full article here but then these incredibly long lists!

The last one is quite a list! I guess it’s appropriate to list pornstars by decade but scientists tend to perform better over the longer term and can have 40-50 year careers whereas I don’t even want to imagine that for the other career! I struggle to see why the list of references for Ron Jeremy is any more notable/appropriate than the list of references for Gary Martin.

What’s ridiculous is that there is even an article about pornstar pets. What??? This has more of a place on Wikipedia than some of our world’s most published scientists? Is there something wrong with this picture?

While I may not fully understand what is deemed to be appropriate in terms of notability for a scientist, and I do understand the judgment that I might be too close to the scientists to be objective (but I challenge that!), I definitely challenge the notion that pornstars deserve more exposure, pardon the pun, than the world’s chemists.

Despite my rants I understand the challenges that will likely show up as comments on this blogpost. I understand that I will be pointed to WP:COI and WP:Notability. I do not get to set the rules, I need to follow them as I am a small part of a very important community of crowdsourced improvement. But, overall, I remain surprised at how there appears to be so much diligence looking at the articles of scientists rather than those of pornstars. I think scientists are generally involved in very notable activities that generally distinguish them from the bulk of the population. I think pornstars are involved in activities that are not particularly notable as the bulk of the population will do them at some point in their life….well, not ALL activities that pornstars do I’m sure…..

I believe we need a change in policy. I believe that scientists deserve more notability than pornstars and that diligence, while appropriate, should be used in a more tempered manner.

While the internet has been revolutionizing our access to data and information via our computers, computers have been miniaturizing to the point where a smart phone offers capabilities that many desktops could not deliver less than a decade ago. Mobile browser technology and app-based delivery for software has now delivered into our hands further access to data via phones, pads and tablets. Whether it be in the form of chemical calculators, accessing publishers websites or public domain databases containing millions of chemical structures, mobile chemistry is here and is expanding in capability and coverage at a dramatic rate. This presentation will review the status of mobile devices and how they are being used to enable chemists.

I am presently in Barcelona at the ICIC meeting to give a presentation entitled “Mobile Chemistry and ‘Generation App’”. I have been preparing by looking at what is new in the world of Chemistry Apps and in the process have updated my ongoing list of apps and posted it on SlideShare. I intend to keep updating it every couple of months to keep track of new apps as they become available. I have not had time to update the SciMobileApps wiki as yet.

The internet is a rich source of chemistry related data and, nowadays, if a chemist knows how to initiate a search, data can be sourced for millions of chemicals online. The nature of online data varies from simple molecule diagrams, to experimental and predicted properties, encyclopedic articles, synthetic routes, analytical data, patents and publications. The array of information now accessible is distributed across thousands of sites giving rise to the information overload commonly associated with the Google-type searches on the internet. In addition the purest language of chemistry, that of chemical structures, is not fully supported on the web as yet. This presentation will provide an overview of how the internet is being meshed together using data aggregation and standardization approaches to enable a structure-searchable internet for chemistry. The speaker will present an overview of the ChemSpider platform (http://www.chemspider.com), the challenges of linking together over 400 internet resources and 26 million unique chemicals, and discuss how members of the chemistry community can directly contribute to enhancing the availability of quality data online.

This is a movie of the talk I gave using the BigBlueButton platform to students and faculty at the University of Arkansas, Little Rock.

My final presentation at ACS Denver yesterday I think was the clearest presentation I gave all week. As with most presentations I gave last week I was up at 4am to finish it off based on conversations I had been having during the week. A lot of people came to the booth after the presentation to acknowledge that they had been dealing with such challenges for years and that it was time that a drug collection was finally available. It took months to get 152 drugs “right”. It would take a looong time to reproduce something of the quality of Merck Index!

“Structure representations in public chemistry databases: The challenges of validating the chemical structures for 200 top-selling drugs

Internet-based public domain databases containing chemical compounds have grown in number, capability and content in recent years. There are now many databases containing millions of chemical compounds associated with different types of data including chemical names, properties, analytical data, and with associated mapping to proteins, assay data, clinical information and so on. These disparate data sources suffer from one common issue – quality of data. This presentation will provide an overview of our efforts to source the appropriate structural representations for 200 top-selling drugs from public domain sources. This intra- and inter-laboratory comparison of approaches, processes and necessary agreements exposed the challenges associated with aggregating structure-based data. The project also provided data regarding the distribution of quality issues associated with many of the community’s popular databases.”

Yesterday I gave a talk at the 5th Meeting on U.S. Government Chemical Databases and Open Chemistry hosted by Mark Nicklaus. It was a great meeting. A lot of like minded people and some great work going on to provide access to chemical databases. I’ll blog more when I get back from the ACS Denver meeting this coming week. For now I am simply putting up a copy of the talk I gave.

“ChemSpider is a structure centric database hosted by the Royal Society of Chemistry and integrating over 25 million chemical compounds to over 400 internet-based resources including many public domain databases, Wikipedia, chemical vendors, patents, publications and other web-based services. The intention is for ChemSpider to become one of the primary online hubs for chemists to source chemistry related data. During the development of the ChemSpider database we have utilized numerous approaches to standardizing, curating and validating the data supplied to us for hosting and integration. This presentation will provide an overview of our initial development of the ChemSpider database and provide an overview of our present processes and procedures for handling incoming data depositions. We will also discuss how crowdsourcing can help to expand, curate and validate the data on the ChemSpider database.”

I have been blogging on Google Scholar Citations in recent days and noticing some interesting details (1,2,3). I have been in exchanges with the Microsoft Academic Search support team on Twitter trying to collapse multiple accounts. They are helping.

I have since continued my comparison to look for differences in the two platforms. There are some very obvious differences. One GLARING example…on Google Scholar my top cited paper has 50 citations. On Microsoft Academic Search it has 3. BIG difference!

Over the past few weeks the ChemSpider team has been working hard with James Little from the Eastman Chemical Company. We have been adding new capabilities to support Mass Spectrometry searches. I will detail these capabilities in a later blog post but for now I am pointing to the POSTER that Jim presented at ASMS. It was a real pleasure working with Jim. I met him many years ago when I worked at Eastman Kodak company and before Kodak divested Eastman Chemical (among many other things). Jim gave us great feedback, was exacting in his testing and a gracious collaborator even as we let deadlines slip because of many other distractions.

In very many cases, an unknown to an investigator is actually known in the chemical literature. We refer to these types of compounds as “known unknowns.” ChemSpider is a particularly good collection of “known unknowns” for the identification of compounds in commercial products, environmental matrices, etc. However, several modifications were necessary to refine the initial search results sorting with orthogonal filters such as the number of associated patents and references. Previously we described a similar approach using the CAS registry with either SciFinder or STN Express, but ChemSpider is a viable alternative and it is freely accessible to the public. Accurate mass GC-MS and LC-MS measurements were performed on mixtures using, respectively, either Waters GCT or LCT (LockSpray) instrumentation. MassLynx (Waters) elemental software was used to determine molecular formulae which were further refined by i-FIT for ranking to theoretical isotope distributions. Candidate structures were obtained by searching either molecular formulae or monoisotopic molecular weights with ChemSpider. Further data such as EI or MS/MS fragmentation, number of exchangeable protons, or sample history were used to identify the “known unknown.” The ChemSpider database of >25 million chemicals was searched by either molecular formulae or monoisotopic molecular weights to identify “known unknowns.” The latter is an attractive approach since no subjective restrictions on the elements, the range of elements, and the double bond equivalents are required prior to the ChemSpider search to limit candidate compounds. Changes were made in ChemSpider to refine the initial candidate list by number of associated references or patents. This tended to bring more promising candidates to the top of the list. The success of these approaches was evaluated with a group of 90 compounds from literature sources, internet sites, and American Society for Mass Spectrometry Conference presentations. Furthermore, the results were compared to similar methods employed searching the Chemical Abstracts Services databases.
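To make the “known unknowns” idea in the abstract a little more concrete, here is a minimal Python sketch of the two steps it describes: filter candidates by monoisotopic mass, then sort what survives by the number of associated references. This is not the ChemSpider implementation or its API. The function names, the 5 ppm window and the candidate records (including their reference counts) are assumptions made up purely for illustration, and the formula parser only handles simple formulae built from C, H, N and O.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of a few common elements.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def monoisotopic_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'C8H10N4O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def rank_candidates(measured_mass, candidates, ppm=5.0):
    """Keep candidates inside a ppm mass window, most-referenced first."""
    tolerance = measured_mass * ppm / 1e6
    hits = [
        c for c in candidates
        if abs(monoisotopic_mass(c["formula"]) - measured_mass) <= tolerance
    ]
    return sorted(hits, key=lambda c: c["references"], reverse=True)


# Hypothetical records: names and reference counts are invented for illustration.
candidates = [
    {"name": "candidate A", "formula": "C8H10N4O2", "references": 12},
    {"name": "candidate B", "formula": "C8H10N4O2", "references": 950},
    {"name": "candidate C", "formula": "C9H14N4O", "references": 40},
]

ranked = rank_candidates(194.0804, candidates)
print([c["name"] for c in ranked])
```

Searching on mass alone means no element list or double bond equivalent limits have to be chosen up front; the reference count then acts as the orthogonal filter that pushes the well-documented isomer ahead of the obscure one.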


Recently I moved this blog to WordPress hosting and started using a new Theme. This is work in progress. Many of the original image associations still need to be remade as the blog went from www.chemconnector.com/chemunicating to simply www.chemconnector.com. With the new theme I decided to start managing my CV, presentations and publications online too. I’ve had it staggered across various sites such as Mendeley but having it managed on my own blog just made more sense. In particular, what I have been doing is spending half an hour per night creating links between the papers on the My Curriculum Vitae page using the DOI and associated CrossRef Resolver to do the linking. It makes sense to go this path.
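For anyone wanting to do the same kind of linking, the idea is simply to append the DOI to a resolver address. Below is a small Python sketch under a couple of assumptions: it uses the public https://doi.org/ resolver, the helper name `doi_link` is my own, and the DOI shown is a placeholder rather than one of my papers.

```python
from urllib.parse import quote


def doi_link(doi):
    """Return a resolver URL for a DOI, percent-encoding unsafe characters."""
    return "https://doi.org/" + quote(doi.strip(), safe="/")


# Placeholder DOI for illustration only; substitute the DOI found for each paper.
print(doi_link("10.1000/xyz123"))
```

Percent-encoding matters because some DOIs contain characters such as `<`, `>` or `#` that would otherwise break the link when pasted into a page.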

In order to do the linking I first have to find the DOI. To find the DOI I search for the paper title on Google, or the reference where necessary. It’s had some interesting results already as I detailed here. While linking up the papers…75 done and about 30 to go…I observed an increasingly obvious trend. It was an unexpected trend based on what I had been told. The trend? PubMed is not just about Medicine and the Life Sciences.

“Long-range homonuclear coupling pathways can be observed in COSY or GCOSY spectra by the acquisition of spectra with larger numbers of increments of the evolution period, t1, than would normally be used. Alternatively, covariance processing of COSY-type spectra acquired with modest numbers of t1 increments, allows the observation of multistage correlations. In this work results obtained from covariance-processed GCOSY spectra are fully analyzed and compared to normally processed COSY and 80 ms TOCSY spectra.”

I’m sure you’d agree it’s NOT very “medical”, “biomedical” or “life sciences”. Yet…if I do a Google search we find:

As can be seen, PubMed returns the reference above Wiley, the publisher of the article. I saw this for many, many of the publications listed on my CV. Most of them are based on NMR spectroscopy data processing approaches so why would they be in PubMed? I am assuming this is simply because the journal itself has been identified as a journal that is “acceptable” to PubMed? Now, I’m a chemist…and it would be super if there was a PubMed for the whole of chemistry…of course we cannot call it PubChem…that’s already taken. But I wonder what is standing in the way of PubMed simply becoming all-encompassing…why can’t it accept all chemistry papers, for example? It’s clearly accepting some (many!) that I have authored/co-authored. Why not more? Is it policy? Is it resources? Can anyone comment?



Helping to Create Connections in Chemistry

My passion is connecting people to chemistry and I am known as the ChemConnector in the social network. I have almost a decade of experience in analytical laboratory leadership and management. I am a prolific author with over a hundred and fifty scientific publications, book chapters and books, and hundreds of public presentations. I am one of the original founders of the ChemSpider database and am now the VP Strategic Development for the Royal Society of Chemistry.