ChemConnector Blog (http://www.chemconnector.com)
Helping to Create Connections in Chemistry

Bias and Data-Driven and Probability-Based Decision Making in Trivia Crack
http://www.chemconnector.com/2018/02/19/bias-and-data-driven-and-probability-based-decision-making-in-trivia-crack/
Mon, 19 Feb 2018 20:28:47 +0000

I’ve been playing Trivia Crack for a few years now and, as of today, I am at level 326. I am an iPhone/iPad user so grabbed it from the App Store. In playing I have filled my brain with some useless information, learned a lot of history, geography and sports, and taken advantage of my Science background to win more than a few challenges. In terms of Entertainment, it’s clear I stopped learning about new music a few years ago. I’m stuck in my musical history with little real interest in the new music scene. I have enjoyed playing Trivia Crack against my girlfriend for over three years and we continue to have regular periods where we actively engage with the game.

Trivia Crack has a lot of downloads, with Forbes reporting over 300 million, and I can hear the theme tune while sitting in restaurants as people get their daily dose.

There is a lot of advice online for people trying to beat the game, much of it tactics-based. Hackers have even been taking their pokes at it. Looking at some of the analyses that have been made, I am at least in the top 1% for Science and, with a category performance of 86, am better than 99.8% of the people playing Trivia Crack. My weakest category is Sports… not a surprise, as I prefer to play sports rather than watch or read about them. I am generally flat across the board for Entertainment, Art and Literature, and Geography.

As a scientist I am data driven. However, as Louis Pasteur once commented, “In the fields of observation chance favors only the prepared mind” (https://en.wikiquote.org/wiki/Louis_Pasteur). So, while playing over 3500 games I started noticing patterns that helped me play the game. I noticed numerous patterns over the years, but I will summarize them here and then share the data.

If I did not know the answer to a question and HAD to guess, my guesses generally worked out best when I guessed that the first (top) answer was correct. If I guessed the fourth (bottom) answer I was generally, but not always, wrong.

With this observation, which I could reproduce over and over, I decided to gather the data and analyze it statistically. The data is shown below and represents the number of times the correct answer appeared in each position, 1 to 4 (top to bottom). Each column corresponds to the frequency of correct answers, by position out of the four possible answers, for a particular grouping. I gathered data over a number of days in six different groupings. I also chose groupings of different sizes, stopping the gathering of data when there were 25, 50 or 100 answers in position 1.

The data speaks for itself (and is available for download on FigShare here). In all six groupings the majority of answers are in position 1; commonly the chance of the answer being in position 1 is about double the chance of it being in position 4. This means that if you are lost on a question and have no idea which answer to choose, you should select position 1. Over time, you will be right more often than not. If you know that positions 2 and 3 are not the correct answers and are trying to choose between positions 1 and 4, choose position 1. You will be correct about twice as often as if you chose position 4.

While I believe the data speaks for itself, a statistical analysis is certainly in order. I’ve done a lot of stats over the years but I am fortunate enough to know people who are far more proficient than I am. So, I approached my friend John Wambaugh and asked him to apply his preferred approach to data that I would provide. He wrote a little bit of code in R and produced the analysis below, which he concluded with “So, if you don’t know the answer, always guess A.” I agree – it’s a useful strategy and worth trying out in your own Trivia Crack game. That said, I would expect the game to have a random distribution of correct answers, and maybe this is something the developers should address?

“If we assume that each time you answer a question one of the four answers must be right, then there are four probabilities describing the chance that each answer is right. These four probabilities must add up to 100 percent. The number of correct answers observed should follow what is called a Dirichlet distribution. The simplest thing would be for all the answers to be equally likely (25 percent) but we have Tony’s data from 6 groupings in which he got “A” 275 times, “B” 193 times, “C” 166 times, and “D” 134 times.

The probability density for Total given the observed likelihood is 23200.

While the probability density for Total assuming equal odds is 4.61e-08

But it is unlikely that even 768 questions gives us the exact odds. Instead, let’s construct a hypothesis.

We observe that answer “A” was correct 35.8 percent of the time instead of 25 percent (even odds for all answers).

We can hypothesize that 35.8 percent is roughly the “right” number and that the other three answers are equally likely.

The probability density for Total assuming only “A” is more likely is 101.

Our hypothesis that “A” is right 35.8 percent of the time is 2.19e+09 times more likely than “A” being right only 25 percent of the time.

Among the individual games, the hypothesis is not necessarily always more likely:

For Game.1 the hypothesis that “A” is right 35.8 percent of the time is 129 times more likely.

For Game.2 the hypothesis that “A” is right 35.8 percent of the time is 1910 times more likely.

For Game.3 the hypothesis that “A” is right 35.8 percent of the time is 5.25 times more likely.

For Game.4 the hypothesis that “A” is right 35.8 percent of the time is 0.754 times more likely.

This value being less than one indicates that even odds are more likely for Game.4.

For Game.5 the hypothesis that “A” is right 35.8 percent of the time is 32 times more likely.

For Game.6 the hypothesis that “A” is right 35.8 percent of the time is 99.2 times more likely.

So, we might want to consider a range of possible probabilities for “A”.

Unsurprisingly, the density is maximized for probability of “A” being 36 percent.

However, we are 95 percent confident that the true value lies somewhere between 33 and 39 percent.

So, if you don’t know the answer, always guess “A”.”
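John’s R script isn’t reproduced in the post, but the gist of his comparison can be sketched in a few lines of Python. This is my own hedged reconstruction, not his code: it uses a plain multinomial likelihood ratio, so the numbers come out the same order of magnitude as, but not identical to, the Dirichlet densities quoted above.

```python
from math import exp, log, sqrt

# Observed counts of the correct answer by position (A = top ... D = bottom)
counts = {"A": 275, "B": 193, "C": 166, "D": 134}
total = sum(counts.values())  # 768 questions

p_a = counts["A"] / total  # observed frequency of "A", ~0.358
# Hypothesis: "A" is right with probability p_a; B, C and D split the rest evenly
hyp = {"A": p_a, "B": (1 - p_a) / 3, "C": (1 - p_a) / 3, "D": (1 - p_a) / 3}

# Log-likelihood ratio, hypothesis vs. even odds (0.25 each);
# the multinomial coefficient is the same under both models and cancels
log_lr = sum(n * (log(hyp[k]) - log(0.25)) for k, n in counts.items())

# Rough 95% interval for the true probability of "A" (normal approximation)
se = sqrt(p_a * (1 - p_a) / total)
low, high = p_a - 1.96 * se, p_a + 1.96 * se

print(f"p(A) = {p_a:.3f}")
print(f"likelihood ratio vs. even odds = {exp(log_lr):.3g}")
print(f"approx. 95% interval for p(A): {low:.3f} to {high:.3f}")
```

Run as-is, this roughly reproduces the headline conclusions: the guess-“A” hypothesis comes out billions of times more likely than even odds, and the interval spans roughly 32 to 39 percent, consistent with the 33–39 percent quoted above.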

GUEST POST by Emma Schymanski: Suspect Screening with MetFrag and the CompTox Chemistry Dashboard
http://www.chemconnector.com/2017/12/08/guest-post-by-emma-schymanski-suspect-screening-with-metfrag-and-the-comptox-chemistry-dashboard/
Fri, 08 Dec 2017 20:45:15 +0000

Identifying “known unknowns” via suspect and non-target screening of environmental samples with the in silico fragmenter MetFrag (http://msbi.ipb-halle.de/MetFragBeta/) typically relies on the large compound databases ChemSpider and PubChem (see e.g. Ruttkies et al 2016). The size of these databases (over 50 and 90 million structures, respectively) yields many false positive hits: structures that were never produced in sufficient amounts to be realistically found in the environment (e.g. McEachran et al 2016). One motivation behind the US EPA’s CompTox Chemistry Dashboard is to provide access to compounds of environmental relevance – currently approx. 760,000 chemicals. While the web services are not yet available to incorporate the Dashboard into MetFrag as a database like ChemSpider and PubChem, there are a number of features in MetFragBeta that enable users to use the CompTox Chemistry Dashboard to perform “known unknown” identification with MetFrag. This post highlights the suspect screening functionality.

First we have our (charged) mass. Take m/z = 256.0153. This was measured in positive mode and we assume (correctly) that it’s [M+H]+. Make sure you set this correctly in MetFrag.

Then retrieve your candidates, e.g. using ChemSpider or PubChem and a 5 ppm error margin:

You could now process the candidates … but we have not done anything with the Dashboard! This is hidden in the middle in the “Candidate Filter & Score Settings” tab:

You can use the Candidate Filter to process ONLY candidates that are in the CompTox Chemistry Dashboard, excluding all other candidates, by clicking on “Suspect Inclusion Lists” and selecting the “DSSTox” box (see screenshot), which retains (currently) 11 of the 156 ChemSpider candidates:

Once the processing is finished, the plot in the “Statistics” tab should look something like this – depending on what additional scores you selected:

It is also possible to use one (or more!) suspect lists to SCORE the different candidates without excluding any matches from ChemSpider or PubChem, by selecting the same box under the “MetFrag Scoring Terms” part instead (see screenshot). Additional lists like the Swiss Pharma list shown below can be downloaded from the NORMAN Suspect Exchange (http://www.norman-network.com/?q=node/236) and also viewed under the lists tab in the CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard/chemical_lists). MetFrag only needs a text file containing InChIKeys of the substances for the upload – which can be obtained from the Dashboard or Suspect Exchange downloads.
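Assembling that upload file takes only a few lines. As a sketch (assuming a hypothetical CSV export with an INCHIKEY column; the real download’s header may differ):

```python
import csv

def write_inchikey_list(csv_path: str, out_path: str, column: str = "INCHIKEY") -> int:
    """Pull InChIKeys out of a suspect-list CSV export and write them one per
    line, the plain-text format MetFrag accepts for a suspect list upload."""
    with open(csv_path, newline="", encoding="utf-8") as src:
        # keep only rows that actually have a value in the InChIKey column
        keys = [row[column].strip()
                for row in csv.DictReader(src)
                if row.get(column, "").strip()]
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(keys) + "\n")
    return len(keys)
```

`INCHIKEY` here is an assumed header name; substitute whatever column the actual Dashboard or Suspect Exchange download uses.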

Using the Suspect Lists as a “Scoring term”, along with some other criteria and restrictions, will give you a results plot looking more like this:

There are many more features to discover: try the website, read the paper (Ruttkies et al 2016) and if you have any questions, please comment below!

Author: Emma Schymanski, 21/11/2017

The National Chemical Database Service Allowing Depositions
http://www.chemconnector.com/2017/10/20/the-national-chemical-database-service-allowing-depositions/
Sat, 21 Oct 2017 02:05:36 +0000

The UK National Chemical Database Service (available here) has been online a few years now, since 2012. When I worked at RSC I was intimately involved in writing the technical response to the EPSRC call for the service and, in this blog post, I outlined a lot of intentions for the project. A key part of the project from my point of view was to deliver a repository to store structures, spectra, reactions, CIF files etc., as I outlined in the blog post.

“Our intention is to allow the repository to host data including chemicals, syntheses, property data, analytical data and various other types of chemistry related data. The details of this will be scoped out with the user-community, prioritized and delivered to the best of our abilities during the lifetime of the tender. With storage of structured data comes the ability to generate models, to deliver reference data as the community contributes to its validation, and to integrate and disseminate the data, as allowed by both licensing and technology, to a growing internet of the chemical sciences.”

In March 2014 at the ACS Meeting in Dallas I presented on our progress towards providing the repository (see this slide deck). ChemSpider has been online for over ten years; we were accepting structure depositions within the first 3 months and spectra a few weeks later (see blog post). The ability to deposit structures as molfiles or SDF files has been available on ChemSpider for a long time, and we delivered the ability to validate and standardize structures using the CVSP platform (http://cvsp.chemspider.com/), which we submitted for publication three years ago (October 28th, 2014) and which is published here: https://jcheminf.springeropen.com/articles/10.1186/s13321-015-0072-8. With structure and spectra deposition in place for over a decade, a validation and standardization platform made public three years ago, and a lot of experience with depositing data onto ChemSpider, all the building blocks for the repository have been in place.

Today I received an email announcing “Compound and Spectra Deposition into ChemSpider”. I read it with interest, as I guess it meant the capability was “going mainstream” in some way after being around for a decade. Refactoring should be a constant for any mature platform, so my expectation was that this would show a more seamless process for depositing various types of data: a more beautiful interface, new whizz-bang visualization widgets building on a decade of legacy development, taking the best of what we built for data registration, structure validation and standardization (and all of its lessons!), and rebuilds of some of the spectral display components we had. That is not quite what I found when I tested it.

Here’s my review.

My expectation was that I could go to http://deposit.chemspider.com and deposit data to ChemSpider. The website is simply a blue button with “Log in with your ORCID”. There is language recognizing that the Open PHACTS project funded the validation and standardization platform work, which is definitely appropriate, but some MORE GUIDANCE as to what the site is would be good!

“Validation and standardisation of the chemical structures was developed as part of the Open PHACTS project and received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement no. 115191, resources of which are composed of financial contribution from the European Union’s Seventh Framework Programme (FP7/2007-2013) and EFPIA companies’ in-kind contribution.”

This means that it should be possible to deposit a molfile, have it checked (validated) and standardized then deposited into ChemSpider, having passed through CVSP. So what happened?

I downloaded the structure of chlorothalonil from our Dashboard and loaded it. The result is shown below. The structure was standardized and correctly recognized as a V3000 molfile. The original structure was not visible, there were no errors or warnings, and the structure DID standardize.

The original isotope labels were removed, the layout was recognized as congested, and partially defined stereo was recognized. But it wouldn’t deposit. I tried many others and they would not deposit either; I was going to give up, but tried benzene, as a V2000 molfile downloaded from ChemSpider. And… YAY… it went in. The result is below.
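As an aside, the V2000/V3000 distinction the depositor reports comes straight from the file: the version tag sits at the end of the counts line (the fourth line) of a molfile. A minimal sniffer, my own illustration rather than anything from the deposition site:

```python
def molfile_version(molfile_text: str) -> str:
    """Report 'V2000' or 'V3000' from the version tag at the end of the
    counts line (the fourth line) of a molfile."""
    lines = molfile_text.splitlines()
    if len(lines) < 4:
        raise ValueError("too short to be a molfile")
    tag = lines[3].rstrip()
    for version in ("V2000", "V3000"):
        if tag.endswith(version):
            return version
    raise ValueError("no V2000/V3000 tag on the counts line")

# A minimal, hand-written V2000 molfile for methane (single carbon, no bonds)
methane = """methane

  illustrative only
  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
"""
print(molfile_version(methane))  # V2000
```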

A unique DOI is issued to the record, associated with my name. As far as I can tell it is NOT deposited into ChemSpider, because the structure is already in ChemSpider. There is also no link from ChemSpider back to my deposition that I can find. My next try was to find a chemical NOT in ChemSpider and to deposit that. That failed. I tried benzene again and it worked a second time. I judged that maybe a simple alkyl chain would work for deposition. The result is below.

The warning “Contains completely undefined stereo: mixtures” does not make sense at all for this chemical. PLUS it wouldn’t deposit.

I then tried to register a sugar as a projection with the result shown below. I consider this one to have some real errors and do not AT ALL like the standardized version.

I tried a simple inorganic. I think KCl should be recognized as an ionic compound, K+ Cl-; at the very least SOME warning would be expected!?

The testing I did took about an hour overall, and I identified a LOT of issues. I think this release, while it may be a beta release for feedback, is premature and needs a lot more testing. I am hopeful that more people will fully test the platform, as the ABILITY to deposit data, get a DOI, and associate it with your ORCID account is valuable; but it’s not obvious that anything is linked back to ORCID, and ORCID appears to be nothing more than a login mechanism.

I did NOT test spectral deposition but am concerned that the request seems to be for original data. In binary vendor file format? Uh-oh. That’s not a good idea!

I hope this blog post motivates the community to test the deposition system, give feedback, and push it to deal with complex chemistries. Deposit.ChemSpider.Com appears to write chemicals to some other repository, as there is no real connection to ChemSpider that I can find (?). With community testing, at least the boundary conditions of its performance can be defined, the system can be improved, and a community can be built around the functionality.

Building public domain chemistry databases is hard work. User feedback and guidance is essential. Please give your feedback and test the system.

Call for Abstracts for ACS Spring 2018 Symposium “Applications of Cheminformatics to Environmental Chemistry”
http://www.chemconnector.com/2017/09/20/call-for-abstracts-for-acs-spring-2018-symposium-applications-of-cheminformatics-to-environmental-chemistry/
Thu, 21 Sep 2017 04:08:51 +0000

Grace Patlewicz and I have the pleasure of hosting a symposium at the Spring 2018 ACS National Meeting in New Orleans, as outlined below. We believe that a presentation from you would enhance the line-up for the gathering and encourage you to consider our invitation. Our expectation is a full day of stimulating presentations and discussions regarding the application of cheminformatics to environmental chemistry. We sincerely hope you will consider our invitation and submit an abstract to the CINF division at https://callforpapers.acs.org/nola2018/CINF. Please confirm your intention to participate via email. Thank you in advance.

Applications of Cheminformatics to Environmental Chemistry

Cheminformatics and computational chemistry have had an enormous impact in providing environmental chemists with access to data, information, software tools and algorithms. There is an increasing number of online resources and software tools, and the ability to source data, perform real-time QSAR predictions and even read-across analyses online is now available. Environmental scientists generally seek chemical data in the form of chemical properties, environmental fate and transport, or toxicity-based endpoints. They also search for data regarding chemical function and use, information regarding exposure potential, and transformation in environmental and biological systems. The increasing rate of production and release of new chemicals into commerce requires improved access to historical data and information to assist in hazard and risk assessment. High-throughput in vitro and in silico analyses are increasingly being brought to bear to rapidly screen chemicals for their potential impacts; interweaving this information with more traditional in vivo toxicity data and exposure estimation to provide integrated insight into chemical risk is a burgeoning frontier at the intersection of cheminformatics and the environmental sciences.

This symposium will bring together a series of talks to provide an overview of the present state of data, tools, databases and approaches available to environmental chemists. The session will include the various modeling approaches and platforms, will examine the issues of data quality and curation, and intends to provide the attendees with details regarding availability, utility and applications of these systems. We will focus especially on the availability of Open systems, data and code to ensure no limitations to access and reuse.

The topics to be covered in this session include, but are not limited to:

Standards for data exchange and integration in environmental chemistry

Implementations of Read-across prediction

Adverse Outcome Pathway data and delivery

Please submit your abstracts using the ACS Meeting Abstracts Programming System (MAPS) at https://maps.acs.org. General information about the conference can be found at http://www.acs.org/meetings. Any other inquiries should be directed to the symposium organizers:

Call for Abstracts for ACS Spring 2018 Symposium: “Open Resources for automated structure verification and elucidation”
http://www.chemconnector.com/2017/09/20/call-for-abstracts-for-acs-spring-2018-symposium-open-resources-for-automated-structure-verification-and-elucidation/
Thu, 21 Sep 2017 03:40:04 +0000

I have the pleasure of hosting a symposium with Emma Schymanski at the Spring 2018 ACS National Meeting in New Orleans, as outlined below. Our expectation is a full day of stimulating presentations and discussions regarding how Open resources, specifically data and software, can support automated structure verification and elucidation. If this is an area of research for you, please submit an abstract to the ANYL division at https://callforpapers.acs.org/nola2018/ANYL.

Open Resources for automated structure verification and elucidation

Antony J. Williams1 and Emma L. Schymanski2
1National Center for Computational Toxicology, US EPA, Research Triangle Park, Durham, NC, USA.
2Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, Luxembourg.
Cheminformatics methods form an essential basis for providing analytical scientists with access to data, algorithms and workflows. There are an increasing number of free online databases (compound databases, spectral libraries, data repositories) and a rich collection of software approaches that can be used to support automated structure verification and elucidation, specifically for Nuclear Magnetic Resonance (NMR) and Mass Spectrometry (MS). This symposium will bring together a series of speakers to overview the state of data, tools, databases and approaches available to support chemical structure verification and elucidation. The session will cover the different databases and libraries available and examine the issues of data quality and curation. We intend to provide attendees with details regarding availability (both online and offline), utility and application of various tools and algorithms to support their identification and interpretation efforts. We will focus especially on the availability of Open systems, data and code with no limitations to access and reuse, yet reflect critically on the potential limitations and future needs of Open approaches. Case studies will demonstrate the potential for cheminformatics to enable single-structure elucidation through to high throughput, untargeted data discovery approaches. This work does not necessarily reflect U.S. EPA policy.

Online networking, data sharing and research activity distribution tools for scientists
http://www.chemconnector.com/2017/08/08/online-networking-data-sharing-and-research-activity-distribution-tools-for-scientists/
Wed, 09 Aug 2017 04:01:26 +0000

This is just a short post (I need to write more when I have time) about the result of a writing collaboration with Lou Peck and Sean Ekins on an article entitled “The new alchemy: Online networking, data sharing and research activity distribution tools for scientists” (http://dx.doi.org/10.12688/f1000research.12185.1). This took a LONG time to get published, and morphed from the original concept, but there appears to be a lot of interest judging by the views and downloads stats in the first few days (775 views, with downloads at 20% of that number). That’s a good conversion rate. It’s open for PUBLIC COMMENTS and we welcome your feedback.

Predicting organ toxicity using in vitro bioactivity data and chemical structure
http://www.chemconnector.com/2017/08/06/predicting-organ-toxicity-using-in-vitro-bioactivity-data-and-chemical-structure/
Sun, 06 Aug 2017 17:42:48 +0000

I get to work with some great scientists in my job, on projects that a couple of years ago were way out of my depth. Let’s be honest: I have no formal training as a toxicologist. My training is as an analytical scientist, then cheminformatician, then into publishing and informatics, and now I am in the National Center for Computational Toxicology. I didn’t realize that the trial by fire would be so stimulating and fun, but working at EPA is great. So many people make flippant comments about working for the government, leaving early, etc. We work HARD and are productive and, for me at least, I feel we are doing important work and making real contributions. The latest paper I am involved with is “Predicting organ toxicity using in vitro bioactivity data and chemical structure” (http://dx.doi.org/10.1021/acs.chemrestox.7b00084). The abstract is listed below…

“Animal testing alone cannot practically evaluate the health hazard posed by tens of thousands of environmental chemicals. Computational approaches making use of high-throughput experimental data may provide more efficient means to predict chemical toxicity. Here, we use a supervised machine learning strategy to systematically investigate the relative importance of study type, machine learning algorithm, and type of descriptor on predicting in vivo repeat-dose toxicity at the organ-level. A total of 985 compounds were represented using chemical structural descriptors, ToxPrint chemotype descriptors, and bioactivity descriptors from ToxCast in vitro high-throughput screening assays. Using ToxRefDB, a total of 35 target organ outcomes were identified that contained at least 100 chemicals (50 positive and 50 negative). Supervised machine learning was performed using Naïve Bayes, k-nearest neighbor, random forest, classification and regression trees, and support vector classification approaches. Model performance was assessed based on F1 scores using five-fold cross-validation with balanced bootstrap replicates. Fixed effects modeling showed the variance in F1 scores was explained mostly by target organ outcome, followed by descriptor type, machine learning algorithm, and interactions between these three factors. A combination of bioactivity and chemical structure or chemotype descriptors were the most predictive. Model performance improved with more chemicals (up to a maximum of 24%) and these gains were correlated (ρ= 0.92) with the number of chemicals. Overall, the results demonstrate that a combination of bioactivity and chemical descriptors can accurately predict a range of target organ toxicity outcomes in repeat-dose studies, but specific experimental and methodologic improvements may increase predictivity.”
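For readers unfamiliar with the metric, the F1 score used to assess the models is the harmonic mean of precision and recall. A generic illustration of the computation (not the paper’s code):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision tp/(tp+fp) and recall tp/(tp+fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. 40 true positives, 10 false positives, 10 false negatives -> F1 ~ 0.8
print(f1_score(40, 10, 10))

# equivalent direct form: F1 = 2*tp / (2*tp + fp + fn)
print(2 * 40 / (2 * 40 + 10 + 10))  # 0.8
```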

Open Science for Identifying “Known Unknown” Chemicals
http://www.chemconnector.com/2017/05/07/open-science-for-identifying-known-unknown-chemicals/
Sun, 07 May 2017 18:59:29 +0000

I am happy to announce the publication of an article regarding “Open Science for Identifying ‘Known Unknown’ Chemicals” at http://dx.doi.org/10.1021/acs.est.7b01908. I have been involved with two other articles about the identification of “known unknowns”.

The first one was a ChemSpider article: “Identification of ‘known unknowns’ utilizing accurate mass data and ChemSpider”, Journal of The American Society for Mass Spectrometry, 23: 179–185, doi:10.1007/s13361-011-0265-y.

The second one was a recent article from the EPA: “Identifying known unknowns using the US EPA’s CompTox Chemistry Dashboard”, Analytical and Bioanalytical Chemistry, 409: 1729–1735, doi:10.1007/s00216-016-0139-z.

The most recent publication was a collaboration with Emma Schymanski from Eawag and it was a real pleasure to write this together. If you are interested in how Open Science can contribute to the challenges associated with the identification of known unknowns check out our latest publication!

In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning
http://www.chemconnector.com/2017/02/19/in-silico-prediction-of-physicochemical-properties-of-environmental-chemicals-using-molecular-fingerprints-and-machine-learning/
Mon, 20 Feb 2017 02:28:33 +0000

Recently we published on the curation of physicochemical data sets that were then made available as Open Data. The work was reported in the article named in the title of this post; the abstract is below:

“There are little available toxicity data on the vast majority of chemicals in commerce. High-throughput screening (HTS) studies, such as those being carried out by the U.S. Environmental Protection Agency (EPA) ToxCast program in partnership with the federal Tox21 research program, can generate biological data to inform models for predicting potential toxicity. However, physicochemical properties are also needed to model environmental fate and transport, as well as exposure potential. The purpose of the present study was to generate an open-source quantitative structure–property relationship (QSPR) workflow to predict a variety of physicochemical properties that would have cross-platform compatibility to integrate into existing cheminformatics workflows. In this effort, decades-old experimental property data sets available within the EPA EPI Suite were reanalyzed using modern cheminformatics workflows to develop updated QSPR models capable of supplying computationally efficient, open, and transparent HTS property predictions in support of environmental modeling efforts. Models were built using updated EPI Suite data sets for the prediction of six physicochemical properties: octanol–water partition coefficient (logP), water solubility (logS), boiling point (BP), melting point (MP), vapor pressure (logVP), and bioconcentration factor (logBCF). The coefficient of determination (R2) between the estimated values and experimental data for the six predicted properties ranged from 0.826 (MP) to 0.965 (BP), with model performance for five of the six properties exceeding those from the original EPI Suite models. The newly derived models can be employed for rapid estimation of physicochemical properties within an open-source HTS workflow to inform fate and toxicity prediction models of environmental chemicals.”
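As a reminder of the headline statistic, the coefficient of determination compares the residual error of the predictions to the total variance of the data. A generic stdlib sketch (not the study’s actual workflow):

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot, where SS_res is the
    residual sum of squares and SS_tot the total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Perfect predictions give R^2 = 1; always predicting the mean gives 0
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
```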

How Poor Altmetrics are for my old articles…
http://www.chemconnector.com/2017/01/10/how-poor-altmetrics-are-for-my-old-articles/
Tue, 10 Jan 2017 17:29:17 +0000

In preparation for a talk later this week I have been investigating adding Altmetric and Plum Analytics scores into my online CV, as well as Kudos resources. I would expect Altmetric scores to be VERY low for old articles, as they were published well before the social networking tools existed. However, the Plum widget should be useful in terms of showing citations, views, downloads etc. The Kudos resources will be meaningful since I have been working SLOWLY through my articles, latest first.

I think the Altmetric scores shown below bear out my opinion, since MOST don’t have any score whatsoever. However, this blog post should lift a number of them over the next few days.