Post Tagged with: "open source"

In my last post, I briefly covered the ‘share or not to share’ debate involving non-commercial software. In this post I’ll delve deeper into the issue by discussing how commercially available research software further complicates the situation. I’ll focus on perhaps one of the most controversial conflicts in chemistry software: Gaussian Inc. vs Banned by Gaussian.

In the 1950s and 60s Prof. John Pople (1998 Nobel Prize winner) and his research group at Carnegie Mellon University were focused on the development of ab initio quantum calculation methods. The group incorporated Gaussian orbitals – rather than Slater-type orbitals, which were more computationally intensive – into a computational chemistry program for molecular electronic structure calculations. The program, Gaussian 70, was released as open source software through the Quantum Chemistry Program Exchange (QCPE) in 1970.

In 1987, Carnegie Mellon University was issued a software license for the program and, ever since, it’s been developed and sold by Gaussian, Inc. Prices (pdf) for the Gaussian software package range from $2,500 for a single computer to $35,000 for an institution-wide license.

Gaussian was initially used only by theoreticians. However, as I mentioned in my last post, the continuously increasing power of personal computers as well as the addition of user-friendly interfaces have made the software so accessible that even a computationally inept synthetic chemist (like myself) can perform high-level ab initio calculations with a half dozen mouse clicks.

Gaussian is an important tool for many chemists, but it has also been a center of controversy. Since its commercial release, a number of individuals and institutions have been Banned by Gaussian (BBG), which means they are prohibited by Gaussian Inc. from purchasing or using any version of Gaussian software.

Few researchers were using computers 30 years ago. This quickly changed with the release of several commercially viable personal computers in the 1980s. Since then, processing power has increased and the cost of computers has decreased at an exponential rate (see Moore’s Law).

It’s no surprise that computers are now pivotal in chemistry research. We use them in a wide range of calculations – from determining the 40th decimal place of the absolute energy of He to modeling the release and distribution of toxic chemicals in river basins. The software used to address these complex problems is becoming increasingly accessible and easy to use too. There are already a variety of cell phone apps for chemistry-related problem solving.

Yet, while the prevalence of software and computer-based research continues to grow, the rules for publishing results and sharing software lag behind. The magical/miracle nature of black-box calculations is disconcerting to individuals who want to know how the answers were obtained (see Sidney Harris cartoon). A palpable concern is growing in the scientific community around the sharing of software – and the foundational source code – necessary to reproduce published results. Two recent opinion pieces, one in Science titled “Shining Light into Black Boxes” and the other in Nature titled “The case for open computer programs”, are trying to bring attention to this issue. The articles discuss the advantages and apprehensions of sharing, and suggest possible changes. Below is a summary of the points raised by the authors of the two articles – as well as the thoughts of others (including myself).

Advantages to sharing software and source code:

Reproducibility: As stated by Ince et al., “The vagaries of hardware, software and natural-language will always ensure that exact reproducibility remains uncertain…” without the release of source code in its entirety.

Catching errors: A simple mistake, such as a botched unit conversion, missing values assigned as zero, a rounding error, or a misplaced decimal point, can wildly skew outcomes (see Office Space). We can only find and correct errors if we can see the source code.

Facilitating progress: All publications require that data, equations, materials, methods, and instrumentation be disclosed so that the results can be tested and furthered by others. We are all better served when source code is disseminated in a similar manner so that programs can be studied and repurposed in future research.

Teaching tools: Real, applied examples – that are relevant to research – are useful for new students and researchers learning to program and develop code.

Openness: Despite the competition to acquire funding and to publish first, we are all joined in the endeavor of understanding the rules that govern the universe. The open sharing of information has been and will continue to be the foundation of scientific progress.

Relying on faith: No matter how prolific or respected you are as a researcher, the implicit assertion, “Trust me, the program works the way I say it does” is not an acceptable means of justifying your results. On a fundamental philosophical level, black box justifications like that should be socially unacceptable in the sciences.
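To make the “Catching errors” point above concrete, here is a minimal Python sketch of one of the mistakes listed there: silently treating missing values as zero. The readings and function names are hypothetical, invented purely for illustration, and are not taken from either article or from any real program.

```python
def mean_skipping_missing(values):
    """Average only the readings that are actually present (not None)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


def mean_missing_as_zero(values):
    """Average after silently replacing missing readings with 0.0."""
    filled = [0.0 if v is None else v for v in values]
    return sum(filled) / len(filled)


# Hypothetical readings; None marks a measurement that was never recorded.
readings = [10.0, 12.0, None, 11.0, None]

careful = mean_skipping_missing(readings)   # averages the 3 real readings
careless = mean_missing_as_zero(readings)   # averages 5 values, two of them fake zeros

print(f"skipping missing values: {careful:.1f}")
print(f"missing values as zero:  {careless:.1f}")
```

Both functions look equally plausible from the outside, and a reader who only sees the reported average has no way to tell which one was used. That is the kind of choice that only becomes visible when the source code is available.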
