Open Source: the future of science?

By Sebastian Meznaric

The recent scientific crisis around leaked climate research data has greatly damaged the credibility and respect usually accorded the scientific community. The collected global temperature data was subjected to statistical analysis in which the scientists in question used a “trick” to conceal certain decreases in the temperatures measured. The “trick” apparently went unnoticed through the peer review process, which led to the processed data being published in a high-profile scientific journal. After failing to obtain the data by other means, sceptics wishing to analyse it for themselves were (as they saw it) forced to resort to the Freedom of Information Act. Their efforts went unrewarded, as the requests were routinely rejected and evaded by the researchers. The situation eventually exploded in dramatic style in November, with startling news of public data leaks conducted with the help of hackers based overseas. The damage caused to the image and reputation of the scientific community was grave, and should lead us to ask: could there be an overarching solution to deal with such problems in the future?

Open source, whether in computing or in broader terms, is a principle advocating free access to the end product’s “source materials”. These may be the source code (in the case of computer software), the design specifications for a physical product, or the data used for statistical analysis in a scientific project. The main guiding principle behind it is peer production through collaboration, with the end product made available to the general public at no cost at all.

The creative practice of sharing the source of one’s work is nowhere more appropriate than in science. The work is by its nature collaborative and very often publicly funded. As such, it should be freely available for public examination.

Beyond raw data, there are numerous scientific projects in which a computer programme is the key part of the work. The need for verification of results by the scientific community dictates that the code be made available for inspection and modification. Indeed, if in the climate scandal noted above the raw collected data had been available from the beginning, errors in the analysis could have been noticed and corrected early, benefiting both the integrity of the scientific process and the search for truth. In today’s competitive research environment, however, the data and source code for computer programmes are not always freely available.

The competition among various research groups makes the idea of hiding one’s software code and/or data (we will henceforth simply use “source” for both) extremely attractive to most scientists: sharing the source, the worry goes, would make it easier for other groups to reap the benefits of one’s hard work. However, it would be very easy (and indeed necessary) to give credit to the principal author of the source by making them a co-author of the resulting publication. Indeed, the practice of making the people who collected the data and/or wrote the code co-authors of journal articles is already well established. For large projects where co-authorship is impractical, such as CERN-related findings, the name of the open source project can simply be referenced in the acknowledgements. Such practices would avoid a very large number of authors while still giving credit where credit is due.

Another commonly used argument is that competition drives scientific research better than openness does. Different competing research groups in the same field might therefore each use their own self-written software designed to accomplish very similar tasks, racing to add new functionality and improve the performance of their code in order to publish new results before their competitors. However, as we see with Wikipedia, Linux and other greatly successful open source projects, more “eyes” see better and, more importantly, think better. Scientific collaboration among peers very often leads to ideas that one would not arrive at in a smaller group or alone. Indeed, dramatically increasing the number of people working together on a scientific software project often leads quickly to skyrocketing improvements in performance and applicability. Perhaps even more crucially, researchers would have more time to focus on science rather than on coding or collecting data.

The open source concept has been successfully used in the commercial world, notably in the automobile industry, where the patent sharing started by Ford led to automobile design innovations moving faster than ever to the great benefit of the general public. The sharing of technology did not at all reduce the competition among the companies nor their innovative drive.

The adoption of open source models in science would not only foster greater creativity, but would also attract interest in science from programmers and other interested parties, further increasing our global productivity and efficiency.

For instance, the field of biotechnology is fast adapting to the drive for greater openness in the scientific process. Other disciplines will hopefully follow suit to harness the greater efficiency and openness offered by the open source development model. Whether the scientific community at large adopts the open source paradigm remains a matter of speculation but, considering the climate data leak fiasco, the potential benefits are surely beyond dispute.

Sebastian Meznaric is a theoretical physicist and doctoral researcher at the University of Oxford. His research interests include the study of information theory in quantum mechanics. He is also a keen observer of politics and current affairs.

4 Comments

I fully agree that the scientific community would be well served by adopting more of the practices of the open source movement. Especially in view of the increasing reliance on computational methods it is imperative that the basic tools be scrutinized by a larger group of people.
I find it quite peculiar that the values of open source are not more widely practised among scientists, since the free software movement originated in academia.

I agree that open standards should always be something to aim for in the academic community.

There are a number of open access journals (see links below) however these make open the end product of their research, not the process itself.

As mentioned in the article, software (and also hardware) is rarely shared; only the results which draw from it are. There are a large number of freely accessible research solutions (e.g. BLAST), but proprietary (closed source) software still seems to have a stranglehold over the methods used in scientific research.

Similarly, manufacturing processes and non-computer methods are not always made “open” even in open access journal articles. Some of this is accidental censorship by omission, or possibly deliberate action which makes the study unrepeatable, due to the non-inclusion of a single value (say, the length of time a condition was applied for).

Scientists and researchers wish, rightly, to preserve their discoveries and techniques as their own (as the article states). However this does not have to be done by patents, or closing the source material.

Scientific literature has a very useful metric, the citation index.

A fully described study with an innovative, effective method made open will attract lots of citation (using the method without doing so would be plagiarism). A regularly cited method will become known as the property of the researcher (eg. the Polymerase Chain Reaction, developed by Kary Mullis) but remain for the use of all, encouraging citation.

It can therefore be argued that in addition to being beneficial to the scientific community as a whole, open source methods and open access publishing are beneficial to the individual or group carrying out the published research.

‘The collected global temperature data was the subject of statistical analysis where the scientists in question used a “trick” to conceal certain decreases in the temperatures measured. The “trick” apparently went unnoticed through the peer review process which ended up leading to the processed data being published in a high profile scientific journal.’

This is utter bullshit. The author should be ashamed to repeat such complete nonsense, and I’m surprised Ceasefire publish this kind of thing. The ‘trick’ was actually a method which was described in the paper. Thus it was not unnoticed by anyone.

‘The paper in question is the Mann, Bradley and Hughes (1998) Nature paper on the original multiproxy temperature reconstruction, and the ‘trick’ is just to plot the instrumental records along with reconstruction so that the context of the recent warming is clear. Scientists often use the term “trick” to refer to a “a good way to deal with a problem”, rather than something that is “secret”, and so there is nothing problematic in this at all. As for the ‘decline’, it is well known that Keith Briffa’s maximum latewood tree ring density proxy diverges from the temperature records after 1960 (this is more commonly known as the “divergence problem”–see e.g. the recent discussion in this paper) and has been discussed in the literature since Briffa et al in Nature in 1998 (Nature, 391, 678-682). Those authors have always recommend not using the post 1960 part of their reconstruction, and so while ‘hiding’ is probably a poor choice of words (since it is ‘hidden’ in plain sight), not using the data in the plot is completely appropriate, as is further research to understand why this happens.’
RealClimate 2009

This is something which was open knowledge before the email hack, it was repeated as soon as the bogus claims about hiding the decline were made, and was corroborated by each of the independent investigations into the UEA email hack. How the author has missed all of this is entirely beyond me. However it’s pretty clear that he knows very little about a very well documented episode.

Sebastian Meznaric Oct 26, 2011 17:49

Unfortunately Sy is right. I was aware of the media campaign at the time claiming that the trick referred to the climate data, but not of the facts of the matter. I do apologise to anyone whose personal life may have been negatively impacted as a result of my claims in this article. However, I believe that despite this, the overall message of the article — that scientific progress could be improved by greater openness — is unaffected. Collaboration is always a good idea, and shared data and source enable other scientists to perform different analyses on existing data without having to repeat the experiment (although repeating the experiment is often a good idea regardless).