A personal view on statistics in earth sciences

Monthly Archives: August 2016

During my PhD I worked on quality assurance of environmental data and on how to exchange quality information between scientists. I developed a concept for a possible workflow that would help all scientists, both data creators and re-users, to make data publications much more useful. One major foundation of this was a set of quality tests, which I either took from the existing literature or developed anew.

Part of this work was the development of a proof-of-concept implementation of the methodologies. I used R, which has been my primary language for quite a while, to design a test workflow that could be automated as far as possible. It was quite complex and, in retrospect, a bit too ambitious for real-world applications. Anyway, as I prefer open science, I published it as an extension package for R in 2011: qat – Quality Assurance Toolkit.

The publication process was more challenging than anticipated. A detailed help file was required for each function, and as my package had more than a hundred of them, creating these took me quite a while at the time. I also wanted to add further information, such as an instruction manual, so that at least in theory the full functionality (like automatic plotting and saving of the test results) could be understood and used. When it was finally uploaded, I was happy, and I kept extending it until my PhD project came to an end.

Unfortunately, the work on the package did not stop there. R as a language is constantly changing, not so much in the day-to-day tools as in the background infrastructure for packages. New requirements come up now and then, usually accompanied by a deadline for package maintainers. What is quite simple to solve for small packages can be a real challenge for complex ones like mine. When the vignette system changed, I had to remove my instruction manual from the package and created a dedicated website to keep it accessible. I also had to replace packages I depended on, which usually involves quite a bit of change in the code.

All these changes are doable, but the big problems start with the requirement that a newly uploaded package has to fulfil the current norms for R packages. A package that was fine a few months earlier may have to change dramatically with the next update. This usually leads to a time problem, as each update then takes several days, so even minor changes to the original code result in a heavy workload. As a consequence, I was not able to update the package in time when the last deadline turned up, and so it was archived. Half a year later I found some time and have now brought it back onto CRAN.

All in all, this workload is keeping me from creating new R packages. Making them would be feasible, but maintaining them is a pain. With these constant policy changes, R is getting more and more out of fashion among heavy users, and with that it is in danger of losing out to other languages like Python in the teaching of the next generation of scientists. My personal hope is that future development will lead to a more stable package policy within R, so that more packages will remain available in the future. As things stand, I am happy to have my package up again, but when the next deadline arrives in my mailbox, I will again have to evaluate the looming workload before I can afford to schedule a new release.