Pages

Saturday, June 24, 2017

I blogged earlier today why I try to publish all my work gold Open Access. My ImpactStory profile shows I score 93% and note that with that 10% of the scientists in general score in that range. But then again, some publisher do make it hard for us to publish gold Open Access. And then if STM industries spreads FUD for their and only their good ("Sci-Hub does not add any value to the scholarly community.", doi:10.1038/nature.2017.22196), I get annoyed. Particularly, as the system makes young scientists believe that transferring copyright to a publisher (for free, in most cases) is a normal thing to do.

As said, I have no doubt that under current copyright law it was to be expected that Sci-Hub was going to be judged to violate that law. I also blogged previously that I believe copyright is not doing our society a favor (mind you, all my literature is copyrighted, and much of it I license to readers allowing them to read my work, copy it (e.g. share it with colleagues and students), and even modify it, e.g. allowing journals to change their website layout without having to ask me). About copyright, I still highly recommend Free Culture by Prof. Lessig (who unfortunately did not run for presidency).

To get a better understand of Sci-Hub and its popularity (I believe gold Open Access is the real solution), I looked at what literature was in Wikidata, using Scholia (wonderful work by Finn Nielsen, see arXiv). I added a few papers and annotated papers with their main subject's. I guess there must be more literature about Sci-Hub, but this is the "co-occuring topics graph" provided by Scholia at the time of writing:

It sounds like a problem not so common in western Europe, but it was when I was a fresh student (around 1994). The Radboud's University Library certainly did not have all journals and for one journal I had to go to a research department and sit in their coffee room. Not a problem at all. Big Package deals improved access, but created a vendor lock-in. And we're paying Big Time for these deals now, with insane year-over-year inflation of the prices.

But even then, I was repeatedly confronted with not having access to literature I wanted to read. Not just me, btw, for PhD students this was very common too. In fact, they regularly visited other universities, just to make some copies there. An article basically costed a PhD a train travel and a euro or two copying cost (besides the package deal cost for the visited university, of course). Nothing much has changed, despite the fact that in this electronic age the cost should have gone down significantly, instead of up.

That Elsevier sues Sci-Hub (about Sci-Hub, see this and this), I can understand. It's good to have a court decide what is more important: Elsevier's profit or the human right of access to literature (doi:10.1038/nature.2017.22196). This is extremely important: how does our society want to continue: do we want a fact-based society, where dissemination of knowledge is essential; or, do we want a society where power and money decides who benefits from knowledge.

But the STM industry claiming that Sci-Hub does not contribute to the scholarly community is plain outright FUD. In fact, it's outright lies. The fact that Nature does not call out those lies in their write up is very disappointing, indeed.

I do not know if it is the ultimate solution, but I strongly believe in a knowledge dissemination system where knowledge can be freely read, modified, and redistributed. Whether Open Science, or gold Open Access.

Sunday, June 11, 2017

Over the past 20 years I have had endless discussions into what the research is that I do. Many see my work as engineer, but I vigorously disagree. But some days it's just too easy to give up and explain things yet again. The question came up on the past few month several times again, and I am suggested to make a choice. That modern academia for you: you have to excel in something tiny, and complex and hard to explain ambition is loosing from the system based on funding, buzz words, "impact", and such. So, again, I am trying to make up my defense as to why my research is not engineering. You know what is ironic? It's all the fault of Open Science! Darn Open Science.

In case you missed it (no worries, many of the people I talk in depth about these things do, IMHO), my research is of theoretical nature (I tried bench chemistry, but my back is not strong enough for that): I am interested in how to digitally represent chemical knowledge. I get excited about Shannon entropy and books from Hofstadter. I do not get excited about "deep learning" (boring! In fact, the only fun I get out of that is pointing you to this). So, arguably, I am in the wrong field of science. One could argue I am not a biologist or chemist, but a computer scientist, or maybe philosophy (mind you, I have a degree in philosophy).

And that's actually where it starts getting annoying. Because I do stuff on a computer, people associate me with software. And software is generally seen as something that Microsoft does... hello, engineering. The fact that I publish papers on software (think CDK, Bioclipse, Jmol) does not help, of course.

That's where that darn Open Science comes in. Because I have a varied set of skills, I actually know how to instruct a computer to do something for me. It's like writing English, just to a different person, um, thingy. Because of Open Science, I can build the machines that I need to do my science.

But a true scientist does not make their own tools; they buy them (of course, that's an exaggeration, but just realize how well we value data and software citations at this time). They get loads of money to do so, just so that they don't have to make machines. And just because I don't ask for loads of money, or ask a bit of money to actually make the tools I need, you are tagged as engineer. And I, I got tricked by Open Science in fixing things, adding things. What was I thinking??

Does this resonate with experience from others? Also upset about it? What can we do about this?

(So, one of my next blog posts will be about the new scientific knowledge I have discovered. I have to say, not as much as I wanted, mostly because we did not have the right tools yet, which I have to build first, but that's what this post is about...)

Saturday, June 10, 2017

This paper was long overdue. But software papers are not easy to write, particularly not follow up papers. That actually seems a lot easier for databases. Moreover, we already publish too much. However, the scholarly community does not track software citations (data citations neither, but there seems to be a bit more momentum there; larger user group?). So, we need these kind of papers, and just a version, archived software release (e.g. on Zenodo) is not enough. But, there it is, the third CDK paper (doi:10.1186/s13321-017-0220-4). Fifth, if you include the twopapers about ring finding, also describing CDK functionality.

Of course, I could detail what this paper has to offer, but let's not spoil the article. It's CC-BY so everyone can readily access it. You don't even need Sci-Hub (but are allowed for this paper; it's legal!).

A huge thanks to all co-authors, John's work as release manager and great performance improvements as well as code improvement, code clean up, etc, and all developers who are not co-author on this paper but contributed bigger or smaller patches over time (plz check the full AUTHOR list!). That list does not include the companies that have been supporting the project in kind, tho. Also huge thanks to all the users, particularly those who have used the CDK in downstream projects, many of which are listed in the introduction of the paper.

Search This Blog

This blog deals with chemblaics in the broader sense. Chemblaics (pronounced chem-bla-ics) is the science that uses computers to solve problems in chemistry, biochemistry and related fields. The big difference between chemblaics and areas such as chem(o)?informatics, chemometrics, computational chemistry, etc, is that chemblaics only uses open source software, open data, and open standards, making experimental results reproducible and validatable. And this is a big difference!

About Me

Assistant professor at the Dept of Bioinformatics - BiGCaT at NUTRIM, Maastricht University, studying biology at an unsupervised and atomic level. Open Science is my main hobby resulting in participation in, among many others, Bioclipse, CDK and WikiPathways. ORCID:0000-0001-7542-0286. Posts on G+ are personal.

Cookies

In the EU there is a directive upcoming requiring websites to warn people about HTTP cookies. This website uses the Blogger.com platform, Google Adsense (not that is it actually paying anything significantly), and a few scripts to count how often a blog post was tweeted, using Topsy and LinkedIn. These services undoubtedly make use of cookies, which you can disallow in your browser.