Pseudo Open Notebook Science?

A topic of much discussion I see in the Science 2.0 world (it’s like the Renaissance, but with more Javascript!) is the idea of Open Notebook Science. In one version of Open Notebook Science, one simply opens up one’s research notebook (or its equivalent) to outside access. For an example, see Garrett Lisi’s research wiki. This is, of course, the grand ideal of science at its best: the quest for the truth, the whole truth, and nothing but the truth, so help me Darwin. But this idea has its problems. Most notably, there is the political aspect: what is to keep someone from stealing your absolutely ground-breaking, world-changing, breakfast-making ideas?

Thinking about this, a few obvious ideas come to mind. One is access control: requiring some sort of registration and authentication in order to view the notebook. This could provide a paper, err, electronic, trail for any possible nefarious use. Somehow, though, this seems to me to violate the spirit of Open Notebook Science (and, sitting here typing this in Redmond, Washington, even I don’t want to take on the Open X crowd.)

But another idea occurred to me while working with the new TiddlyWiki I’m using as a notebook. TiddlyWiki can back itself up every time you save the file. It can also automagically save itself after every edit. These two features are, of course, in conflict, since lots of edits lead to lots of backups. One solution is the cool LessBackupsPlugin, which lets you keep fewer backups, limited to one per year, one per month, one per day, one per hour, etc. This keeps the number of backups down to a much more reasonable number.
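To make the thinning idea concrete, here is a minimal Python sketch of tiered backup retention. It is not LessBackupsPlugin’s actual code; the tier boundaries (hourly for the last day, daily for the last month, monthly for the last year, yearly beyond that) and the function names `bucket_key` and `thin_backups` are assumptions made up for illustration.

```python
from datetime import datetime, timedelta


def bucket_key(ts: datetime, now: datetime) -> tuple:
    """Map a backup timestamp to a bucket whose granularity depends on its age.

    Assumed tiers (hypothetical, not the plugin's real rules):
    < 1 day old -> one per hour, < 31 days -> one per day,
    < 365 days -> one per month, older -> one per year.
    """
    age = now - ts
    if age < timedelta(days=1):
        return ("hour", ts.year, ts.month, ts.day, ts.hour)
    if age < timedelta(days=31):
        return ("day", ts.year, ts.month, ts.day)
    if age < timedelta(days=365):
        return ("month", ts.year, ts.month)
    return ("year", ts.year)


def thin_backups(timestamps: list[datetime], now: datetime) -> list[datetime]:
    """Keep only the newest backup in each bucket; return survivors oldest-first."""
    keep: dict[tuple, datetime] = {}
    for ts in sorted(timestamps):
        # Iterating oldest-first means a later backup overwrites an earlier one
        # in the same bucket, so the newest backup per bucket survives.
        keep[bucket_key(ts, now)] = ts
    return sorted(keep.values())
```

Because the tier depends on a backup’s age, re-running the thinning step later collapses yesterday’s hourly backups into one daily backup, last month’s daily backups into one monthly backup, and so on.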

But this got me thinking: why not perform Open Notebook Science, but delayed? That is, your notebook is publicly accessible, but it lags some set amount, say a month, behind your actual notebook. This still violates part of the spirit of Open Notebook Science, but it does open up your science for others to follow along. It also gives others a disincentive to *ahem* “lift” your work, since they can’t be sure you aren’t about to publish your results, along with follow-up results you’ve produced in the intervening time. Indeed, I suspect people are more likely to contact you and say “hey, did you think about this?” than to grab your work and run.

Of course, I chose the delay time arbitrarily. Even better would be a variable delay that depends on what you are working on. For instance, for notebook entries that simply record your notes on a paper, you may not want any delay at all. But for the active projects you consider your bread and butter, you may want a longer delay.
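A per-category delay is easy to sketch. Below, a hypothetical `Entry` record carries a category, and `publicly_visible` shows an entry only once its category’s embargo has elapsed. The category names and delay values are invented for illustration (the 30-day default just mirrors the “say a month” figure above); a real wiki would still need some way to tag entries and to regenerate the public copy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Entry:
    title: str
    written: datetime  # when the entry was written in the private notebook
    category: str      # hypothetical tag deciding which embargo applies


# Hypothetical embargo periods: paper notes go out immediately,
# bread-and-butter projects wait six months, everything else a month.
DELAYS = {
    "paper-notes": timedelta(0),
    "active-project": timedelta(days=180),
}
DEFAULT_DELAY = timedelta(days=30)


def publicly_visible(entries, now, delays=DELAYS, default=DEFAULT_DELAY):
    """Return the entries whose embargo has expired as of `now`, in input order."""
    visible = []
    for e in entries:
        delay = delays.get(e.category, default)
        if e.written + delay <= now:
            visible.append(e)
    return visible
```

For example, on 2024-06-01 a paper-notes entry written the day before is visible, while an active-project entry from 2024-03-01 stays hidden until late August.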

33 Responses to Pseudo Open Notebook Science?

The other safeguard that’s been discussed, and would complement this idea, is logins: that is, make sure enough of your notebook is indexed by Google that potential collaborators can find you, but keep scoopable details accessible only to registered users. Might be hard to find a balance between “need to index” and “scoopable” though, which is where an embargo period could really help.

I think perhaps we should retire this metaphor. It works for apples, but not for humans. If a majority of people acknowledge and abide by community norms, the damage done by “bad apples” can be limited. Peter Suber calls this the “French Chef effect”.

This is an interesting idea. Blogging software already usually has a built-in ability to modify the timestamp on an entry, so it would really just be a matter of porting this over to the Open Notebook app.

I like the idea, because it removes one of the widespread concerns of doing Open Notebook Science, without totally crippling the concept.

I have thought about doing this in the past but I echo Ricardo’s concern that time delays would inhibit the collaborative process. It’s certainly a tricky problem balancing openness (and the possibility of useful collaborators coming out of the woodwork) with the desire to protect one’s intellectual property.

having been screwed more than once by (literally) the guy next to me, I think that it is very naive to think that most interactions would be good. but even if most were, the harm done by one unscrupulous bastard can be huge. I know many people who encode data in their written notebooks and keep the code on their person, so I wonder what use having that kind of information would be. Also, what would keep someone from deliberately mislabeling things or presenting results known to be incorrect? this stuff isn’t peer reviewed, so how can you really hold an individual responsible? how can you tell an honest mistake from a deliberate effort to mislead? again, the few bad apples spoil the bunch.

Not this crap again! Who the fuck wants to read someone else’s lab notebook? I want to see digested, processed, analyzed data, with bad experiments thrown out. Maybe bloggers have the time to wade through the piles of shit in other people’s lab notebooks to find the meaningful nuggets, but working scientists do not.

And if you are talking about publishing curated, analyzed datasets on the Web independently of peer-reviewed publication, well this ain’t a “notebook”, and calling it “Open Notebook” is fucking stupid.

I like the French Chef effect, great link. And I think that’s how it typically worked in science, but the sheer size of science now can undercut the impact of the sanctions the group has available against transgressors.

I still think it could work for science, but it works best within subfields (or sub-sub), and thus can fail to include many who are the arbiters of publication or promotion. Ideally though, this is something we need to reinforce. Somehow.

Ultimately, I think it will take the recognition that posts to wikis and blogs are real scientific contributions, which count for promotion and grants, before open notebook science can take root more widely.

Also crucial will be the outcomes of the first few cases where ‘scooping’ or ‘stealing’ results occur. Fail to support those who initially made the experiments/observations, and allow significant credit to those who scoop but add little, and it’ll kill Open Notebooks for a large fraction of the community.

The idea of incorporating a delay into a wiki is pretty good. I actually built this into my TiddlyWiki by accident. Currently, after editing wiki entries locally, I rsync the wiki to a server, where it’s publicly viewable. If I’m working on something “secret,” such as a talk or a paper nearing publication, I’ll hold off on the sync until after the event. It would be tricky, but possible, to add a “release date” to individual entries; it’s especially tricky for a typical TiddlyWiki, since the entire wiki is one HTML file.

But I agree that this goes a bit against the spirit of open science. Ideally, we can hope that people not only start paying attention to wikis, but start citing them. In practice… people will still “steal” ideas. I wasn’t particularly worried about this with my wiki, since the ideas I was working on were weird ;), and clearly identifiable as mine to people familiar with my work — who are the people who matter to me. If anything, I was hoping people would steal some of my ideas, as that would give me potential collaborators. However, my situation was rather unusual, so it’s hard to recommend this strategy.

I think, in practice, having an open scientific notebook comes with pros and cons. On the downside, there’s a chance people will take your ideas and not credit you. On the upside, there’s the chance that people will take your ideas and credit you. And a well laid out wiki might attract enough attention that it gets the author enough positive attention to outweigh the drawbacks.

What I’ve found, in practice, is that most people reading my wiki are looking for reference material, not cutting-edge research. This includes notes on other people’s papers, as well as notes on material often found in textbooks. I’ve received many random comments that this or that wiki entry has helped someone trying to figure out some obscure detail. These compliments motivate the effort. And my overall feeling is… if people “steal” my ideas, that’s mostly a good thing, since the more people thinking about these things the better. It’s hard enough to give ideas away in physics…

PlausibleAccuracy, I must have been unclear. I meant that delaying didn’t prevent scooping totally.

Also, keeping a private/protected version and a public version would probably defeat the purpose, since the interesting work would remain closed and the chances of external collaboration would be greatly reduced.

The hurdles here go far beyond the technical; there is a mindset that needs to change, and judging by some of the comments on this blog post alone, it is still far from where I and many others would like it to be.

Said by someone who has no idea how theoreticians work. Someday biologists will actually come around to using theoreticians... then maybe you’d actually make some progress.

But again, Physio: what _exactly_ are you so frightened of? You’ve yet to articulate a point beyond: I don’t want to see others’ crap, and others won’t either because I don’t want to. If someone else chooses to do it, why the hell do you care? Except that you might be wrong and then would have to wipe egg off your face, I don’t see why you fight against it so strongly.

Unless, of course, you’re intimately involved in the silly laws about keeping notebooks and such, and in prosecuting fraud cases. Then you’d have a valid subject to talk about. But this “real scientists” shtick is boring me.

It actually looks similar to a paper notebook, with the added ability to interactively drill into the raw data. For example click on any of the NMR links on that page then left click and drag to expand any section.

Dave, I think your points are quite valid. There is nothing wrong with a Partial Open Notebook Science strategy. You have to do what makes sense for you, and discussing options is a great way to get people thinking. Having a delay, or waiting for your article to come out before sharing the notebook, makes sense if you have intellectual property concerns.

For my purposes it is important that the notebook be shared openly, without content discrimination. As mentioned above this is optimal for collaboration. Another important point is that we can follow exactly HOW science gets done only if we can track all the error corrections, related failed or irreproducible results, how long it takes to develop observations and conclusions from the raw data, etc.

I know that I’ve come to this late (and through PhysioProf’s latest rant) but more or less what Ian said. If you are talking a few months delay I don’t think that really makes a difference to scooping (unless you are much more efficient about writing papers than I am) and has the potential to remove a lot of benefits.

Sure, it might put people off trying to take the next obvious step and beat you to press, but it will probably also put people off dropping you a line saying ‘loved your experiment, it gave me this really cool idea’. To be fair, you might expect these people to work on the cool idea for a couple of months and then tell you. But by then it won’t be worth the effort, because you and they have moved on, or because it didn’t work since they didn’t know what you found next. Wasted effort, wasted opportunity.

That said – it’s still much better than not making anything available at all – and if it’s a place where you feel comfortable starting out or exploring how it works for you, then go for it! You don’t necessarily have to be hard core about this. It can still be open science, just not open notebook science.

I’m very much in favour of total openness in research: I’m just plain puzzled by the fear of other people “stealing” ideas. (And I’m saying this after having had a couple of projects scooped in the past.) A (mathematician) colleague of mine summarised my feelings very succinctly: “if several people can come up with the same idea, then how profound can it be?”

I’ve occasionally thought of “stealing” ideas I hear around, but I’m sure that actually doing this would take *heaps* of time and energy unless the idea is really simple (in which case it’s probably not profound). I mean, how long does it take to understand a nice, fully written scientific paper? Often so long that I figure it’d be quicker to rederive the results myself.

(Don’t you think that it’s so unattractive when, discussing research with someone, they get all paranoid and secretive? What does this say about what they think of you?)

The only reason (apart from laziness and what I write below) that holds me back from publishing my notes online is that I think it would just do a disservice to the community who, if they actually were interested, would have to wade through tons of crap to find some probably dubiously useful stuff. Nonetheless, since I’m always on the lookout for good collaborators, I might try it out one day as an experiment. (Anyone who can extract goodness from my notes certainly deserves authorship anyways.)

I want to strongly note that what I’ve said applies strictly only to theoretical sciences, and not experimental sciences where I understand that all it takes for a rich lab to steal a result is the name of a crucial chemical etc.

Mind you, I have recently changed my mind about this issue. Now that I have a couple of PhD students I have, for the first time in my life, become quite a bit more worried about scooping: the ideal initial PhD problem should be publishable and solvable and use a nice technique or so. Because PhD students often take a little longer than experienced academics, I worry that their results could be scooped. I don’t think I’m alone in this concern; I’ve recently noticed at conferences that otherwise super laid-back academics are quite cagey about what their students are working on.

Are there any experimental scientists out there advocating for and using open notebooks? Not navel-gazing theoreticians that sit around drinking coffee and making shit up, but actual real scientists: people fucking with shit in a laboratory. Because frankly, this sounds like the kind of batshit wackaloonery that people who don’t even know what a real lab notebook looks like would be propounding.

you had better read your own students’ raw notebooks. how else can you make sure they aren’t just making up results. I’ve worked with people who show me their results, I tell them, “wow, that’s not what I expected at all” and then they go back, re-run the experiment and voila, the new result conforms to what I said my expectations were. at that point they refuse to run the experiment again.

so it begs the question, Why the hell would I be right all the time? are they just trying to get their projects done? I mean, I do understand a few things here and there, but c’mon, sometimes the results actually do surprise you and that can be much more important than when they don’t.

re: theoreticians and biology mixing. that’s the pre-conference I’m at this weekend. it’s a bunch of people who have whacked out models of the biological system I study. but they have no data to support or refute the models, no proposal about what variables you could measure to get a handle on their models, and most importantly, no insight into how to design experiments to better understand the systems they’re modeling. One reason I get invited to this conference is that I measured a few things, did some fairly rudimentary analysis and got a curve that looks like one of the popular models in this field. who knows if it is important or correct? the modelers can’t tell me and neither can the clinicians. And they haven’t told me anything I didn’t already figure out on my own – like more time points would be useful to understand the dynamics of the system.

so… before you get into the idea that modelers can make an impact, make sure you have a sense of the particular subfield. you could almost certainly make a name for yourself (we need a lot more people who were trained in the physical sciences paradigm), but would it be important?

I have a lot of data in my notebooks that are non-obvious, enabling data that if pointed in the right direction could lead to intellectual property. but I don’t have time to pursue them right now and I sure as hell don’t want to give them away. especially when funding is so tight – maybe a program announcement will come up where my unpublished work gives me a competitive advantage, even if the work was done 5 years ago. if I had to give away that data then I wouldn’t have a leg up with regard to getting funded. which would suck, since I had the insight and the wherewithal to do work in that area before. and it would certainly penalize the younger generation as there would almost certainly be NO MECHANISM WHATSOEVER to require senior researchers to digitize all their legacy data. and we should certainly be able to learn from all their unpublished work, right?

I’ll follow that up with the idea that there are certain types of experiments (at least in biology) that just don’t lead to publication, but are nevertheless very important. I know that may be hard to believe for people who just submit all their random crap to arXiv, but it is very true in other fields. I have a handful of examples right now where I have submitted manuscripts to the 3-4 relevant Medline-indexed journals and gotten rejections from all of them. sure, there are a lot of non-Medline-indexed journals, but I think publishing somewhere with an impact factor south of 1 is a waste of time and money. there is nothing that I can do about it except wait until the time is right.

also, in NIH funded biology, if you create a reagent with NIH dollars you have to make a reasonable effort to provide that reagent to the community. sometimes this is easy, sometimes it’s tough. in hypercompetitive fields you don’t want to signal the reagent(s) that you’ve made until you publish with them because 1) who knows if NIH would force you to share the reagent prior to publication if it is disclosed in an open notebook (that would have to be sorted out), 2) you would give away what the current focus of your lab is, providing your competitors with important information about what has and hasn’t worked for you, and 3) biology doesn’t really require solid understanding right now, just the ability to tell a good story and the ability to conclusively demonstrate that whatever you see isn’t due to known factors. so the reagent you created, with all the troubleshooting to optimize its production disclosed in your open notebook, would just support unscrupulous labs – they wouldn’t have to do all the tedious, mind-numbing work to produce enough material to test, they could just test it right away.

what’s the big deal about the UsefulChem page? Why is that an example of why we should have open notebooks? It’s a cookbook experiment where the experimenter put in some date stamps, a few notes, and their own data populating fields like the yield of the synthesis. And I’m not sure a 58% yield on an Ugi condensation is really that impressive. That seems fine for a web notebook – something that is proprietary to yourself, your research group or your company – but what if they were working on the synthesis of taxol back in the day and they found the optimal conditions to get to the Wieland-Mischer ketone two years before anyone else but then hit a roadblock? sure, advancement of science should come first, except of course that people have to be able to keep their jobs and move forward in their careers. you can’t screw people for the sake of science. how long will it take for people to get that into their heads? what you can do is screw people’s lifestyles for acute periods of time (weeks – months), but you can’t screw the people themselves.

I also have an open notebook on the OWW site. It’s easy enough to set up, but it’s not very old, so I don’t have a lot of perspective yet. I have to shuttle between two laboratories and it’s nice to be able to consult what I wrote on different projects when I am in the other place. In theory, it should be good for my students/postdoc to consult and add to, as well.

However, in practice, it’s not so much the wet lab stuff that makes it in, because of the time involved. If it weren’t wiki markup, and if I had anyone else in my lab interested in making the time investment, I’d note down a lot more. For instance, it’s a hassle (from my POV) plugging in cell culture photos, much as I’d like to. Lots easier to print out a photo of a culture or a gel (which is digital to begin with, so should be good for an electronic notebook) and just tape it into my physical lab notebook – or not, and just stick it in a folder on my computer, send it to the one or two people who might be interested in that particular result.

I totally agree with Mr. Gunn, and am waiting for some software-savvy scientist to come up with an electronic lab notebook that is really worth the investment of my time and does what I want it to, with a minimum learning curve. This is part of why I don’t invest totally in the OWW open notebooks. It would change the way I work.

I’d just as soon have someone else benefit from the things that go wrong in lab. In particular, we went through two years of troubleshooting when we were doing the SAGE technique in our particular conditions. If we, or any of the other people we ended up consulting, had noted our troubles in an open lab notebook, one could have found those efforts with a Google search and saved oneself a lot of time. Think of all the spectacular papers you read in Cell with a given technique that few have ever published with since; I’m thinking of ChIP-SAGE, for instance. I would never have wanted to read the lab notebooks before the paper, but sure as heck would have afterwards.

In the meantime, even as a “wet lab” scientist, I still do a lot of sequence manipulation and read articles and it’s convenient to note with clickable links the articles I read a given day that are relevant to what I was thinking about the experiments that are underway. Or the Ensembl BioMart result.

Since there seem to be two parallel threads going here, I’ll post this twice as per the example of my illustrious colleague, PhysioProf.

But add a little more:
@Jon, you are so right about being able to look at your students’ lab notebooks! But I disagree with your worry about putting your not-so-publishable ideas out there for others to potentially scoop. If they can do better than you, take an idea and run with it, why, you’ve established some sort of primacy by publicly affirming your idea to begin with. So why not let someone else take it through? It’s better for science, anyhow, given that you yourself wouldn’t have enough time to deal with it. Or are you worried that someday you’ll be short of ideas?

@Tobias: I completely agree that we need to protect the more vulnerable trainees. This is why a combination of protections along the line of Dave’s time-stamp business and restricted access might be ideal. (This is also precisely why my students and postdoc don’t particularly want to contribute to the OWW lab notebook, aside from not being drawn to markup language.) Internally, you could have access in real time to your lab’s (and perhaps your collaborating lab’s) notebooks; externally, these could be opened up six months down the road. (There is a lot of precedent for this in open access publishing models.)

Jon – I was responding to the question: “Are there any experimental scientists out there advocating for and using open notebooks?”
I wasn’t trying to impress anyone with our results- just showing an example of an Open Notebook in organic chemistry.

Hey all, sorry some of the comments got stuck in my junk folder. If you write a very long comment that has a lot of links in it you might email me to make sure it gets caught. For some reason comments that get put into junk don’t get emailed to me so I miss them and have to check manually for them (which I get to about once a week.)

If his contemporaries read the notebooks of Leonardo da Vinci, would the mirror-writing encryption have baffled them?

There’s also the tradition of notating things in anagrams to time-stamp them, then, when the embargo ends, revealing the anagram properly permuted. This is still done, as with some super-Earth discoveries.
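The anagram trick is an early commitment scheme: publish something now that proves priority without revealing the claim. A modern analogue (a swapped-in technique, not what the anagram users did) is to publish a SHA-256 hash of a random nonce plus the claim, then reveal the claim and nonce when the embargo ends. A minimal sketch, with hypothetical function names `commit` and `verify`:

```python
import hashlib
import hmac
import secrets

NONCE_HEX_LEN = 32  # secrets.token_hex(16) yields 32 hex characters


def commit(claim: str) -> tuple[str, str]:
    """Return (digest, nonce). Publish the digest now; keep claim and nonce private."""
    # The random nonce stops anyone from guessing a short claim by hashing candidates.
    nonce = secrets.token_hex(16)
    digest = hashlib.sha256((nonce + claim).encode("utf-8")).hexdigest()
    return digest, nonce


def verify(digest: str, nonce: str, claim: str) -> bool:
    """Check a revealed (nonce, claim) pair against the digest published earlier."""
    # Enforcing the fixed nonce length keeps the nonce/claim boundary unambiguous,
    # so characters cannot be shifted between the nonce and the claim.
    if len(nonce) != NONCE_HEX_LEN:
        return False
    recomputed = hashlib.sha256((nonce + claim).encode("utf-8")).hexdigest()
    return hmac.compare_digest(recomputed, digest)
```

Unlike an anagram, the digest leaks nothing about which letters the claim contains, and anyone can check the reveal with a single hash.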

But why stop at Science? Would you pay (I would) to see keyboard capture of even preliminary drafts of new fiction by Bear, Brin, Benford, Bradbury, Bova, and the like (to take examples only from the B’s)?

I am all for open source notebooks, but I think that if e.g. I post a lab write up and ***it’s not reproducible***, then that’s where it goes to pieces. If I am too arrogant or insecure or whatever to add that little snippet “Hey, this experiment doesn’t work, it’s not reproducible, etc.”, then there is really no way to advertise that it’s not reproducible…unless someone else tried it a few times and couldn’t succeed, then posted it on his or her open source notebook.

I think perhaps the thing that should be taught to up-and-coming scientists is then this: “It’s alright to be wrong, as long as you openly state it and try to figure out why.” But that message isn’t being sent in my courses.