Defeating Bedlam

By Olivia Judson December 16, 2008 10:00 pm

This week, I want to look at one of the unglamorous, but essential, parts of science: the problem of how to organize the information you have so that you know what you’ve got. For, like everything else in the digital age, the process of collecting and managing scientific information has been evolving. Fast.

Here’s what I used to do, way back, oh, seven years ago when I was writing a book about the sex lives of animals. When I wanted to do research on a topic, I would go to the university library — how quaint! — and photocopy the scientific papers I wanted to read. Papers such as “Homosexual rape and sexual selection in Acanthocephalan worms” from the journal Science. Or “Deformed sperm are probably not adaptive” from Animal Behaviour. If I was looking for something more obscure — say, “A review of tool use in insects” from Florida Entomologist — I sometimes had to go to a specialist library, like the one in London’s Natural History Museum.

Having collected the papers, I would take them back to my office, type the bibliographic details (authors, title, year published and so on) into my computer and put the photocopies into folders with other papers on the same general topic. In the case of the Acanthocephalan worms, it was a folder labeled “sabotage”; for the deformed sperm, it was “other sperm.” When the time came to write up my discoveries and thoughts on the subject of sperm evolution, or how males sabotage their rivals, I went to the relevant folder, read the papers, made notes on them and started writing.

As a system, it was a little clumsy — photocopying was a bore, and if I wanted to spend a couple of months writing somewhere other than my office, I had to take boxes of papers with me — but it worked. I knew what I had and where it was.

Then the scientific journals went digital. And my system collapsed.

On the good side, instead of hauling dusty volumes off shelves and standing over the photocopier, I sit comfortably in my office, downloading papers from journal Web sites.

On the bad side, this has produced informational bedlam.

The journal articles arrive with file names like 456330a.pdf or sd-article121.pdf. Keeping track of what these are, what I have, where I’ve put them, which other papers are related to them — hopeless. Attempting to replicate my old way of doing things, but on my computer — so, electronic versions of papers in electronic folders — didn’t work, I think because I couldn’t see what the papers actually were.

And so, absurdly, it became easier to re-research a subject each time I wanted to think about it, and to download the papers again. My hard drive has filled up with duplicates; my office, with stalagmites of paper. And it isn’t just that I have the organizational skills of a mosquito. Many of my colleagues have found the same thing. (Yes, we talk about it. Oh, they are lofty, the conversations in university common rooms.) In short, access to information is easier and faster than ever before (for a caveat, see the notes below), but there’s been no obvious way to manage it once you’ve got it.

Several pieces of software are now being developed to address this problem. I want to look at two of them here. The first is called Zotero; the second, Papers. Both are in version 1 and are still a bit buggy; but each has the potential, I think, to become a valuable tool for research.

Zotero aims to let you build a library of useful books and articles that you encounter while surfing online. It’s an extension of the Web browser Firefox, and as you’d expect, it’s free to download and easy to install.

Once you’ve installed it, each time you visit a Web page that contains items — books, newspaper articles, soundtracks, films, etc. — with bibliographic information, it extracts that information and allows you to save it to your Zotero library if you want to.

So, suppose you’re interested in books about the psychology of war, and you go to Amazon and type “On Killing” into the search box. A list of books appears; Zotero collects the information for all of them and allows you to select the ones you want to keep. These are then put into your Zotero library. Once they’re there, you can make notes on them, put them into folders with other items that are related, and so on. If you ask it to, Zotero will see if it can find a given book in a local lending library. And, supposedly, you can also pull bibliographic information from Zotero into documents you’re writing, but I haven’t tried that part yet.

It’s a powerful piece of software with a lot of capabilities, though not all of them work as well as they could. For instance, it’s hit-or-miss with newspaper articles — sometimes it recognizes them, sometimes it doesn’t — and it can’t interpret information from, alas, my local lending library. It does, however, allow you to screen grab, so you can still collect such information if you want it. The screen grab also allows you to add interesting Web pages to your Zotero library. (This is different from storing the link to a Web site. The screen grab gives you the page as it was when you looked at it; clicking a link gives you a site as it is today.)

A minor quibble: if you use a small laptop, as I do, you may find the Zotero window occupies too much of the screen. But I shall certainly keep using it, though not, perhaps, as its creators intended. For me, it’ll be a scrapbook of interesting stuff — books to buy later, press releases on subjects I think I might write about one day, magazine pieces about cities I’m thinking of visiting.

For the bulk of my researches, however, I shall use Papers. This software has been designed for the Macintosh by two avid fans who call themselves Mekentosj; it only works on the Macintosh platform. It’s not free, but it is quite cheap (20 pounds sterling; 40 U.S. dollars) and, for me, it’s been worth the money. For it solves the problem I started out describing — how to keep on top of scientific articles. How to know which ones you have, where they are, and what else you’ve got on the same subject.

The makers describe it as iTunes for .pdf files, and that’s broadly right. (For anyone who’s never encountered these things, a .pdf file is a type of document file that any computer can open using a free downloadable piece of software. This is the form electronic journal articles come in, and it means they look just as they would have done if you were reading the journal the old-fashioned way. iTunes is a piece of music management software.) The idea is that, when you download an article, it goes into your Papers library. The bibliographic information immediately appears; so does, if you’re lucky, the “metadata” — like the abstract and the list of subjects that the authors thought their article touches on. (I say “if you’re lucky” because this doesn’t always happen automatically.) The document itself gets neatly filed in a folder on your hard drive, and renamed by authors and year. Gone are the days of 456330a.pdf and sd-article121.pdf. Hallelujah.
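For the curious, here is roughly what “renamed by authors and year” amounts to, written as a minimal Python sketch. It is not how Papers works internally. It assumes the author surnames and year are already known (the examples use the worm and sperm papers from the notes), and both function names are hypothetical, made up for illustration. It also refuses to overwrite an existing file, since a clash usually means the paper is already on file.

```python
import re
from pathlib import Path


def author_year_filename(surnames: list[str], year: int) -> str:
    """Build a descriptive PDF name, e.g. 'Abele & Gilchrist 1977.pdf'.

    Assumes the author surnames (in citation order) and the year are
    already known; extracting them from the PDF itself is not attempted.
    """
    if not surnames:
        raise ValueError("at least one author surname is required")
    if len(surnames) == 1:
        authors = surnames[0]
    elif len(surnames) == 2:
        authors = f"{surnames[0]} & {surnames[1]}"
    else:
        # Three or more authors: first author plus "et al".
        authors = f"{surnames[0]} et al"
    # Drop characters that some file systems reject in file names.
    authors = re.sub(r'[\\/:*?"<>|]', "", authors)
    return f"{authors} {year}.pdf"


def rename_download(pdf: Path, surnames: list[str], year: int) -> Path:
    """Rename a download such as '456330a.pdf' in place; return the new path."""
    target = pdf.with_name(author_year_filename(surnames, year))
    if target.exists():
        # Refuse to overwrite: the same paper may already be on file.
        raise FileExistsError(f"already have {target.name}")
    return pdf.rename(target)
```

For example, `author_year_filename(["Abele", "Gilchrist"], 1977)` gives `"Abele & Gilchrist 1977.pdf"`, and `rename_download` would turn a downloaded `456330a.pdf` into that name in the same folder.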

And that’s just the beginning. Not only can you read the papers, annotate them, find them and create folders of papers on related subjects, you can also use the software to search the big scientific databases like PubMed and the Web of Science. (Such databases are where you go to find out what’s already been published on the subject you’re interested in; they’re where most scientists find out about the papers they want to collect.) It doesn’t (yet) replace bibliographic software such as EndNote, but it can be used with it quite neatly.

Papers does have some teething problems. As I said, it’s still buggy, so not everything functions as it should. Moreover, the way it works is not always intuitive, and there’s no formal “help.” Instead, if you have a question, you have to wade through user forums to try to see if anyone else has had the same question before — and, more to the point, whether anyone has answered it. But after a couple of days of experimenting, I got it doing exactly what I need.

Organizing materials is always idiosyncratic. I have one friend who organizes the novels he owns by the year in which the books were published; another goes by the color of the spine. (The first accused the second of having the soul of an interior decorator.) But the important thing is not how you do it, but whether it works — whether you can find what you’re looking for. These bits of software open up possibilities; for some people they will be useful, for others they won’t. Some will use both, others neither. For me, well, a few days after discovering Papers, I put 20 sacks of real paper into the recycling bin. At last, I’m back to knowing what I have and where it is.

Bedlam has been defeated.

**********

NOTES:

One caveat. I say “access to information is easier and faster than ever before.” With respect to scientific information, this is true for people within universities, but not for those without them. One of the consequences of the scientific journals going digital is that it has become harder for members of the public to get access to original scientific information. It used to be the case, for example, that anyone could get permission to spend a day at the library at Imperial College; once there, they could read any of the journals on the library shelves. Now, subscriptions to the paper editions of many journals have been stopped — the journals are no longer physically there — and only members of the university are allowed access to the online versions. Some journals give free access, at least to back-issues; but many do not. Then, if you are not a member of a university and you want to read some articles, they may cost you as much as $30 each. I think this is a pity. Perhaps not many people want to read original scientific research; but somehow, it seems against the spirit of the enterprise.

In case anyone’s interested, here are the full details for the articles I refer to. For the worms, see Abele, L. G. and Gilchrist, S. 1977. “Homosexual rape and sexual selection in Acanthocephalan worms.” Science 197: 81-83. For deformed sperm, see Harcourt, A. H. 1989. “Deformed sperm are probably not adaptive.” Animal Behaviour 37: 863-865. For insects and tools, see Pierce, J. D. J. 1986. “A review of tool use in insects.” Florida Entomologist 69: 95-104.

Many thanks to Austin Burt, Gideon Lichfield and Daniel Simpson for insights, comments and suggestions.

My personal experience with Zotero when writing a recent biology journal article: It is FANTASTIC for collecting references, organizing materials, and writing the first draft. The best feature must be manually turned on in the preferences after installing: “Automatically attach associated PDFs…when saving items.” After activating this feature, if you download a set of search results from a site (e.g., //www.biomedcentral.com/), you also get the full PDFs of the papers in one step. Then you can read them offline, in a coffee shop, airplane, wherever.

However, the bibliography features (e.g., in Microsoft Word) are not ready for prime time. I would get tripped up with simple formatting issues. Author names instead of initials. Journal titles spelled out instead of abbreviated. Title Case in the title of an article instead of Sentence case. You get the idea. This is the biggest area that needs improvement in my opinion. I switched back to EndNote at this point, but Zotero was invaluable for saving time until then.

What’s wrong with just using EndNote or similar software, with the added step of renaming the PDF files by author/name or any other such scheme? EndNote neatly stores files as attachments if you so wish (not just PDFs but pictures, sounds, movies). I ask because I started using it only recently, and so far it seems to do exactly what you wish.
Good luck with your research and keep up the interesting columns!

Good article. Organizing data seems to be a problem now that there is a torrent of it. I just cannot help but wonder whether, now that there is so much, the words are less concentrated and meaningful.

I also wonder whether, with so much data, there is less ability to find the data that really matters. Almost everybody, it seems, lives in their own worlds of knowledge, deep but narrow, concentrated to the point where they are increasingly unable to put it all together and produce knowledge that integrates the world: knowledge and insights that can change lives, if not the world.

Olivia, I’m glad you are one of the special people who can put it together. It is not the tool but the person who uses it, knows how to apply it, and shares it, as you do in your blog.

The phrase “With respect to scientific information, this is true for people within universities” requires a qualification: people within “rich universities” have access; at many (most?) universities, access to most scientific journals is available only through interlibrary loan, not via direct download from journal websites.

This article is a godsend. In addition to the piles of papers I have everywhere in my office, I now have piles of papers everywhere in my computer!

Olivia, what software do you recommend for organizing the pile of stuff that will become a book? I am now working on the second edition of my field theory book, and even though the editor allows me to write changes by hand in the margins, I am still going crazy. As a physicist, I also have to tackle equations, which you don’t have to deal with.

My own requirements are not quite the same, but I work in a world where the a priori categorization of the data I collect would be a librarian’s nightmare to sort out, and I have no time to be a librarian. And virtually none of it comes with helpful metadata attached.
The tool I use is Google Desktop. Admittedly prosaic, but free, and a saver of untold hours of searching for the things I need to use. All I need to remember is some word or words more or less unique to the object. Put those words in the search box, and Bob’s my uncle. YMMV

Ten years ago I transcribed excerpts from hundreds of books and papers while researching patterns appearing in evolutionary biology, anthropology and neuropsychology. I then posted the excerpts on the web (//www.sexualselection.org) for easy browsing when writing.

Now, I store the content (abstracts and excerpts) in a custom online database, searchable by a number of criteria for easy access to information. The database structure provides hints of anomalies that transcend disciplines: patterns that connect non-obviously related concepts.

I’ll explore Papers and Zotero to see if this makes my life easier. I wasn’t aware of these new options.

Might I suggest that you simply rename the PDF files? I have a folder on my computer (actually a whole tree of folders) where I keep about a thousand PDF files of papers and such. The names are things like “Milnor – Periodic Orbits, External Rays, and the Mandelbrot Set – An Expository Account.pdf”. I learned early on that when the Save dialog box opens in my web browser, I should always decide immediately what to call the file and where to put it. As you discovered, saving PDF files with names like “456330a.pdf” is the digital equivalent of keeping all your papers in a huge pile on top of your desk!

I will try to put this delicately. Lack of public access to journal publications, when almost all of the research and support represented by those articles is significantly indebted to taxpayer dollars, is simply disgusting. If The New Yorker can archive, index, and provide a search engine for everything it has ever published, so can any journal. Your rather naive efforts to “organize” your “online” research material would make any relational database designer shiver in his shoes. Perhaps that’s your particular genius.

1) When you download PDF files, you can “save as” whatever title you wish (goodbye strings of letters and numbers) and save them in a folder that you can also name.

2) You don’t need a program to search PubMed; it works as well as Google all by itself. (Try some tricks, like adding “review” to your search words.)

3) When looking for a manuscript you have downloaded, use Google Desktop – entering “pdf” and some relatively unique word or combination of words that may have been in the text – e.g. “deformed sperm”. Never underestimate the power of Google Desktop – the trick is in remembering the unique or obscure term characteristic only of that article.

4) Don’t entirely abandon what worked for you seven years ago. Having downloaded manuscripts, you can still print them, read them on the subway (or in a hammock, etc.), and scrawl your comments all over them. There is something about this process that really etches the manuscripts in your memory, and it will have you (albeit passively) mulling over their information and how it fits into your thesis and data. Create a “draft” file and transcribe some of your thoughts with a quick reference.

You are completely underselling Zotero. It completely replaces EndNote and similar programs (it makes it very easy to create bibliographies in different formats), and it will store not only PDF files but also podcasts, journal articles, newspaper articles, book references, blog posts (such as this one), video recordings, etc. It’s an amazingly powerful piece of software that works on any operating system for which Firefox is available (most). In addition, it will very soon have online storage available, so you will be able to access your Zotero library from any computer.

It does have bugs, but all-in-all is one of the most amazing pieces of software I’ve seen in a long time (and it is FREE!)…

This sounds great, but I’m confused: how is this any different from EndNote? Also, does this program format the works cited section of a paper that you are writing, as EndNote does? That is something I need to do often.

I agree that it’s highly problematic that the general public is being walled off from source materials in science. I find catalogues online, but access to the articles is expensive. I think one solution for this is for public lending libraries to have access to these materials. It should be a federal program that local library systems can share. I suppose the other solution would be for Google to just copy it all and put it out there. Bouyah!

How timely. I’m learning EndNote as I’m finishing 2 overdue papers that are getting in the way of finishing my thesis. I wish I had learned EndNote earlier in the process. Olivia mentioned all the same situations I’m having (unknown pdf files downloaded from libraries, boxes of papers organizing something so you can find it again …).

Olivia, how do you handle writer’s block, time management, deadlines, etc.? Do you have piles of unfinished articles and papers that you gave up on because you lost interest or you didn’t like what came out in the first pass? Do you start writing before you have all the information you need?

This article was very interesting. I hope you give us more about the writing process.

How much scientific information gets lost because writers had to meet some deadline so they cut the article short?

Dear Dr. Judson,
I agree that the sequestration of scientific journal articles from the eyes of the general public is against the spirit of the scientific enterprise. I must say, however, that this problem never arose for me until the advent of the Internet, and I was confronted by gatekeepers such as JSTOR. (“No, no, no! You can’t see the Wizard today!”) I had never actually had occasion to pursue the information in scientific journals, having been an English major; however, an interest in scientific matters made pursuit of such things easy on the Internet, only to have the door effectively slammed in my face. I have many friends who are academics, and they have offered to help me with this, but I hate to pester them. Obviously, you are a very busy person, but I am delighted that you lend your voice to this issue, and I wish you could persuade your colleagues that the rest of us are not likely to be dangerous with a little more knowledge.
Eric Dobbs
Richmond, Virginia

The search engine on the newer Macintosh operating systems has been quite useful for me on its own. Its search engine can find keywords (or author names, dates, etc.) in PDF documents, as long as they are built as documents with text rather than as pictures; that’s usually the case, and I suspect Papers would have the same issue. In addition, I immediately retitle downloaded files with keywords (ending with author), which solves the picture-PDF problem, since the Macintosh search engine searches document titles as well as contents. It also means it’s easy for me to find papers that focus on a subject by looking in my “articles” folder, because they will often start with the appropriate keyword.

Olivia, you explain the real world like no one else. Please continue to do so. Your insights into the world of research are interesting too, but not as much. Please give stuff like this to David Pogue. Tell us more about what you know about the way the world really works.

First, let me express my heartfelt thanks. I just downloaded Zotero and, from the looks of it, it is going to be immensely helpful. I only wish I had had this program when I started my research five years back, soon after my retirement.
Secondly, a special word of appreciation for noting the additional burden that exclusive online information/research content places on an interested amateur/retired professional. Luckily, I happen to live within an hour’s driving distance of a major west coast University and while I am not allowed online access from outside the campus, I am allowed to download and print the document while being physically present in the library but not to record it electronically or transmit it to my email address. Such arcane rules. I am sure many lawyers were involved.

For less sophisticated needs I have used the Macintosh Stickies program to file papers, anything that comes electronically onto my screen. Very simple:
1) Select the displayed info (mostly by selecting all).
2) Copy the info.
3) Activate Stickies.
4) With the pointer in the yellow sticky, paste.
5) Now you can manipulate the info (throw out parts).
6) In the upper left corner of the Stickies window, click to save.
7) Now you have some choices, including saving pictures and deciding which folder to save the content into.
I now have 20 folders, and the system works like a charm.

Very useful piece. I’m going to try Papers.
Connotea, from the Nature group of journals (www.connotea.org), has been around for some time; it lets you bookmark and tag papers and references as you browse. All the references or libraries you have gathered can then be downloaded to an EndNote/RefMan database. But it does not handle PDFs!!

Seems to me you can get the same functionality from EndNote. You can download files to a specified folder (you should still rename them, just to be on the safe side) and the bibliographic info directly to EndNote. You can then link the EndNote entry to the PDF file on your computer.
I do this manually right now, but the system might be automated; I would have to ask the EndNote people.

You can also add your notes directly within Endnote.

I’ve even managed to mirror the system so it doesn’t matter which computer I use (my laptop or desktop); I always see the same EndNote library, linked to all the appropriate PDF files.

This problem sounds very much like one I had a few months ago. Though I don’t exactly perform copious amounts of academic research, I was having trouble keeping track of all the information flying at me from every corner of the web on every topic imaginable. Though it doesn’t work under Linux very happily (I’m a Linux guy…), it functions well enough under Wine (a way to run Windows apps on Linux) and appears to have a very well-implemented Mac OS X version.

I speak… of Evernote (//evernote.com). It will keep track of web pages, pictures, snippets of text, you name it. And makes it all taggable and linkable. It even has a web interface and a way to sync with other instances of Evernote so that you can access your information on the go. Well, anyway, enough of a commercial. It solved my problems… maybe it could help solve some problems for someone reading this comment.

Olivia Judson, an evolutionary biologist, writes every Wednesday about the influence of science and biology on modern life. She is the author of “Dr. Tatiana’s Sex Advice to All Creation: The Definitive Guide to the Evolutionary Biology of Sex.” Ms. Judson has been a reporter for The Economist and has written for a number of other publications, including Nature, The Financial Times, The Atlantic and Natural History. She is a research fellow in biology at Imperial College London.