Indispensable Applications

by Kieran Healy on December 11, 2004

Picking up on an old item over at 43 Folders (this post has been marinading for a while), here’s a discussion of the applications and tools I use to get work done. I do get work done, sometimes. Honestly.

I’ll give you two lists. The first contains examples of software I find really useful, but which doesn’t directly contribute to the work I’m supposed to be doing. (Some of it actively detracts from it, alas.) The second list consists of the applications I use to do what I’m paid for, and it might interest graduate students in departments like mine. If you just care about the latter list, then a discussion about choosing workflow applications [pdf] might also be of interest. (That note overlaps with this post: it doesn’t contain the first list, but adds some examples to the second.) If you don’t care about any of this, well, just move along quietly.

Why this matters

You can do productive, maintainable and reproducible work with all kinds of different software set-ups. This is the main reason I don’t go around encouraging everyone to convert to the group of applications I myself use. (My rule is that I don’t try to persuade anyone to switch if I can’t commit to offering them technical support during and after their move.) So this discussion is not geared toward convincing you there is One True Way to do your work. I do think, however, that if you’re in the early phase of your career as a graduate student in, say, Sociology or Political Science, you should give some thought to how you’re going to organize and manage your work. This is so for two reasons. First, the transition to graduate school is a good time to make a switch in your software platform. Early on, there’s less inertia and cost associated with switching things around than there will be later. Second, in the social sciences, text and data management skills are usually not taught explicitly. This means that you may end up adopting the practices of your advisor or mentor, continue to use what you’re taught in your methods classes, or just copy whatever your peers are using. Following any one of these paths may lead you to an arrangement that you’re happy with. But not all solutions are equally useful or powerful, and you can find yourself locked-in to a less-than-ideal setup quite quickly.

Although I’m going to describe some specific applications, in the end it’s not really about the software. For any kind of formal data analysis that leads to a scholarly paper, however you do it, there are basic principles that you’ll want to adhere to. The main one is: never do anything interactively. Always write it down as a piece of code or an explicit procedure instead. That way, you leave the beginnings of an audit trail and document your own work to save your future self six months down the line from hours spent wondering what the hell it was you thought you were doing. A second principle is that a file or folder should always be able to tell you what it is, i.e., you’ll need some method for organizing and documenting papers, code, datasets, output files or whatever it is you’re working with. A third principle is that repetitive and error-prone processes should be automated as much as possible. This makes it easier to check for mistakes. Rather than copying and pasting code over and over to do basically the same thing to different parts of your data, write a general function that can be called whenever it’s needed. This idea applies even when there’s no data analysis. It pays to have some system to automatically generate and format the bibliography in a paper, for example.

There are many ways of implementing these principles. You could use Microsoft Word, Endnote and SPSS. Or Textpad and Stata. Or a pile of legal pads, a calculator, a pair of scissors and a box of file folders. It’s the principles that matter. But software applications are not all created equal, and some make it easier than others to do the Right Thing. For instance, it is possible to produce well-structured, easily-maintainable documents using Microsoft Word, but you have to use its styling and outlining features strictly and responsibly. Most people don’t bother to do this. So it’s probably a good idea to invest some time learning about the alternatives, especially if they are free or very cheap to try.
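To make the third principle concrete, here is a minimal sketch of what replacing copy-and-paste with a general function might look like. It is written in Python rather than in any of the packages named above, and the function name, group names and numbers are all invented for illustration.

```python
from statistics import mean, stdev


def summarize(values):
    """Return the count, mean and standard deviation of a list of numbers."""
    return {
        "n": len(values),
        "mean": round(mean(values), 2),
        "sd": round(stdev(values), 2),
    }


# Toy data, made up for this example: one list of scores per group.
groups = {
    "treatment": [4.0, 5.0, 6.0],
    "control": [2.0, 3.0, 4.0],
}

# One function, called once per group, instead of a pasted block per group
# that has to be edited (and checked) separately each time.
summary = {name: summarize(scores) for name, scores in groups.items()}

for name, stats in summary.items():
    print(name, stats)
```

If the summary needs to change (a median, say), it changes in one place and every group picks it up, which is the point of the principle.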

Day-to-Day

These are applications that I use routinely but that fall outside the core “Workflow” category. A lot of other people use them too, because they’re good (or the best) tools for everyday jobs. All of them are Mac OS X applications.

Quicksilver. A fantastic application launcher, file-finder, task-executer and other-stuff-doer. It took about two days for it to become the natural way for me to carry out all kinds of tasks. Quicksilver gives you automatic keyboard shortcuts for most of the entities on your hard drive (files, folders, applications, addresses, music tracks and playlists, bookmarks, etc), and then lets you perform (and chain together) many different sorts of actions on those entities: find files or email addresses, launch applications, attach files to email, find addresses or phone numbers, play music, append text to files, and lots of other stuff, too. To paraphrase a post I forgot to bookmark, Quicksilver is the kind of application that you get used to using immediately and, pretty soon, any computer you sit in front of that doesn’t have it installed seems broken. It’s free. Read more about it.

Safari. Yer basic Apple browser. Works great. Apart from not using Explorer, I never could get into the Browser Wars.

Ecto. The best way to be one of the bloggers. Manages my posts to this blog. May face competition in future from Mars Edit.

CalendarClock. Replaces your system clock and, as well as showing you the time, lets you see your iCal calendar, appointments and to-dos in a handy drop-down menu. Very handy. I have an older, free version but now there’s an updated commercial version.

Mail.app. I’m sure my email should be more organized and I should have all kinds of filters in place and all the rest of it, but Apple’s bundled application does what I (think I) need.

Terminal.app. Mac OS’s built-in terminal is just the thing for when you want to use the Unix command line. It. Just. Needs. Tabs.

Workflow Essentials

These applications form the core of my own work environment, i.e., the things I need (besides ideas, data and a sharp kick) to write papers. Papers will generally contain text, the results of data analysis (in Tables or Figures) and the scholarly apparatus of notes and references. I want to be able to easily edit text, analyze data and minimize error along the way. I like to do this without switching in and out of different applications. All of these applications are freely available for Mac OS X, Windows, and Linux (and other more esoteric platforms, too).

Edit Text.

Emacs. A text editor, in the same way the Blue Whale is “a mammal.” The Mac version is still a tiny bit flaky, but almost everything else in this list works best inside Emacs. I use Enrico Franconi’s Enhanced Carbon Emacs, which comes with some of the bells and whistles described below. There’s also a version available from Mindlube. Emacs is very powerful, and free. Combining Emacs with some other applications and add-ons allows me to integrate writing and data-analysis effectively.

LaTeX. A document processing and typesetting system. Produces beautiful documents from marked-up text files. Very powerful, and free. Available in convenient form for Mac OS X via Gerben Wierda’s i-Installer. If you want to try it out, but don’t want to learn Emacs, download TeXShop and use that as your editor instead.

AUCTeX. Enhances Emacs no end for use with LaTeX. Makes it easy to mark-up, process and preview LaTeX documents. AUCTeX is part of Emacs, though not always in its most recent version. If you’re a Mac user, it’s worth getting the most up-to-date version of AUCTeX because you can configure its “LaTeX this” command to produce a PDF file by default.

RefTeX. Enhances AUCTeX to help you outline documents more easily, and manage references to Figures, Tables and bibliographic citations in the text. Both AUCTeX and RefTeX could also be under the “Minimize Error” section below, because they automagically ensure that, e.g., your references and bibliography will be complete and consistent.

ESS. Emacs Speaks Statistics. An Emacs package that allows you to edit R files and run R sessions inside of Emacs. Does syntax highlighting and other things as well, to make your code easier to read. ESS is free software.

Minimize Error.

Sweave. A literate programming framework for mixing text and R code in a way that allows you to reliably document and reproduce your data analysis within a LaTeX file. In the ordinary way of doing things, you have the code for your data analysis in one file, the output it produces in another, and the text of your paper in a third file.[1] You do the analysis, collect the output and copy the relevant results (often reformatting them) into your paper. Each transition introduces the opportunity for error. It also makes it harder to reproduce your own work later. Almost everyone who has written a paper has been confronted with the problem of reading an old draft containing results that you want to revisit, but can’t quite remember how you produced them. With Sweave, you just have one file. You write the text of your paper (or, more often, your report of a data analysis) as normal, in LaTeX markup. When the time comes to do some data analysis, produce a table or display a figure, you write a block of R code to produce the output you want right into the paper. Then you ‘weave’ the file: R processes it, replaces the code with the output it produces, and spits out a finished LaTeX file that you can then turn into a PDF. An example will make this easier to understand. It’s pretty straightforward in practice. The only downside to the Sweave work model is that when you make changes, you have to reprocess all of the code to reproduce the final LaTeX file. If your analysis is computationally expensive this can take up time. There are ad hoc ways around this (selectively processing code chunks, for instance) that may eventually appear as features in a new version of Sweave. Sweave comes built-in to R.
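The ‘weave’ step can be illustrated with a toy. What follows is not Sweave and does not run R: it is a small Python sketch that mimics the idea of replacing each code chunk with the output it produces. The chunk delimiters `<<>>=` and `@` are borrowed from Sweave’s noweb syntax; the `weave` function, the sample document and the numbers in it are invented for this example.

```python
import contextlib
import io


def weave(source):
    """Replace each <<>>= ... @ chunk in `source` with what its code prints."""
    out_lines = []
    chunk = None      # None while reading text, a list while inside a chunk
    namespace = {}    # shared, so later chunks can see earlier variables
    for line in source.splitlines():
        if chunk is None:
            if line.strip() == "<<>>=":
                chunk = []                  # a code chunk starts here
            else:
                out_lines.append(line)      # ordinary text passes through
        elif line.strip() == "@":
            # End of chunk: run the code and capture whatever it prints.
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):
                exec("\n".join(chunk), namespace)
            out_lines.extend(buffer.getvalue().splitlines())
            chunk = None
        else:
            chunk.append(line)
    return "\n".join(out_lines)


paper = """The sample mean is reported below.
<<>>=
scores = [2, 4, 6]
print("Mean score:", sum(scores) / len(scores))
@
End of section."""

woven = weave(paper)
print(woven)
```

The woven text contains the result where the code used to be, so the number in the paper can never drift away from the code that produced it. The real thing does the same job with R code inside a LaTeX file.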

RCS. A Revision Control System. Allows you to keep a complete record of changes to a file, creating a tree of versions as you make changes. This allows you to revisit earlier versions of papers and data analyses without having to keep directories full of files with names like Paper-1.tex, Paper-2.tex, Paper-3-a-i.tex, and so on. RCS is the oldest of the revision control managers directly supported by Emacs. CVS is a newer system that supports multiple authors, and Subversion is newer again. I haven’t used these: Subversion looks interesting, but integration with Emacs’ version control menu isn’t quite there yet. RCS is free.

Unison. I have a laptop and a desktop. I want to keep certain folders in both home directories synchronized. Unison is an efficient command-line synchronization tool that can work locally or use SSH for remote clients. There’s also a GUI version. Unison is free. Many other file synchronization tools are available for Mac OS X, but I haven’t used them.

Pros and Cons

From my point of view, the Workflow applications I use have three main advantages. First, they’re free and open. Second, they deliberately implement “best practices” in their default configurations. Writing documents in LaTeX markup encourages you to produce papers with a clear structure, and the output itself is of very high quality aesthetically. By contrast, there are strong arguments to the effect that, unless you’re very careful, word processors are stupid and inefficient.[2] Similarly, by default R implements modern statistical methods in a high-quality way that discourages you from thinking in terms of canned solutions. It also produces figures that accord with accepted standards of efficient and effective information design. (There’s no chartjunk.) And third, the applications are well-integrated. Everything works inside Emacs, and all of them talk to or can take advantage of the others. R can output LaTeX tables, for instance, even if you don’t use Sweave.

At the same time, I certainly didn’t start out using all of them all at once. Some have fairly steep learning curves. There are a number of possible routes in to the applications. You could try LaTeX first, using any editor. (A number of good ones are available for Mac OS and Windows.) Or you could try Emacs and LaTeX together first. You could begin using R and its GUI, and never mind about the text editing. Sweave can be left till last, though I’ve found it increasingly useful since I’ve started using it, and wish that all of my old data directories were documented in this format.

A disadvantage of the particular applications I use is that I’m in a minority with respect to other people in my field. Most people use Microsoft Word to write papers, and if you’re collaborating with people (people you can’t boss around, I mean) this can be an issue. Similarly, journals and presses in my field generally don’t accept material marked up in LaTeX. Converting files to Word can be a pain (the easiest way is to do it by converting your LaTeX file to HTML first) but I’ve found the day-to-day benefits outweigh the network externalities. Your mileage, as they say, may vary.

A Broader Perspective

It would be nice if all you needed to do your work was a bunch of well-written and very useful applications. But of course it’s a bit more complicated than that. In order to get to the point where you can write a paper, you need to be organized enough to have collected some data, read the right literature and, most importantly, be asking an interesting question. No amount of software is going to solve those problems for you. Believe me, I speak from experience. The besetting vice of an interest in productivity-enhancing applications is the temptation to waste a tremendous amount of time installing productivity-enhancing applications. The work-related material on my computer tends to be a lot better organized than my approach to generating new ideas and managing the projects that come out of them, and of course those are what matter in the end. The process of idea generation and project management can be run efficiently, too, but I’m not sure I’m the person to be telling people how to do it.

Notes

[1] Actually, in the worst but quite common case, you use a menu-driven statistics package and do not record what you do, so all you have from the data analysis is the output.

[2] I think that the increase in online writing and publishing has made Word Processors look even worse than they used to. If you want to produce text that can be easily presented as a standards-compliant Web page or a nicely-formatted PDF file, then it’s much easier to use a text editor and a “rendering pipeline” that supports a markup system like Textile or Markdown. But that’s a rant for another day.


Having used some of the applications you recommend (Emacs, LaTeX), I appreciate the suggestions, but realistically speaking, the learning curve really can be quite steep.

I’ve been meaning to post about Firefox, but didn’t really feel like I had enough for a whole entry so here is the pointer. I like its search bar (like the Google bar, but with the option of switching to dictionaries, Amazon, IMDB, other search engines, etc.) and I like the Find bar. But then again, I know from my research that very few people use CTRL-F (or Apple-F) to find words on Web pages so I don’t know how many people will actually benefit from the Find bar. Nonetheless, for those of us who use it, it’s nice to have.

I use Stata for stats, but do not use its incredibly limited do-file editor. Rather, for that I rely on UltraEdit ($35). I haven’t upgraded to the newest version so I don’t know about the most recent features. One feature I like a lot that was already present in several earlier versions is its find-in-files function. Of course, there are now more powerful services coming out to help look for content within files on one’s hard drive, but UltraEdit has had this feature for a while and it’s been very helpful.

You mentioned organizing folders and files so you know what’s in them. I have a separate folder for each paper I am working on regardless of the project to which it is tied. (For big projects I have separate project folders as well for the data sets, but that’s independent of the folders for papers.) Each of my files has a date as part of its name and I resave with new dates occasionally to keep former versions.
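The dated-filename habit described above is easy to script. This is only a sketch in Python: the commenter does not say what date format or tool is used, so the ISO date placed before the extension, the function names and the file name `paper.tex` are all assumptions made for the example.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def dated_name(path, day):
    """Return `path` with an ISO date inserted before the file extension."""
    path = Path(path)
    return path.with_name(f"{path.stem}-{day.isoformat()}{path.suffix}")


def save_dated_copy(path, day=None):
    """Copy `path` to a dated sibling file and return the new path."""
    target = dated_name(path, day or date.today())
    shutil.copy2(path, target)
    return target


# Demonstration in a throwaway folder, with a fixed date so the result
# is predictable.
with tempfile.TemporaryDirectory() as folder:
    draft = Path(folder) / "paper.tex"
    draft.write_text("First draft.\n")
    snapshot = save_dated_copy(draft, date(2004, 12, 11))
    snapshot_text = snapshot.read_text()
    print(snapshot.name)
```

ISO dates have the side benefit that an alphabetical file listing is also a chronological one.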

I do use Styles in MSWord and they are super helpful. You’re probably right that few people take advantage of those features. I can’t imagine using the program without them.

For bibliographic info I use EndNote. I am not super happy with the program, but I am very happy that I invested in it a few years ago. (If you can afford it, the best method is to hire an RA to input all the references you’ve accumulated.) These days, as soon as I see a reference I think I might want to cite at some point, I add it to my one gigantic all-encompassing EndNote file. To navigate the contents of the big file, I add keywords to entries.

I also just wanted to second the point about documenting all of your actions especially with statistical work. To an outsider, I’m sure the comment fields in my do-files look like complete overkill. But one really does revisit data not only months but several years later and it’s good to know why you did what you did. If co-authors come along, comment fields are also helpful to get them up to speed. And yes, I keep copies of the original data sets always. I then generate a new data set after I run the do-files.
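The practice in that last point (keep the original data untouched, and let a script generate the working file) might be sketched as follows. This is Python standing in for a Stata do-file, not a translation of one; the file names, the `id` and `score` columns and the three rows of data are invented for the example.

```python
import csv
import tempfile
from pathlib import Path


def build_analysis_file(raw_path, derived_path):
    """Read the raw CSV, drop rows with a missing score, write a new file.

    The raw file is only ever opened for reading, so it is never modified.
    """
    with open(raw_path, newline="") as raw:
        rows = [row for row in csv.DictReader(raw) if row["score"] != ""]
    with open(derived_path, "w", newline="") as derived:
        writer = csv.DictWriter(derived, fieldnames=["id", "score"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Demonstration in a throwaway folder with made-up data.
with tempfile.TemporaryDirectory() as folder:
    raw_path = Path(folder) / "raw.csv"
    derived_path = Path(folder) / "analysis.csv"
    raw_text = "id,score\n1,10\n2,\n3,7\n"
    raw_path.write_text(raw_text)

    kept = build_analysis_file(raw_path, derived_path)
    raw_unchanged = raw_path.read_text() == raw_text
    derived_lines = derived_path.read_text().splitlines()
    print(kept, raw_unchanged, derived_lines)
```

Because the derived file can always be rebuilt from the raw file plus the script, the script and its comments become the record of why each case was dropped.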

I’ll hold off on posting about the graphics/photo-related programs I use. Although I definitely need some of those for work at times, that’s getting into a somewhat different territory.

I’ll repeat my recommendation of Bookends, as bibliographic software – it produces BibTeX output, though I’ve never got on top of this.

My big need is for a successor to Scientific Word, that is a proper front end for LaTeX, that actually works like a WYSIWYG Word Processor. I don’t think TeXShop does this, and I’ve been stuck with a long-orphaned version of SW 2.5 for Mac Classic. Any help on this would be hugely appreciated.

For general WP, the best substitute for Word under OS 9 was Nisus Writer. The transition to OS X has been very painful, but the latest version (Nisus Writer Express 2.1.1) is a usable alternative to Word, though missing a lot of the nice stuff from the Classic version.

Adobe Acrobat (full version) is, in my experience, a better way of getting from LaTeX output to Microsoft Word or RTF where you need to do this.

Firefox is a great browser, and as I mentioned in an earlier post, the ConQuery extension, enabling you to highlight text and then search in Amazon, Google, WikiPedia, Leo etc is a great boon.

But I’d also list the Memo pad on my Treo as an essential. I can paste in text and information (such as library classmarks!) on my desktop and then retrieve them in an archive… a physical one, I mean. And I can also use the process the other way to get text entered on my handheld to my pc. Works for all kinds of stuff: directions, recipes, you name it.

Thanks for this Kieran: I am just in the initial stages of retooling a working environment from programming/mathematics (my day job) to social science (as an Open University student) and there are few things more pleasant than being told to leave well enough alone.

I would add, though, that on Suse Linux 9.1 (which I’m currently using):

* Emacs doesn’t come with AUCTeX. I think their XEmacs does, but I don’t use XEmacs. I’ve been doing without it.

* The LaTeX distribution (teTeX) doesn’t come with the Harvard-style citations package. I spent a jolly hour or two fixing that, for sure.

I think the conclusion is (as usual) that I want a Mac. Unix and shinies, isn’t it?

You wrote “I do get work done, sometimes. Honestly.” As you see, this blog gives me difficulty getting punctuation over (no WYSIWYG), so I’m having to guess just what you meant by it.

Precisely how did you intend to punctuate that? Was it supposed to be a single sentence, perhaps?

This is by way of introduction to JQ’s queries about WYSIWYG. JQ, I suspect you may need to find some wrapper scripts to go on top of a very unadorned word processor. That would work best by consulting with people commonly working with the same subject matter (not necessarily your equals in the field), not just to draw on their libraries of macros but to help evolve them. If you can’t plug into a living group like that, try starting with something like wikipedia and remember to compensate for any self-selected nerd bias in its contributors’ discussions of WP functionality (you’ll benefit more from the “talk” pages than from the objective material on WP packages themselves).

bookends is another citation and note taking tool that i use instead of endnote. i like it because it lets me save the quotations that i want to use with the citation, search everything as open text, and automatically adds books and autofills appropriately for oft cited journals.

bbedit is my editor of choice, it plugs into cvs….

word and or mellel for writing, both use rtf.

grammarian for grammar and spell checking anywhere
netnewswire marsedit and wordpress for blogs
mediawiki for webbed note storage
terminal for many things

voodoopad for making conceptual messes
tinderbox for just about anything(though i really need to learn how to use it better)

———
workflow: i use .mac synced to 2 laptops and my office machine. in there i have a folder ‘current writings’ where i keep what i’m writing now, and have worked on in the last few years. Not in there is my papers folder which i back up on a dvd every now and again. currently it holds around 2000 pdfs, which i can search using terms. the pdfs are a combination of journal articles, and anything that i found interesting at the moment and printed to pdf. with 10.4, this ‘personal research archive’ will autocategorize, but now, what i do is search and drop them into folders for projects, as i need them then drop them back into the general archive later.

I would suggest also heading over to the engineering school to take a class in relational database theory, then to a corporate training vendor for a class in how to use relational databases in practice. Working through “SQL for Smarties” in Oracle or PostgreSQL would help drive these points home.

SAS, which you have probably used, is a great tool, but most people learn to use it without understanding the theory behind it, which leads to some sub-optimal results.

First, the transition to graduate school is a good time to make a switch in your software platform.

Agreed. After MS Word did atrocious things to the formatting of my masters’ thesis, I decided to ditch it and learn LaTeX for my doctorate. It was a pretty steep learning curve, but by doing it in the early months of doctoral research, I managed to crack it, and in doing so, managed to make life a heck of a lot easier in the final months of getting everything proofed and printed.

For other writing, I’m a fan of Nisus Writer Express, and especially drawn to Ulysses. Both are far superior to MS Word: the former because of its simple interface, inline wordcount and dictionary/thesaurus; the latter because it strips away WYSIWYG and lets you concentrate on the words on the screen.

As an Eng Lit type, I’ve always despaired of traditional citation management software (EndNote etc), which is fine for storing lists of citations with abstracts, but not for extensive quotations with page/line references from the works cited. That’s a vital addition for me, and I’ve yet to find a tool that’s quite up to the job.

I second the recommendation for emacs and LaTeX, though I don’t have experience with the other software. I’m a physicist, and the number one app that I want is a simple photo markup program. Just the ability to add arrows with numbers on them to pictures would be great.

In the longer term I’d like to have a dynamic lab notebook which automagically downloads pictures from my cell phone and allows me to mark them up and include text with change tracking. Something along the lines of a Wiki would be ideal. That way I can shoot snaps of equipment as I’m building it or installing it and add comments to keep track of what I did when. The notebook should also include the ability to link to data files and data processing scripts, again with change tracking. One other necessary element would be automatic backup to a remote RAID server. It seems to me that all the elements exist but nobody has yet integrated them into a single seamless app.

Plain old paper lab notebooks are great, but you can’t beat a picture for certain kinds of information. Done right the Sooper Dooper Electronic Lab Notebook could seamlessly integrate the nitty-gritty of lab work with data analysis and publication.

That said, simple photo markup would be a huge step in the right direction. I’ve tried doing it with photoshop, but it’s too big to just leave idling in the background while I do other things.

There is a more pleasant alternative to Emacs available: Alpha. The original Alpha was coded in a mixture of C and Tcl for Classic Mac OS, but now there is the cross-platform AlphaTk (coded entirely in Tcl/Tk, for all platforms that support Tcl/Tk, including Windows and Mac OS X); in addition, the original Alpha project has migrated to OS X as AlphaX.

The LaTeX mode in Alpha is very good, and the whole editor is very user-friendly (and it even supports many Emacs key combinations).

What I wouldn’t give for an editor with the adaptability of emacs, but without the creeping kudzu of emacs. And with key combinations which aren’t stupid. Every time I’ve tried to learn emacs, I find myself completely unable to get past the sheer dumbness, from a usability point of view, of C-x C-f.

I’d love to get shot of MS Word and move to a TeX-based writing environment, but the lack of a decent editor prevents me. (Vi sucks for this purpose in a whole different way.)

I know I am in a minority of one on this subject, but I will go to my grave protesting that Latex documents look shit. For some reason they’re always in this yacky skinny serif font and the linespacing always looks too wide. I’ve seen plenty of them with really dodgy kerning and justification too.

I’ve been using Tinderbox as a note-taking/information management tool for the last three months or so. I love it, though feel that I’ve only scratched the surface of its usefulness (my feeling about it is similar to the way Kieran describes Quicksilver, which I downloaded weeks ago, but have yet to try). Right now Tinderbox is Mac-only, though they’re working on porting it to Windows.

That’s Computer Modern, the default Latex font. Standard installations come with three or four more (Times and Palatino, in particular), if you don’t like it (and many don’t, especially for online PDFs). If you want other ones you have to buy them. I forked out for Adobe’s versions of Sabon and Caslon, and I use those (especially Sabon) for my papers these days.

Easy installation of new fonts is one area where LaTeX falls down in a big way, though things have improved considerably in recent years.

You’ll get a lot more of an argument from me and others about the quality of kerning and justification in Latex, as opposed to things like MS Word.

I think the default TeX font is real purty, especially the capital Qs.

I bet it would be pretty easy, though maybe not so elegant, to hack together a quotation-management system using textfiles and swish-e. You’d have to use some sort of standard format for the files (überdorks would use XML, but that’s basically human-unwriteable) so that you could programmatically extract author, work, page, etc (keywords, I guess), then use the prog input type for swish-e, your program feeding the metadata to swish-e’s engine using the MetaNames or PropertyNames (or something, it’s been a while since I looked at swish-e last) features. Then you could search doing something like “swish-e -w author='(someone or someonelse)'”. If that’s actually what you mean by quotation management. I just rely on my PRODIGIOUS MEMORY.
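The plain-text half of that idea can be sketched without swish-e at all. The following is a Python illustration only, not the swish-e setup the comment describes: the record format (“key: value” header lines, a blank line, then the passage), the field names and the placeholder records are all invented for the example.

```python
def parse_quotation(text):
    """Split a record into a metadata dict and the quotation itself.

    Assumed format (invented for this sketch): "key: value" header lines,
    then a blank line, then the quoted passage.
    """
    header, _, body = text.partition("\n\n")
    meta = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        meta[key.strip().lower()] = value.strip()
    meta["quote"] = body.strip()
    return meta


def search(records, **criteria):
    """Return records whose fields contain every given value (case-insensitive)."""
    return [
        record for record in records
        if all(
            wanted.lower() in record.get(field, "").lower()
            for field, wanted in criteria.items()
        )
    ]


# Placeholder records; the authors, titles and passages are not real.
raw_records = [
    "Author: A. Example\nWork: First Invented Title\nPage: 12\n\nPlaceholder passage one.",
    "Author: B. Sample\nWork: Second Invented Title\nPage: 340\n\nPlaceholder passage two.",
]
records = [parse_quotation(raw) for raw in raw_records]

hits = search(records, author="example")
for hit in hits:
    print(hit["author"], hit["work"], hit["page"], hit["quote"])
```

A linear scan like this is fine for a few thousand quotations; an indexer such as swish-e would only start to earn its keep on a much larger collection.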

I’ve seen plenty of them with really dodgy kerning and justification too.

I really, really doubt this. Kerning can go wrong in LaTeX for basically two reasons: Incorrect font metrics, which is not a problem with LaTeX, but with the font, and end-users trying to insert manual spacing adjustments as if they were still working in a word processor, which is also not a problem with LaTeX.

Vim is a nice alternative to emacs for those who don’t want to develop emacs pinky. I usually use LaTeX and vim, but I know people who use LyX as their interface to LaTeX on UNIX and TeXnicCenter on Windows (it’s better than WinEDT and completely free.)

Revision control is a necessity if you care about your work, and I’ve recently converted from CVS to subversion (with the fsfs repository, not the BDB one.) Fixing a set of mistakes by typing “svn revert” just once makes the relatively minor effort of learning svn completely worthwhile.

Firefox for browsing, Thunderbird for email, and Sharpreader for reading RSS feeds on Windows. I haven’t found a UNIX RSS aggregator that I’m completely happy with yet. Thunderbird might be good, but it doesn’t import or export OPML files yet. I was surprised to see a discussion on a blog about software tools without a mention of RSS aggregators. I wouldn’t have the time to follow blogs without one.

I use python (with numarray) and Mathematica for most mathematics and programming work; I use perl for munging text, and I use bash for managing files. While almost everything leaves Windows Explorer in the dust and Konqueror is both pretty and powerful, nothing compares to a good shell like bash for ease of use and ability to automate.

Part of the learning curve was ensuring that my own stuff didn’t look like shit. Adobe Caslon Pro, thank you very much, with old-style figures and proper spacing. The default styles are rather crappy, to my eyes. But ‘\usepackage{palatino}’ (or even the ‘times’ package) works wonders as a starting point.

As for quotation management: something that hooks into BibTeX and its UIDs for texts, while offering a free-text search mapped to page references, would do me just fine.