Robust workflow for replicability and reliability with Eclipse + StatET

SPSS and Office are easy to learn but lack the power and extensibility
of R, SQL, LaTeX/Sweave, BibTeX, and Subversion (SVN). These open-source
technologies were developed based on peer-reviewed code and are designed
to facilitate replicability, but they do not just work 'out of the box.'
This page contains instructions for configuring these technologies and
setting up a workflow for data management, analysis, and
word-processing/typesetting using these tools. Though I pay special
attention to Mac OS, PC/Linux users should be able to implement
everything below.

One common complaint about R is that it is RAM-hungry. This is true,
but you can easily pair R with a SQL database such as SQLite, using the
SQLiteDF package, to augment its data processing capabilities. Another
solid option is the R package bigmemory, which is designed to help users
analyze datasets larger than 10 gigabytes. See the bigmemory vignette
for details.

LaTeX (pronounced 'Lay-Tek') for beautiful typesetting

Eclipse, the editor (for just about everything)

Eclipse is a cross-platform open-source editor based on Java and
originally developed by IBM in 2001. It provides an integrated
development environment (IDE), meaning that it provides a source code
editor, a compiler or (more critically for our purposes) an interpreter,
build automation tools, and a debugger (though debugging must still be
done in R and LaTeX via the command line). Eclipse also features an
extensible plug-in system. Among other things, it was designed with
visual programming in mind, meaning that it assists programming tasks by
representing code elements visually, and sometimes by allowing
programmers to manipulate program elements graphically instead of merely
via text (code).

Eclipse is one of the most widely used editors/IDEs in existence. One
survey suggests that it is the third most-used IDE (behind MS Visual
Studio and Adobe Macromedia Studio). It is easily the most widely used
open-source IDE, and some predict that because of its rapid growth and
development it will eventually rise to the number-one spot. What this
means is that the codebase is robust and stable.

Why not just use the default R editor? Well, in addition to Sweave
functionality, Eclipse provides a connection to R with shortcut keys,
R & Sweave syntax highlighting, hover functionality, an outstanding
graphical object browser, and an outline that links to new objects and
functions declared in your code. It also almost never crashes, so when
(not if) R crashes, you don't lose your R code. It has a
find-and-replace function with RegEx support, toggle code commenting
(command-shift-C or control-shift-C), content assist, and other very
nice features. Though it is a resource-intensive IDE, it is easier to
learn than Emacs or Aquamacs.

Install the latest version of Eclipse, which we will also use as an
R/LaTeX/Sweave editor: http://www.eclipse.org/downloads/
(you can download any version; I use the Eclipse IDE for Java
Developers).

"Local history": Robust local version control within Eclipse

One thing I recommend immediately upon downloading Eclipse is to set
its local history size to unlimited. This means that Eclipse tracks
your changes to each file in your project, each time you save it.
Go to Preferences => General => Workspace => Local History and
un-check "Limit history size."

If you need to compare your current document to a previous version,
simply right-click (or Ctrl-click) on the file and select "Compare
with" => "Local history..." Each saved revision of the document will
then appear in a side window. Double-click on any one to compare the
differences between the current and saved document. This works great
when you accidentally save a file over the one you were previously
using (e.g., your dissertation or journal manuscript), because you can
simply use compare with local history to restore all of your previous
hard work.

Also, if you deleted a file by accident, you can right click on the
project folder and select "Restore from Local History..." and restore
your deleted file.

Once you install StatET, you'll want to run the cheat sheets to
optimize your Eclipse configuration. From Eclipse, go to Help =>
Cheat Sheets... and click on the StatET folder. Run the cheat sheets in
order.

Basics and Shortcuts

Now that you've run through all of the cheat sheets, StatET should be
the default perspective and the R console should run
automatically. You might want to start by making a new R
project. If you want to import an existing project, just make a
new project with the same folder name, and copy the folder into your
workspace.

You can send R code that you write to the R console by pressing 'command+R, command+R' - that's
command+R twice.

A full list of shortcut keys is available by pressing
'command+shift+L, command+shift+L'. I recommend changing the assignment
(' <- ') shortcut to 'command + shift + , ' or something similar and
changing the add docu comment (' ## ') shortcut to something like
'command + / '. [The default shortcut combinations seem not to map to
any existing keys on my keyboard...].

One more useful shortcut key: content assist: 'ctrl+SPACE'

Writing LaTeX documents in Eclipse with Texlipse

StatET also installs Texlipse, which is a rather nice way to compose
LaTeX documents, especially if you are collaborating with others.
You'll need to set up a LaTeX project first, which can be done via
File, New.... Each time you save the .tex file in the project,
Eclipse will compile the LaTeX file for you. If you want to
change the name of the output file, right-click on the project folder,
select Properties, then select Latex Project Properties from the menu on
the left.

Once you have a LaTeX project going, you'll want to tweak some of the
settings to get the most out of Eclipse. Here's how:

Let's take care of spelling first.
ENABLE SPELL CHECKING for LaTeX:
Open Eclipse Preferences, select General, Editors, Text Editors,
Spelling, and make sure spelling is enabled. You'll also need to
specify a dictionary. I use this one:

You can just download this dictionary into some directory (e.g., the
main Eclipse directory), then point to it in the "User defined
dictionary" dialogue box.

Now from the Preferences window, go to Texlipse, Spell Checker, and
make sure built-in spell-checker is selected.

You can now press 'Command + 1'
( or 'Ctrl+1' on a PC) on a word with a squiggly red line under it to
pull up a menu to either correct the word or add it to your dictionary.

Now LINE-WRAPPING. You can either hard-wrap your lines or soft-wrap
them. For hard-wrapping, which I recommend if you are collaborating
with others (it will be easier to see edits when you are comparing
changes in svn), from Preferences go to Texlipse, Editor and select
'Use hard wrapping.' Eclipse will insert a return character when you
reach the number of characters you specify in the 'Number of characters
in line (10-1000)' dialogue box.

If you edit something and the line wrapping gets screwed up, just press
Esc Q or select Latex > Correct Line Wrap, and Eclipse will make
everything pretty for you again.

For soft-wrapping, I recommend the Ahtik plug-in.
In Eclipse, go to Help => Install New Software... and use the
following as the url in the "work with:" prompt:

http://ahtik.com/eclipse-update/

Easy TABLE CREATION is facilitated by Eclipse's 'LaTeX Table
View.' Make sure it is visible first. Click on Window, Open
Perspective, and select Other. Select 'LaTeX' from the menu and
hit OK. If you do not see LaTeX Table View on some tab within
Eclipse, go to Window, Show View, and select LaTeX Table View.
You can now create tables in this spreadsheet-like editor, right-click
and select 'Export to clipboard,' and it will format your table with
&'s and nice spacing. If you want to edit a table you've produced
previously or in R (say via xtable()), right-click on the LaTeX Table
View editor and select 'Import selected lines from editor.' When you
are done, just export to clipboard as previously described.

Sweave & pgfSweave

Sweave lets you do your data analysis and writing all in the same
place. I prefer pgfSweave, which caches your analysis and graphics, so
that you are not running that 30-minute Bayesian estimation procedure
each time you want to make a small cosmetic change to your document and
see the results in pdf. It also matches the font in R graphics to
whatever you are working with in your document, and provides R syntax
highlighting for any R code in your document - see the pgfSweave
vignette for details. First install pgfSweave from the R console:

install.packages("pgfSweave")

To get pgfSweave working in Eclipse, go to Run => External Tools
=> External Tools Configurations, and double-click on 'Sweave
Document Processing'. Where it says 'Run command in active R
Console:' replace the existing Sweave text with the following:

Now is also a good time to make sure that your Sweave/pgfSweave build
tools are set up correctly, by navigating to the LaTeX tab and clicking
on the blue underlined link that reads "Setup build tools..." If
the various tools have directories listed, you're golden. If not,
type the location of your TeX distribution where it reads "Bin
directory of TeX distribution:" which on a Mac should be as
follows:

/usr/local/texlive/[current tex year]/bin/universal-darwin/

[e.g.] /usr/local/texlive/2010/bin/universal-darwin/
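
If you are not sure which year or architecture folder your TeX
distribution uses, a small shell function can list the candidates for
you. This is only a sketch: the name find_tex_bin is my own, and it
assumes a TeX Live layout like the one above (root/year/bin/architecture)
with pdflatex sitting in the bin directory.

```shell
# List candidate "bin" directories under a TeX Live root.
# find_tex_bin is a hypothetical helper name, not part of TeX Live.
# $1 = TeX Live root (defaults to /usr/local/texlive).
find_tex_bin() {
    root="${1:-/usr/local/texlive}"
    for candidate in "$root"/*/bin/*/pdflatex; do
        # The glob stays unexpanded when nothing matches, so test for a real file
        [ -e "$candidate" ] && dirname "$candidate"
    done
    return 0
}

# Usage:
#   find_tex_bin                  # search the default location
#   find_tex_bin /opt/texlive     # or pass a different root
```

If pdflatex is already on your path, running "command -v pdflatex" in
the terminal is a quicker way to get the same answer.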

You may also need to add this directory to your system path (I had
to). To do so (on Mac OS X 10.6 Snow Leopard) open the terminal and
type the following:

sudo pico /etc/paths

Add the TeX bin directory as a new line using the pico editor, then
save the file and exit.
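
If you would rather not edit the file by hand, the same change can be
scripted. The sketch below uses a hypothetical helper, add_path_entry,
that appends a directory to a paths file only if it is not already
listed. It assumes the file ends with a newline, and I'd suggest trying
it on a scratch copy before touching /etc/paths (which needs
administrator rights to modify).

```shell
# Append a directory to a paths file unless it is already listed.
# add_path_entry is a hypothetical helper name used for illustration.
# $1 = directory to add, $2 = paths file (one directory per line).
add_path_entry() {
    dir="$1"
    paths_file="$2"
    # -x matches whole lines only, -F treats the directory as a literal string
    if grep -qxF "$dir" "$paths_file" 2>/dev/null; then
        echo "already present: $dir"
    else
        printf '%s\n' "$dir" >> "$paths_file"
        echo "added: $dir"
    fi
}

# Usage on a scratch copy:
#   cp /etc/paths /tmp/paths-test
#   add_path_entry /usr/local/texlive/2010/bin/universal-darwin /tmp/paths-test
```

Open a new terminal window afterwards so the updated path is picked up.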

You may also want to tell Mac's Finder to display all files. To do so
enter the following in a terminal window:

defaults write com.apple.Finder AppleShowAllFiles YES

You have to restart Finder for it to work. Hold option, click and hold
the Finder icon in the Dock, and select Relaunch. (Thanks to Sean
Westwood for this hint).

To use pgfSweave, you'll want to add the following lines to the
preamble of any Sweave (.Rnw) document:

\usepackage{Sweave}
\usepackage{tikz}
\usepackage{pgf}

THE SWEAVE.STY ISSUE - the most frustrating Sweave issue when you are
just starting out - is that Sweave/LaTeX, for no good reason, cannot
find the 'Sweave.sty' LaTeX package (the file is just in your
~/R/R-x.x.x/share/texmf directory). Do yourself a favor and just
download this copy of Sweave.sty and put it in your TeX distribution
folder or the current directory you're working in:
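
If you would rather use the copy of Sweave.sty that already ships with
your R installation, you can locate it from the terminal. The sketch
below is my own and makes a few assumptions: the helper names
find_sweave_sty and sweave_texinputs are hypothetical, R is on your
path so that "R RHOME" prints its home directory, and the file lives
somewhere under share/texmf (the exact subfolder varies by R version,
hence the search).

```shell
# Print the first Sweave.sty found under an R installation's share/texmf tree.
# find_sweave_sty and sweave_texinputs are hypothetical helper names.
# $1 = R home directory, e.g. the output of `R RHOME`.
find_sweave_sty() {
    find "$1/share/texmf" -name Sweave.sty -print 2>/dev/null | head -n 1
}

# Build a TEXINPUTS value that adds the directory holding Sweave.sty.
# The trailing colon tells TeX to keep searching its default locations too.
sweave_texinputs() {
    sty=$(find_sweave_sty "$1")
    [ -n "$sty" ] || return 1
    printf '%s:\n' "$(dirname "$sty")"
}

# Usage (needs R on your path):
#   cp "$(find_sweave_sty "$(R RHOME)")" .                # copy next to your .Rnw
#   export TEXINPUTS="$(sweave_texinputs "$(R RHOME)")"   # or extend the search path
```

Note that an exported TEXINPUTS only affects commands started from that
terminal session, so copying the file is probably the simpler route if
you build from within Eclipse.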

You can also use R itself to call LaTeX, so that R generates your pdf
directly. Go to Run => External Tools => External Tools
Configurations, double-click on 'Sweave Document Processing', click on
the LaTeX tab, then select the option "Build tex file using the R
command:". The R command should be filled in for you.
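
For reference, here is roughly what the two steps look like when run by
hand from a terminal. This is a sketch under stated assumptions, not the
command StatET itself issues: build_rnw is a hypothetical helper name,
it uses plain Sweave rather than pgfSweave, and it expects to be run
from the directory that contains the .Rnw file, with R and pdflatex on
your path.

```shell
# Weave an .Rnw file into .tex with R, then typeset the .tex file to pdf.
# build_rnw is a hypothetical helper name used for illustration.
# $1 = the .Rnw file in the current directory, e.g. report.Rnw
build_rnw() {
    # Strip the extension: report.Rnw -> report
    base="${1%.Rnw}"
    # Step 1: run the R code chunks and write report.tex
    R CMD Sweave "$base.Rnw" &&
    # Step 2: typeset report.tex into report.pdf
    pdflatex "$base.tex"
}

# Usage:
#   build_rnw report.Rnw
```

The && means pdflatex only runs if the Sweave step succeeded, so a
failing R chunk stops the build instead of typesetting a stale file.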

ENABLE SPELL CHECKING in Sweave/pgfSweave:
Open Eclipse Preferences, select StatET, Source Editors, Sweave Editor
and click on "Enable spell checking." Where it says "Note: On the
Spelling preference page..." click on Spelling. This will take
you to the main spelling preferences page. You'll need to specify a
dictionary. I use this dictionary:

You can just download this dictionary into some directory (e.g., the
main Eclipse directory), then point to it in the spelling preferences
page.

ENABLE LINE WRAPPING in Sweave/pgfSweave:
For now the easiest way to enable line wrapping is by installing the
Ahtik plug-in. In Eclipse, go to Help => Install New Software... and
use the following as the url in the "work with:" prompt:

http://ahtik.com/eclipse-update/

Examples - Sweave & pgfSweave

Often the best way to learn something like this is by looking at
examples. Below I've posted example code for a few projects in
Sweave and pgfSweave:

Subclipse (easy-to-use Subversion interface in Eclipse)

Subversion is version control
software, great for collaborating on large projects. In short, it
keeps track of every change made to a series of files and is generally
an excellent way to prevent data loss and track changes between
documents.

First you need Subversion 1.6 with Java bindings if you want to use it
with Eclipse:

I recommend Subclipse, which seems to play well with StatET (a
colleague reported not being able to use some other SVN plug-in for
Eclipse with StatET installed). From Eclipse, install new software,
using this as the repository:

To connect to an existing repository, go to File, New, Other, and
expand the SVN options folder, then select 'Checkout Projects from
SVN.' Select 'Create a new repository' (this creates a new
repository setting locally, not on the server). Then enter the
svn url. If you connect via ssh, which is often required for
institutions, a typical svn url is formatted as follows:

svn+ssh://username@domain.name.edu/folders/svn/project
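
The same checkout can be done from a terminal with the svn command-line
client, which is handy for checking that the url and your ssh login work
before involving Eclipse. In this sketch the helper name
svn_checkout_project and the workspace path are my own placeholders;
only "svn checkout" itself comes from Subversion.

```shell
# Check out a repository url into a folder inside your workspace.
# svn_checkout_project is a hypothetical helper name used for illustration.
# $1 = repository url, $2 = workspace directory (both supplied by you).
svn_checkout_project() {
    url="$1"
    workspace="$2"
    # Name the working copy after the last component of the url
    svn checkout "$url" "$workspace/$(basename "$url")"
}

# Usage:
#   svn_checkout_project svn+ssh://username@domain.name.edu/folders/svn/project ~/workspace
```

A folder checked out this way will not show up in Eclipse until you
import it as a project.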

There are three main svn commands you'll need: synchronize, update,
and commit. Generally, you'll want to synchronize your project before
starting work on it to make sure you have the most recent version, and
again after working on it to make sure that your changes are saved to
the server and available to your collaborators. Right-click on a
project, select Team, then Synchronize with Repository.

Synchronize will open a new view in Eclipse that shows the files that
differ between your local working copy and the server. You can examine
the specific differences by right-clicking on any document that has
been changed and selecting 'Open in Compare Editor'. If you are ok with
the changes, right-click on the project and select Update.

Or, if you prefer, you can simply update your local version with the
version on the server without looking at the changes. To do so,
simply right click on the project and select Team, Update to
HEAD.

When you are finished working with a project, go back to the team menu
(right click on the project, select Team). You can either
Synchronize with Repository again, or you can just select commit.

BibTeX for bibliography management

BibTeX is LaTeX's bibliography manager. The idea is that you
provide something like:

According to
\citet{messing2009Bias}, the McCain campaign used increasingly
``photoshopped'' images of Obama in their ads as election day
approached.

And your pdf output looks like:

"According to Messing (2009), the McCain campaign used
increasingly "photoshopped" images of Obama in their ads as election
day approached."

And this reference is automatically inserted into your bibliography (in
the correct order and format):

Messing, S., Plaut, E., & Jabon, M. (2009). Bias in the Flesh:
Attack Ads in the 2008 Presidential Campaign. In Proceedings of the 2009 American Political
Science Association Annual Meeting.

Your references are all stored in a database and are programmatically
retrieved behind the scenes by LaTeX. To set BibTeX up with LaTeX if
you use APA format, first download this .bst file:

Replace the paths to these files with the appropriate path on your
machine. Note that you do not need to include the file extensions when
you specify apaish.sty or MyLibrary.bib.
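
Texlipse runs BibTeX for you when you save, but it helps to know the
cycle it is automating in case you ever build from a terminal. The
sketch below shows the usual sequence; build_with_bibtex is a
hypothetical helper name, and it assumes pdflatex and bibtex are on your
path and that you run it from the directory containing the .tex file.

```shell
# Typeset a document whose citations are resolved by BibTeX.
# build_with_bibtex is a hypothetical helper name used for illustration.
# $1 = the .tex file in the current directory, e.g. paper.tex
build_with_bibtex() {
    # Strip the extension: paper.tex -> paper
    base="${1%.tex}"
    # Pass 1: LaTeX records which keys were cited in paper.aux
    pdflatex "$base.tex" &&
    # BibTeX reads paper.aux and writes the formatted entries to paper.bbl
    bibtex "$base" &&
    # Pass 2 pulls in the bibliography, pass 3 settles the in-text citations
    pdflatex "$base.tex" &&
    pdflatex "$base.tex"
}

# Usage:
#   build_with_bibtex paper.tex
```

Note that bibtex takes the base name without an extension, which mirrors
the point above about leaving extensions off.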

Zotero + BibTeX to take the pain out of your lit review

Zotero is a Firefox plug-in that makes building a bibliography database
much less painful than it used to be. If you are browsing and see
an article you want to cite on a journal website, Google Scholar,
Amazon.com, etc., you simply hit a button in your browser's address bar
and Zotero imports the citation into your citation database. Here's
what it looks like:

Note that you'll want to check citations each time you import them to
make sure they have all the relevant entries you need. I
recommend establishing a folder for each project you work on to keep
things straight.

You can also share bibliographies with your collaborators by making a
group library. For example, the Stanford Comm Department has a group
library so that we can collaboratively build on each other's citation
databases.

Though Zotero has a plug-in for MS-Word and OpenOffice, LaTeX
documents look much nicer and are less fragile than MS-Word and
OpenOffice documents (I've had MS-Word drop all of my citations for
reasons that I cannot figure out, and then had to manually re-enter them
via the graphical user interface).

One really nice thing about Zotero is that you can configure it so that
if you drag and drop a bibliography entry into your LaTeX document,
Zotero will automatically generate the citation for you. Go to
Zotero Preferences. Open the Advanced pane. Click on "Show Data
Directory." This will take you to a "zotero" folder. The "zotero"
folder will contain a "translators" folder. You should be in this
directory

While you're at it, download this BibTeX.js
file into the same directory, which will make sure that your
in-document LaTeX references (e.g., \citep{messing2009Bias} ) match
your bibliography key entries when you export the full
bibliography. [UPDATED FOR ZOTERO 2.11].

[ Or if you'd rather modify the javascript yourself or prefer a
different reference id naming scheme, you can open "BibTeX.js" in a
text editor like Notepad++ or Xcode. The line to change is:

var citeKeyFormat = "%a_%t_%y";

For example, I changed it to

var citeKeyFormat = "%a%y%t";

where %a is first author, %t is first word from title, %y is the year. ]

Then restart Zotero.

After you restart Zotero, set "BibTex CiteKey-Only Exporter" as the
Default Output Format in the Export preferences pane. Now you can
select a reference from Zotero and drag it off the screen into a
waiting text editor (e.g., Eclipse). Alternatively you can use
Cmd+Shift+C to copy the \citep{key} to your clipboard.

When you are ready to export your entire bibliography, right-click or
control-click on the folder and select "Export Selection." I recommend
saving this .bib database file to the working directory for your
Sweave/LaTeX project.
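
Once the .bib file is sitting next to your document, it is worth
checking that every key you cite actually has an entry in it, since a
mismatch between in-document keys and exported keys is the usual cause
of undefined citations. The sketch below is my own rough check:
missing_cite_keys is a hypothetical helper name, it assumes entries
start at the beginning of a line in the form "@type{key,", and it
ignores citations that use optional arguments such as \citep[p.~3]{key}.

```shell
# Print every key cited in a .tex/.Rnw file that has no entry in the .bib file.
# missing_cite_keys is a hypothetical helper name used for illustration.
# $1 = document file, $2 = .bib database file
missing_cite_keys() {
    doc="$1"
    bib="$2"
    # Pull out \cite{...}, \citet{...}, \citep{...} and similar commands,
    # keep only the text between the braces, and split comma-separated keys
    grep -o '\\cite[a-z]*{[^}]*}' "$doc" |
        sed 's/^[^{]*{//; s/}$//' |
        tr ',' '\n' |
        sed 's/^ *//; s/ *$//' |
        sort -u |
        while read -r key; do
            [ -n "$key" ] || continue
            # BibTeX entries start like "@article{key,"
            grep -q "^@[A-Za-z]*{$key," "$bib" || echo "$key"
        done
}

# Usage:
#   missing_cite_keys paper.tex MyLibrary.bib
```

No output means every cited key was found in the database.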