Hey, Researcher! Open Source Tools For You!

Disclaimer: Your author is writing this article after
having just finished his latest task - conducting numerous experiments
which involved processing a lot of data, and then writing the paper on
them in just under five days. He wonders how he would have done it without
some of the tools he will be talking about in this article, leaving aside
the pleasure of working on Linux systems. His views are biased and are meant
to be taken cum grano salis, but he hopes you will give him a fair
chance.

What am I addressing in this article?

My meeting with Linux happened accidentally 10 years ago, with the fun of
wiping out and putting in a Linux distribution numerous times on my first
computer without having any clue about how to work on it. After a
lot of such iterations I finally settled in, and I have been working on it
full time for the last 5 years. That accident of 10 years ago has stood me in
good stead both when I was working at a software company and now when I'm
working in a research lab. In a research lab, we need to plot figures, run
batch jobs remotely, perform scientific and numerical computing, and last
but not least, publish results by writing papers. Just a few days before I
wrote to the Editor-in-Chief about this article idea, I helped a Windows
friend draw some plots (fixing them for publication) for her paper,
wondering all the while "Jeez, how do Windows users do it?" Most tools that
I used were primarily for Linux. On Linux, things are so much easier. You
can focus on the real job instead of bothering about setting up your tools.
This article is therefore an unabashed claim about the coolness of working
with these tools while you focus on your research. Some of these tools can
be set up on a non-Linux system too; however, I shall address their usage
on a Linux system only and make no attempt to cover any non-Linux issues.
The primary focus here is to introduce my readers to these tools, and not
to convert them to Linux. It would, in fact, be a good exercise to get
all the tools that I talk about here working on Windows!

In this article, I shall take a look at (in no particular order) getting up
and running with LaTeX, gnuplot, GNU Octave, Python scientific
libraries, Beamer, and some other tools and libraries.

Hello Linux

Before I start talking about using the scientific tools, I shall write a
few lines about how you can set up a Linux system for yourself - if you
don’t already have one, that is. While a bare-metal Linux installation
is the best-case scenario, it may not always be possible because your
primary work may be on Windows. In such cases, the best solution is to go
for a virtual Linux installation. I have found VirtualBox to be a
breeze when you need to set up a virtual machine. Just download a Linux
distribution like Ubuntu and install it using VirtualBox. Once you have the
basic installation, take a little time to gain some hands-on familiarity
with the terminal, using the package manager to install packages, and using
a text editor.

Giving LaTeX a try

LaTeX is a document preparation system for high-quality typesetting and is
widely used for writing scientific papers, technical manuals, and even
creating presentations (more on this later). TeX Live is an easy way to get up
and running with LaTeX on your Linux system. A file written in LaTeX is
usually saved as a .tex file. Once you have created your
.tex file, you compile it using the latex
command. This gives you a DVI file, which you can convert to a PDF
file using a tool like dvipdf. You can also get PDF output directly
using the tool pdflatex.
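
The write-then-compile cycle can be sketched as a small script. The sketch is in Python, to match the other examples in this article, and it assumes pdflatex is on your PATH; if it is not, only the .tex file is written. The file name hello.tex, the document contents, and the helper write_and_compile are all made up for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# A minimal LaTeX document; the contents are illustrative.
MINIMAL_TEX = r"""\documentclass{article}
\begin{document}
Hello, \LaTeX! Here is some inline math: $e^{i\pi} + 1 = 0$.
\end{document}
"""


def write_and_compile(workdir):
    """Write hello.tex into workdir and run pdflatex on it if it is installed.

    Returns the path to the PDF, or None when pdflatex is not available
    or the compilation fails.
    """
    workdir = Path(workdir)
    tex_file = workdir / "hello.tex"
    tex_file.write_text(MINIMAL_TEX)

    if shutil.which("pdflatex") is None:
        return None  # no LaTeX installation found; only the .tex file exists

    result = subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", "hello.tex"],
        cwd=workdir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    pdf_file = workdir / "hello.pdf"
    return pdf_file if result.returncode == 0 and pdf_file.exists() else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pdf = write_and_compile(tmp)
        print("PDF produced" if pdf else "hello.tex written; PDF not produced")
```

On the command line, the equivalent is simply to save the document as hello.tex and run pdflatex on it.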

You can edit LaTeX in your text editor of choice. The widely used
Vim and Emacs both support LaTeX editing with
syntax highlighting, code completion, etc.

The best way to get started with LaTeX is by getting started. Start your text
editor, type in some LaTeX and see the results. Here are some great
resources to help you master it:

gnuplot

If you are doing experimental work, there is a fair chance that you are
going to present your results in a graphical plot. Please welcome gnuplot.
It is a command-line driven graphing utility for creating 2-dimensional as
well as 3-dimensional graphical plots. It supports output to many file
formats with eps and fig being of special interest to the
researcher. The easiest way to install gnuplot is using the package
manager of your Linux distribution.

Calling gnuplot from your C program

A cool way to use gnuplot is to have it display continuously
changing data that your program may be generating. For this example,
let’s assume that you have a program written in the C programming
language and you want to send the data it's generating to gnuplot.
The simplest way to do this is to use the popen library function, which
starts gnuplot and gives your program a pipe to write commands to. Please
refer to this 2-cent tip published here some time back.
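
The same pipe idea can be sketched in Python rather than C, with subprocess.Popen playing the role of popen. This is a minimal sketch, not the code from the tip: the helper send_plot and the sine-wave data are made up for illustration. It relies on gnuplot's special file name '-', which makes gnuplot read the data points from the same stream as the commands, up to a line containing only "e".

```python
import io
import math
import shutil
import subprocess


def send_plot(stream, points, title="live data"):
    """Write gnuplot commands that plot (x, y) pairs to a text stream.

    The data is sent inline: "plot '-'" tells gnuplot to read the points
    from the same stream, up to a line containing only "e".
    """
    stream.write(f"plot '-' with lines title '{title}'\n")
    for x, y in points:
        stream.write(f"{x} {y}\n")
    stream.write("e\n")
    stream.flush()


if __name__ == "__main__":
    points = [(i / 10, math.sin(i / 10)) for i in range(100)]
    if shutil.which("gnuplot"):
        # -persist keeps the plot window open after gnuplot exits.
        gp = subprocess.Popen(
            ["gnuplot", "-persist"], stdin=subprocess.PIPE, text=True
        )
        send_plot(gp.stdin, points)
        gp.stdin.close()
        gp.wait()
    else:
        # Without gnuplot installed, just show the commands that would be sent.
        buf = io.StringIO()
        send_plot(buf, points[:3])
        print(buf.getvalue())
```

To animate changing data, you would keep the pipe open and call send_plot again each time a new batch of points is ready.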

There is also an ANSI C interface to gnuplot available here, but I haven’t
personally used it.

GNU Octave

GNU Octave (referred to as Octave from now on) is a software tool
intended for numerical computation. Octave has extensive tools for solving
common numerical linear algebra problems, finding the roots of nonlinear
equations, integrating ordinary functions, manipulating polynomials, and
integrating ordinary differential and differential-algebraic equations.
Octave is available for use on Linux, Solaris, Mac OS X, and Windows. If
your study or play involves numerical computations, Octave is for you.
Octave has been designed with MATLAB compatibility in mind, so with careful
design you could write scripts which run on both MATLAB and Octave. It is
easily extensible and customizable via user-defined functions written in
Octave’s own language, or using dynamically loaded modules written in
C++, C, Fortran, or other languages.

Octave is almost certainly available in the package repository of your
Linux distribution, and that is the easiest way to get it.

GNU Octave comes with extensive documentation,
which is the best place to get started.

In case you happen to like this author’s writings, he is currently
writing an article series on Octave elsewhere. Please refer to this
blog post for PDFs of the articles.

Add Python to your toolkit

I shall not use this section of the article for Python language advocacy.
Rather, I shall point you to some ways you may consider using Python in
your research, whether you already use it or are thinking of trying it in
the future as just another of your many experiments. Python is already
installed on most Linux distributions. Let us now see some of the ways you
can use Python:

As a scientific calculator

With its math module, Python
can be a very useful command-line scientific calculator. For
example:
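
A minimal sketch of such a session, written as a script so that it can be run directly. The particular calculations are illustrative choices; every function used comes from the standard math module.

```python
import math

# Area of a circle of radius 2.5
area = math.pi * 2.5 ** 2

# Hypotenuse of a right triangle with sides 3 and 4
hyp = math.hypot(3, 4)

# Trigonometric functions work in radians, so convert degrees first
sin_30 = math.sin(math.radians(30))

# Base-2 logarithm and factorial
bits = math.log2(1024)
perms = math.factorial(5)

print(area, hyp, sin_30, bits, perms)
```

The same lines can be typed one at a time at the interpreter prompt, which is how you would use it as a calculator.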

The decimal module is
another interesting module, which provides support for decimal fixed point
and floating point arithmetic. The fractions module, in turn, can be used
for rational number arithmetic.
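
A short sketch of what these two modules buy you over ordinary floats. The numbers are arbitrary illustrations, and the 30-digit precision is just an example setting.

```python
from decimal import Decimal, getcontext
from fractions import Fraction

# Binary floating point cannot represent 0.1 exactly...
float_sum = 0.1 + 0.2  # not exactly 0.3
# ...whereas Decimal works in base 10.
decimal_sum = Decimal("0.1") + Decimal("0.2")

# The precision (number of significant digits) is configurable.
getcontext().prec = 30
one_seventh = Decimal(1) / Decimal(7)

# Fractions stay exact rational numbers and are reduced automatically.
total = Fraction(1, 3) + Fraction(1, 6)

print(float_sum == 0.3, decimal_sum, one_seventh, total)
```

Note that Decimal values are built from strings here; constructing them from floats would carry the float's rounding error along.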

The beauty is that you do not need to know the Python programming
language for using these modules. Just fire up the Python interpreter
and import the module(s) you want to use, then call the appropriate
function.

Scientific libraries

If you are already conversant with Python, then you would definitely like
to know about some scientific libraries which you may find useful in your
research:

NumPy and SciPy: NumPy is an extension to the standard
Python programming language, adding support for large multi-dimensional
arrays and matrices. SciPy
builds upon NumPy to provide various mathematical functions for
Integration, Optimization, Interpolation, Linear Algebra, Statistics, and
others.
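
To give a flavour of the two libraries, here is a minimal sketch that assumes both NumPy and SciPy are installed. The matrix, the integrand, and the root-finding problem are arbitrary illustrations.

```python
import numpy as np
from scipy import integrate, optimize

# Solve the linear system A x = b with NumPy.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

# Numerically integrate sin(t) from 0 to pi with SciPy (exact answer: 2).
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# Find the root of cos(t) - t inside the bracket [0, 1].
root = optimize.brentq(lambda t: np.cos(t) - t, 0.0, 1.0)

print(x, area, root)
```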

matplotlib: This is a plotting library for
Python, which is also used by a lot of other Python scientific tools for
plotting data.
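
A minimal plotting sketch, assuming matplotlib is installed. The Agg backend is selected so that the script also runs on a remote machine with no display; the output file name sine.pdf is made up for illustration.

```python
import math
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to files; no display needed
import matplotlib.pyplot as plt

xs = [i * 0.1 for i in range(63)]
ys = [math.sin(x) for x in xs]

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(xs, ys, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.tight_layout()

# The file extension selects the output format.
out = Path("sine.pdf")
fig.savefig(out)
```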

NetworkX: NetworkX is a Python library for the
creation, manipulation, and study of complex networks. Even if you do not have
the need to build very complex networks, it is fun to play around with it to
create simple graphs too.
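
A simple graph of the kind mentioned above might be sketched like this, assuming NetworkX is installed. The node names are made up for illustration.

```python
import networkx as nx

# A small undirected graph.
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("alice", "carol"),
])

# Shortest path between two nodes, and the degree of every node.
path = nx.shortest_path(G, "alice", "dave")
degrees = dict(G.degree())

print(G.number_of_nodes(), G.number_of_edges())
print(path)
print(degrees)
```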

Beam it with Beamer!

You have done your experiments, sent in your findings, they've
been accepted, and now you are going to present them at a
conference. You want to prepare some slides to give you company as you talk
about your work. OpenOffice.org Impress, KOffice? How about taking a look at
something different? You will surely like the results, I promise. Beamer is a
LaTeX class for making presentation slides. In LaTeX parlance, it is just
another document class. You shouldn’t even attempt to use Beamer if
you have no idea what LaTeX is. However, if you already use LaTeX (and
chances are you swear by it), then Beamer is for you.

The easiest way to install it is to use the package manager in your Linux
distribution. You will see that some more packages providing extra
LaTeX classes need to be installed. Prominent among these are
'latex-xcolor' and 'pgf'.

There is, of course, a way to install the LaTeX class manually. It's easy to
do, but not easier than the previous method. The LaTeX Beamer user guide
shows how to do it.

At the risk of being pulled up by the chief editor for self-publicity,
the author would like to point to his article Typesetting Presentations with
Beamer, published elsewhere. The PDF copy is available from here.

Others you may find useful

In this section, I shall talk about some of the other tools I know of that
are useful for scientific computing:

Sage: Sage is an open-source mathematics
software system which brings together a huge number of open-source
mathematical libraries and tools under one common umbrella. In their own
words, the Sage community is building a car, and not reinventing the wheel.
The Sage
tutorial is a good place to start exploring it.

Xfig: Xfig is a
drawing tool which may be used to create figures and to import and export
them in various other formats. fig is the native format that Xfig uses to
save its images. A useful way to use Xfig that I have learnt is to export
your plots from gnuplot as .fig images, open them
in Xfig to add the relevant text/labels to your plots, then
export the plot in the format you desire.

The small ones

One category of tools that I haven't talked about yet is the Linux
utilities, the ones that ship along with the Linux system: shell
scripting, awk and sed. My knowledge of each is very
limited, but I have a working knowledge of them all and I can get things done
with them, which is often what matters.

Shell scripting: Shell scripting is very useful for
running batch jobs: fire up a job and it should finish without requiring any
intermediate user intervention. It's all the more useful when you have to run
an experiment of yours a large number of times. Writing a shell script is
often as simple as putting five separate commands into one file, inside a
for loop, and running that file. Of course, that's an oversimplification,
but sometimes it is that simple to set up a batch job. Do not be put
off by the title: the Advanced
Bash-Scripting guide is a good place to get started with scripting for
BASH shells.
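
The batch pattern itself (run the same command once per parameter, unattended, and collect the output) is sketched below in Python rather than shell, to match the other examples in this article. The helper run_batch and the stand-in command built by fake_experiment are hypothetical; in a real batch you would build the command line of your own experiment instead.

```python
import subprocess
import sys


def run_batch(make_command, parameters):
    """Run one command per parameter, one after another, and collect output.

    make_command turns a parameter into an argument list. Each run must
    finish before the next one starts, and a failing run raises an error.
    """
    results = {}
    for p in parameters:
        done = subprocess.run(
            make_command(p), capture_output=True, text=True, check=True
        )
        results[p] = done.stdout.strip()
    return results


def fake_experiment(n):
    """Stand-in for a real experiment binary: prints n squared."""
    return [sys.executable, "-c", f"print({n} ** 2)"]


if __name__ == "__main__":
    for n, output in run_batch(fake_experiment, [1, 2, 3]).items():
        print(n, output)
```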

awk: The only way I have used awk so far (and a
lot of times) is for text extraction from huge files containing numerical and
textual data. Even with this limited use, it can be a useful tool when you
want to extract text that is stored in a tabular format. The GNU Awk User's
Guide is the place to start exploring awk.

sed: sed is a stream editor
and is useful to carry out batch editing as well as editing of individual
text files. This
three-part series on sed is a very comprehensive treatment of the features of
sed.

Conclusion

It's often the case that we do not know of a great tool which would
simplify our work as we go about our research every day. My hope as a
novice researcher with this article is to acquaint you with the kind of
software tools available in the Open Source world which are very useful,
capable and fun to work with, and we have looked at some of these today. I
have made no attempt to be exhaustive here. There are a lot of other tools
out there, which I haven’t yet used, so I have refrained from writing about
them. I hope you have enjoyed reading this article as much as I have enjoyed
writing it!

Acknowledgements

Credits are due to the folks on the mailing lists and the documentation of
all the projects that I have talked about today, because they are the source
of my learning.