Monday, June 4, 2012

Scientific Computing in Perl

I come from a scientific computing background (computational
biology) and as such have often had to perform numerical computing tasks in
programming languages ranging from tried-and-true stalwarts like Fortran and C to
languages like MATLAB. A large amount of the scientific computing work I have done
has also been done in Perl, which has a long and rich history in the field of
bioinformatics. Recently, however, people entering computational biology and
bioinformatics seem to be shifting toward learning and using Python instead of
Perl, citing the availability of libraries such as SciPy (for numerical computing)
and interfaces such as RPy (for statistical computing with R). This post is not
meant to start a flame war with fans of Python or Python’s scientific computing
capabilities. The Python community has done some great work in this area and has
even produced some tools I have made extensive use of, such as PyMOL
(http://www.pymol.org/). In fact, I think competition is always a good thing.
Rather, I intend to use this post to raise awareness of some of the Perl modules
that can be of great benefit to anyone considering Perl for scientific computing
purposes.

Perl Data Language (PDL) – allows Perl to manipulate large
n-dimensional arrays quickly and efficiently (similar to MATLAB, NumPy, etc.).
The PDL namespace on CPAN contains many modules that should be of interest to
scientific programmers, including interfaces to the numerical routines in the GNU
Scientific Library (http://www.gnu.org/software/gsl/). More information on PDL can be found at http://pdl.perl.org/.
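To give a flavor of what working with PDL looks like, here is a minimal sketch, assuming PDL has been installed from CPAN. It builds a small 3x3 array, applies element-wise arithmetic without writing a loop, multiplies matrices with the overloaded `x` operator, and reduces the array with `sum` and `avg`. The variable names are purely illustrative.

```perl
use strict;
use warnings;
use PDL;   # loads the standard PDL modules (constructors, arithmetic, reductions)

# A 3x3 array holding the values 0..8; sequence() fills it in order
my $m = sequence(3, 3);

# Arithmetic is applied element-wise to the whole array, with no explicit loop
my $scaled = $m * 2 + 1;

# The overloaded 'x' operator performs matrix multiplication
my $product = $m x $m;

# Reductions such as sum() and avg() collapse the array to a single value
print "m =\n$m\n";
print "m x m =\n$product\n";
print "sum of scaled: ", $scaled->sum, "\n";
print "mean of m: ", $m->avg, "\n";
```

The same code works unchanged on much larger arrays, since the looping happens inside PDL's compiled core rather than in Perl.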

The Statistics namespace on CPAN – this namespace includes
many Perl modules for performing a wide range of statistical analyses,
from basic descriptive statistics (Statistics::Descriptive) to more
sophisticated analyses like multivariate regression (e.g. Statistics::Regression).
There is even a module that allows your Perl code to interface with the
statistical computing language R (Statistics::R).
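As a small illustration of the descriptive-statistics end of this namespace, the sketch below uses Statistics::Descriptive, assuming it has been installed from CPAN. The data values are made up for the example. The `::Full` variant is used because it keeps every data point, which order-based statistics such as the median require.

```perl
use strict;
use warnings;
use Statistics::Descriptive;

# Hypothetical measurements, used only for illustration
my @measurements = (1, 2, 3, 4, 5);

# ::Full stores all data points, so median and other order-based stats are available
my $stat = Statistics::Descriptive::Full->new();
$stat->add_data(@measurements);

printf "n        = %d\n",   $stat->count;
printf "mean     = %.3f\n", $stat->mean;
printf "median   = %.3f\n", $stat->median;
printf "variance = %.3f\n", $stat->variance;   # sample variance (n - 1 denominator)
printf "std dev  = %.3f\n", $stat->standard_deviation;
printf "min/max  = %s/%s\n", $stat->min, $stat->max;
```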

The Math namespace on CPAN – this namespace includes all kinds
of advanced math functions that can be easily integrated into a Perl
application. There are too many to mention in such a short synopsis, but
modules here provide support for areas of math such as trigonometry
(Math::Trig) and complex numbers (Math::Complex), as well as algorithms for
solving many types of math problems, such as finding the roots of polynomial
equations (Math::Polynomial::Solve).
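Math::Trig and Math::Complex ship with Perl itself, so the following sketch should run on a standard installation. It shows a few trigonometric helpers that go beyond the built-in `sin` and `cos`, and complex arithmetic where `sqrt` and `abs` are overloaded for complex values. Importing only `cplx`, `Re`, and `Im` from Math::Complex is a choice made here to avoid re-importing names Math::Trig already exports.

```perl
use strict;
use warnings;
use Math::Trig;                     # exports pi, tan, deg2rad, rad2deg, ...
use Math::Complex qw(cplx Re Im);   # only what Math::Trig does not already provide

# Trigonometry beyond Perl's built-in sin() and cos()
my $angle = deg2rad(45);            # 45 degrees -> pi/4 radians
printf "tan(45 deg)     = %.6f\n", tan($angle);
printf "pi/2 in degrees = %.1f\n", rad2deg(pi / 2);

# Complex numbers: sqrt() and abs() are overloaded for Math::Complex objects
my $z = cplx(3, 4);                 # 3 + 4i
my $r = sqrt(cplx(-4, 0));          # principal square root: 0 + 2i
printf "|z|             = %g\n", abs($z);
printf "sqrt(-4)        = %g + %gi\n", Re($r), Im($r);
```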

BioPerl – while not as general-purpose as the modules mentioned
above, BioPerl is a large and widely used collection of Perl modules for
performing bioinformatics tasks. More information on BioPerl can be found at http://www.bioperl.org/wiki/Main_Page.
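For readers who have not used BioPerl, here is a minimal sketch of its sequence objects, assuming BioPerl is installed from CPAN. The sequence and its ID are invented for the example. It creates a DNA sequence and asks for its length, reverse complement, and protein translation.

```perl
use strict;
use warnings;
use Bio::Seq;

# A short, made-up DNA sequence used only for illustration
my $seq = Bio::Seq->new(
    -seq        => 'ATGGCCTAA',
    -display_id => 'example_seq',
    -alphabet   => 'dna',
);

print "ID:                 ", $seq->display_id, "\n";
print "Length:             ", $seq->length, "\n";
# revcom() and translate() each return a new sequence object; seq() gives its string
print "Reverse complement: ", $seq->revcom->seq, "\n";
print "Translation:        ", $seq->translate->seq, "\n";   # stop codons shown as '*' by default
```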

Perl’s Inline capability – the ability to embed code from
other languages in your Perl application (particularly C, via Inline::C) lets
you keep the flexibility and ease of use of Perl while optimizing the
performance-critical parts of your application.
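A minimal sketch of the Inline::C approach, assuming Inline::C is installed from CPAN and a C compiler is available. The function `sum_of_squares` is a hypothetical example of a hot loop moved into C. Inline::C compiles the C code the first time the script runs and caches the result (by default in an `_Inline` directory), so later runs skip the compile step.

```perl
use strict;
use warnings;

# The C source is compiled and bound to a Perl-callable sub of the same name
use Inline C => <<'END_C';
/* Hypothetical hot loop moved into C: sums k*k for k = 1..n */
double sum_of_squares(int n) {
    double total = 0.0;
    int k;
    for (k = 1; k <= n; k++) {
        total += (double)k * k;
    }
    return total;
}
END_C

# Called from Perl just like an ordinary sub
print sum_of_squares(10), "\n";
```

Only the numeric core lives in C. Argument handling, I/O, and everything else stay in Perl.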

Of course, there are many other modules that could be of great use in
scientific applications, including modules for data mining, machine learning,
and numerous other aspects of data analysis. The intent, however, was to
demonstrate that Perl has a rich set of tools for developing scientific
computing applications and should not be quickly dismissed in favor of Python.
For anyone entering the fields of computational biology or bioinformatics, I
think Perl is still a language worth learning, and it is still my preferred
language for bioinformatics tasks. Even if you decide in favor of Python for your
new projects, you should be aware that many existing projects are written in
Perl, and you may well have to maintain, modify, or interface with such
codebases, where knowledge of Perl will only be to your advantage.

My response to this is contained in my latest blog post - http://perlgems.blogspot.com/2012/06/improving-image-of-perl-perl-marketing.html

I think it is not really a problem of what Python is doing better, but rather one of Perl being viewed as an old and uncool language that is difficult to work with. Perl does not seem to be trendy right now, and that influences the choices a lot of up-and-coming programmers make.

You have certainly mentioned the major Perl-for-Science modules; however, I would like to mention my PerlGSL namespace, which provides Perlish (closure-based) interfaces to the GSL. The modules are young, but they are powerful as well!