Improving Perl Application Performance

The four basic performance-tuning steps to improve an existing application's performance.

A fellow developer and I have been working
on a data collection application primarily written in Perl.
The application retrieves measurement files from a directory, parses the
files, performs some statistical calculations and writes the results to
a database. We needed to improve the application's performance so
that it could handle a considerable load in production.

This paper introduces four performance-tuning steps: identification,
benchmarking, refactoring and verification. These steps are applied to
an existing application to improve its performance. A function
is identified as being a possible performance problem, and a baseline
benchmark of that function is established. Several optimizations are
applied iteratively to the function, and the performance improvements
are compared against the baseline.

Identifying Performance Problems

The first task at hand in improving the performance of an application
is to determine what parts of the application are not performing as
well as they should. In this case I used two techniques to identify
potential performance problems: code review and profiling.

A performance code review is the process of reading through the code
looking for suspicious operations. The advantage of code review is that
the reviewer can observe the flow of data through the application.
Understanding the flow of data through the application helps identify any control loops that can be eliminated. It also helps
identify sections of code that should be further scrutinized
with application profiling. I do not advise combining a performance
code review with other types of code review, such as a code review for
standards compliance.

Application profiling is the process of monitoring the execution of an
application to determine where the most time is spent and how frequently
operations are performed. In this case, I used a Perl package called
Benchmark::Timer. This package provides functions that I used to mark
the beginning and end of interesting sections of code. Each marked
section of code is identified by a label. When the program runs and
enters a marked section, the time spent within that section
is recorded.
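Benchmark::Timer is a CPAN module, not part of core Perl. So that the idea can be tried without installing anything, the sketch below re-creates it (start and stop a section by label, then report the accumulated time per label) using only the core Time::HiRes module. The helpers section_start, section_stop and section_report are hypothetical names, not Benchmark::Timer's interface, and the loop bodies are placeholders for real parsing and statistics code.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical, dependency-free stand-in for label-based section timing.
# It only mimics the idea behind Benchmark::Timer; it is not that module's API.
my %started;   # label => start time of the currently open section
my %elapsed;   # label => total seconds spent inside the section
my %count;     # label => number of times the section was entered

sub section_start {
    my ($label) = @_;
    $started{$label} = Time::HiRes::time();
}

sub section_stop {
    my ($label) = @_;
    my $now = Time::HiRes::time();
    die "section '$label' was never started\n" unless exists $started{$label};
    my $start = delete $started{$label};
    $elapsed{$label} += $now - $start;
    $count{$label}++;
}

sub section_report {
    for my $label (sort keys %elapsed) {
        printf "%-8s %4d calls %10.6f s total\n",
            $label, $count{$label}, $elapsed{$label};
    }
}

# Example: time a "parse" step and a "stats" step separately, three times.
for (1 .. 3) {
    section_start('parse');
    my @values = map { $_ * 2 } 1 .. 10_000;   # placeholder for parsing a file
    section_stop('parse');

    section_start('stats');
    my $sum = 0;
    $sum += $_ for @values;                    # placeholder for the statistics
    section_stop('stats');
}

section_report();
```

Accumulating per label, rather than printing each interval, is what makes it possible to see which section dominates over a whole run.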

Adding profiling sections to an application is an intrusive
technique; it changes the timing behavior of the code being measured.
In other words, the profiling code itself can overshadow
or obscure a performance problem. In the early stages of performance
tuning, this may not matter, because the magnitude of the performance
problem will be significantly larger than the performance impact of the
profiling code. However, as performance issues are eliminated, the
remaining ones become smaller and harder to distinguish from the
overhead of the profiling code.
Like many things, performance improvement is an iterative process.
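One way to keep the intrusiveness in view is to measure it. The sketch below, assuming timing is done with Time::HiRes, times many empty start/stop pairs to estimate what a single marked section costs before it measures anything. The iteration count of 100,000 is an arbitrary choice.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Estimate the overhead one timed section adds by timing many
# empty start/stop pairs (the iteration count is arbitrary).
my $pairs = 100_000;
my $t0    = Time::HiRes::time();
for (1 .. $pairs) {
    my $start = Time::HiRes::time();            # what a section "start" records
    my $spent = Time::HiRes::time() - $start;   # what a section "stop" computes
}
my $per_pair = (Time::HiRes::time() - $t0) / $pairs;
printf "approximate overhead per timed section: %.3f microseconds\n",
    $per_pair * 1e6;
```

The figure includes the loop's own cost, so it slightly overstates the overhead. If a section's measured time is of the same order as this number, the profiling code is likely distorting that measurement.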

In our case, profiling some sections of the code indicated that a
considerable amount of time was being spent calculating statistics of
data collected off the machine. I reviewed the code related to these
statistics calculations and noticed that a function to calculate standard
deviation, std_dev, was used frequently. The std_dev calculation caught
my eye for two reasons. First, because calculating the standard deviation
requires both the mean and the mean of the squared values for
the entire measurement set, the naïve calculation for std_dev uses two
loops when it could be done with one. Second, I noticed that the
entire data array was being passed into the std_dev function on the
stack rather than being passed as a reference. I thought these
two items together might indicate a performance issue worth examining.
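To make the two concerns concrete, here are both shapes side by side. The original function (Listing 1) is not reproduced here, so std_dev_two_loops is only an illustration of the pattern described (array passed as a flat list, data walked twice), and std_dev_one_loop is a hypothetical rewrite that takes an array reference. Both assume the data is the full population (dividing by n); the single pass relies on the identity variance = (sum of x^2)/n - ((sum of x)/n)^2.

```perl
use strict;
use warnings;

# Illustrative two-loop version (not the article's Listing 1): the array is
# flattened onto the argument stack, copied into @data, and walked twice.
sub std_dev_two_loops {
    my @data = @_;                       # copies every element
    my $n    = scalar @data;
    my $sum  = 0;
    $sum += $_ for @data;                # loop 1: the mean
    my $mean = $sum / $n;
    my $sq   = 0;
    $sq += ($_ - $mean) ** 2 for @data;  # loop 2: squared deviations
    return sqrt($sq / $n);               # population standard deviation
}

# Hypothetical one-loop version that takes an array reference (no copy).
sub std_dev_one_loop {
    my ($data) = @_;                     # array reference
    my $n = scalar @$data;
    my ($sum, $sum_sq) = (0, 0);
    for my $x (@$data) {                 # single pass: sum and sum of squares
        $sum    += $x;
        $sum_sq += $x * $x;
    }
    my $mean = $sum / $n;
    my $var  = $sum_sq / $n - $mean * $mean;
    $var = 0 if $var < 0;                # guard against tiny negative rounding error
    return sqrt($var);
}

my @sample = (2, 4, 4, 4, 5, 5, 7, 9);
printf "two loops: %.4f\n", std_dev_two_loops(@sample);   # prints 2.0000
printf "one loop:  %.4f\n", std_dev_one_loop(\@sample);   # prints 2.0000
```

One caveat on the design: the single-pass formula subtracts two large, nearly equal quantities when the values are large relative to their spread, so it can lose precision. Welford's online algorithm is a single-pass alternative that avoids this.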

Benchmarking

After identifying a function that could be improved, I proceeded to
the next step, benchmarking the function. Benchmarking is the process
of establishing a baseline measurement for comparison.
Creating a benchmark is the only way to know whether a modification
actually improves performance. All the benchmarks
presented here are time-based. Fortunately, Perl's standard distribution
includes a module called Benchmark, developed specifically for generating time-based benchmarks.
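As a rough illustration of how the core Benchmark module is used, the sketch below compares two hypothetical functions that differ only in how an array reaches them: flattened into @_ and copied, or passed as a single reference. The function names and the 10,000-element array are assumptions made for the example.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @data = (1 .. 10_000);

# Hypothetical pair of functions that differ only in how the array arrives.
sub sum_list_copy {
    my @values = @_;              # array flattened into @_, then copied
    my $s = 0;
    $s += $_ for @values;
    return $s;
}

sub sum_ref {
    my ($values) = @_;            # only a single reference is passed
    my $s = 0;
    $s += $_ for @$values;
    return $s;
}

# A negative count asks Benchmark to run each sub for at least that many
# CPU seconds; cmpthese then prints a table of rates and relative speeds.
my $table = cmpthese(-1, {
    by_list => sub { sum_list_copy(@data) },
    by_ref  => sub { sum_ref(\@data) },
});
```

The size of the difference will depend on the array size and the Perl build; the value of the exercise is that the comparison is measured rather than assumed.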

I copied the std_dev function (Listing 1) out of the application
and into a test script. By moving the function to a test script, I
could benchmark it without affecting the data collection application.
In order to get a representative benchmark, I needed to duplicate the
load that existed in the data collection application. After examining
the data processed by the data collection application, I determined that
a shuffled set of all the numbers between 0 and 999,999 would be adequate.
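A test load like this takes only a line to build with the core List::Util module, whose shuffle function returns its arguments in random order. The sketch below builds the data and sanity-checks it before any timing is done; the check itself is an addition for illustration, not a step described in the text.

```perl
use strict;
use warnings;
use List::Util qw(shuffle sum min max);

# Test load described above: every integer from 0 to 999,999, in random order.
my @data = shuffle(0 .. 999_999);

# Sanity-check that the load really is the full range before benchmarking with it.
printf "count=%d min=%d max=%d mean=%.1f\n",
    scalar(@data), min(@data), max(@data), sum(@data) / @data;
# prints: count=1000000 min=0 max=999999 mean=499999.5
```

Because the order is random, the data differs from run to run even though its statistics do not, which is what a benchmark of std_dev needs.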
