PracTcl Programming Tips

The other day I wrote a Tcl program, and
it was really slow. Even after I got all of the bugs out, it was
still slow, so it was time to optimize its performance. My first
rule of program optimization is to do nothing; just wait for a
faster machine. Unfortunately, in this case, I can't wait that
long.

My second rule of optimization is to avoid spending any
effort optimizing code that doesn't matter, either because it never
gets run (don't laugh, it happens more than you think), or it
doesn't take a significant fraction of the total run time of the
program.

As an aid to program optimization, Tcl provides a
time command that measures and reports the time
it takes to execute a Tcl script. It's called as:

time script ?count?

where script is
the Tcl script to time, and
count is the number
of times to run the script; it is run once if no count is given.
time reports its result in the form:

3000 microseconds per iteration

The time is the average elapsed time for each time the script
is timed.
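
For example, to time 100 runs of a script that builds a
1000-element list (the script here is just an illustration, and the
figure time reports will vary from machine to
machine):

```tcl
# Time 100 runs of a list-building script; time reports the
# average elapsed time per run.
set script {
    set l {}
    for {set i 0} {$i < 1000} {incr i} {
        lappend l $i
    }
}
puts [time $script 100]
```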

Occasionally, it is useful to use the time
command as-is to time procedures as they are run on various inputs.
Once you have pin-pointed the “time hog” procedure in your
application, time can be used this way to
measure the performance effects of recoding techniques or new
algorithms (or faster machines).

However, for any serious program optimization, a more
systematic approach is needed. We can write some procedures that
use the time command to enable us to compute an
execution time profile of an entire application.

One way to profile an application would be to re-write every
procedure, surrounding its body with a time
command, whose result can be tabulated in some way. Since Tcl lets
us ask about the arguments and body of every procedure, the
procedures can be rewritten automatically by the profiler.

The disadvantage of this approach is that it only works for
procedures and not for the built-in commands. Considering that one
of the primary ways of speeding up a Tcl application is recoding
some of the procedures in C and turning them into built-in
commands, it would be nice to be able to profile commands as well
as procedures.

Instead of re-writing procedures, we'll rename them and write
a new procedure with the original name that calls (and
times) the renamed version. This way,
profiling can be done on both commands and procedures.

Gathering the Data

The first step is to write the replacement procedure. It will
be the same for all procedures and commands except for its name,
and the name of the renamed original it will call. The easiest way
to accomplish this is by writing a procedure body template and
using the format command to substitute in the
appropriate names.

The procedure profile_proc takes the name
of a Tcl procedure, renames it by adding the suffix
:OLD, then uses the template (in
Template) to write a new procedure that times
the calls to the original. Although different commands and
procedures will require differing numbers of arguments, by using
args, we can arrange for them to be passed to
the original (:OLD) procedure unchanged.

The global variable Template contains the
format template for the new procedure body,
which substitutes the procedure name in two places: once to call the
original and once to log the results. The 1$
in the %1$s format specifier indicates that the first parameter to
format should be used both times.

The timing information is written to a file so that it may be
analyzed off-line. We will make sure the variable
Profile_fd contains a file handle to which we
can write timing information. First, we run the “original”
command with time. By using
uplevel, we can ensure the original command will
run in the same stack level it expects, so if it uses any
upvar commands they will work properly. Since
nothing in Tcl prevents a procedure name from having special
characters in it, the %s:OLD needs to be passed
through the list command. The original arguments
to the procedure, which were gathered up into the single list
args, are expanded into their original form by
uplevel. The lindex 0 extracts just the
time portion of the time command output and
saves it in the variable time.

To properly account for the time spent in each procedure, not
only is the timed procedure's name required, but its caller's name
is needed as well, as will be apparent when it comes time to
account for all of the time.

The info level command is used to
determine the caller's name (or TopLevel, if
called from the top level). The regsub gets rid
of the :OLD to make the bookkeeping
easier.

Finally, the procedure name, time spent, and caller name are
all written to the logging file, and the result returned by the
renamed procedure, result, is returned to the
caller.
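
The original listing isn't reproduced here, but a sketch of
Template and profile_proc
following the description above might read as follows (the details
are reconstructions, so treat the exact code as an assumption):

```tcl
# Body template for the timing wrapper.  format substitutes the
# procedure name for %1$s in both places.
set Template {
    global Profile_fd
    # Run the renamed original at the caller's stack level (so
    # any upvar inside it still works) and time the call.
    set time [lindex \
        [time {set result [uplevel [list %1$s:OLD] $args]}] 0]
    # Find the caller's name, or TopLevel at the top level.
    if {[info level] > 1} {
        set caller [lindex [info level -1] 0]
    } else {
        set caller TopLevel
    }
    regsub {:OLD$} $caller {} caller
    # Log: procedure name, elapsed microseconds, caller name.
    puts $Profile_fd [list %1$s $time $caller]
    return $result
}

# Rename a procedure and install a timing wrapper in its place.
proc profile_proc {name} {
    global Template
    rename $name $name:OLD
    proc $name args [format $Template $name]
}
```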

The procedure profile_start begins a
profiling session. First, it opens the file that will receive the
timing information, and calls profile_proc for
the procedures we wish to profile. If commands are to be profiled
as well as procedures, the info procs could be
changed to info commands. At this point, the
application can be run as usual, except that for each profiled
command or procedure, a line of timing information is written into
the file.
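
A minimal profile_start along these lines
might be (the log-file name and argument convention are
assumptions):

```tcl
# Begin a profiling session: open the log file and install a
# timing wrapper around each procedure to be profiled.
proc profile_start {procs {file prof.out}} {
    global Profile_fd
    set Profile_fd [open $file w]
    foreach name $procs {
        profile_proc $name
    }
}
```

Calling profile_start [info procs] would profile
every current procedure; a fancier version would skip the profiling
routines themselves.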

The procedure profile_stop closes the log
file, removes the special profiling procedures, restores the names
of the original procedures, and calls
profile_summarize, which prints a summary of the
profile data. Fancier versions of profile_start
and profile_stop could check to make sure a
procedure isn't profiled twice, or that time isn't wasted profiling
the profiling routines.
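
A matching profile_stop might be sketched
as:

```tcl
# End the session: remove the wrappers, restore the original
# procedure names, close the log, and print the summary.
proc profile_stop {} {
    global Profile_fd
    foreach name [info procs *:OLD] {
        regsub {:OLD$} $name {} orig
        rename $orig {}         ;# discard the timing wrapper
        rename $name $orig      ;# put the original back
    }
    close $Profile_fd
    profile_summarize
}
```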

Analyzing the Results

Analyzing the profile data is a bit tricky. The time
attributed to a particular procedure consists of not only time
spent in that procedure, but the time spent in all of its children
(“called” procedures) plus the time taken by the profiling code
that times the child procedures.

We can approximate the time spent in the profiling code by
running a procedure with—then without—the profiling code and
computing the difference. The procedure
profile_calibrate does just that.
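
One plausible shape for
profile_calibrate, assuming the log file is
already open so the wrapped dummy procedure has somewhere to write:

```tcl
# Estimate the per-call profiling overhead in microseconds by
# timing a do-nothing procedure bare, then with the wrapper.
proc profile_calibrate {} {
    proc profile_dummy {} {}
    set bare [lindex [time profile_dummy 1000] 0]
    profile_proc profile_dummy
    set wrapped [lindex [time profile_dummy 1000] 0]
    rename profile_dummy {}                 ;# drop the wrapper
    rename profile_dummy:OLD profile_dummy  ;# restore, then
    rename profile_dummy {}                 ;# drop the dummy
    return [expr $wrapped - $bare]
}
```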

The profile_summarize procedure uses a
souped-up version of the Tcl incr command
(called Incr) that permits
incring a variable before it is set, by
automatically initializing it to zero before incrementing
it.
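
Incr itself is only a few lines; a sketch
(upvar makes it operate on the caller's
variable, including array elements):

```tcl
# Like incr, but creates the variable (as zero) if it doesn't
# already exist.
proc Incr {var {amount 1}} {
    upvar $var v
    if {![info exists v]} {
        set v 0
    }
    incr v $amount
}
```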

After a bit of initialization,
profile_summarize works in two stages, reading
and tabulating the timing information, then sorting and printing
the results.

Each line of timing information is read into three variables:
the procedure name, the elapsed
time in µs, and the
parent, or calling, procedure name. Two arrays,
count and sum, both indexed
by procedure name, are used to keep track of the number of times
each procedure is called and the accumulated time for each
procedure. Next, the time attributed to each procedure is
subtracted from the time credited to its parent, along with the
timing overhead. Only procedures that are called from the top level
have their times included in the total time
counter. Otherwise, the time would be counted twice, once for the
procedure and again for its caller.

Once all of the data is tabulated, it is sorted by procedure
name. The results are printed in the form of procedure name, number
of times the procedure was called, the total elapsed time spent in
the procedure, and the percentage of the total program execution
time spent in the procedure. The time values are divided by 1000 to
convert them from µs into ms. Note that after about 35
minutes, the number of µs overflows a 32-bit integer, so
profiling runs shouldn't be much longer than a half hour.
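
Putting the pieces together, a
profile_summarize along the lines just described
might look like this (the file name, column layout, and the
overhead argument are assumptions; the overhead
figure would come from profile_calibrate):

```tcl
# Read the log and print, per procedure: call count, total ms,
# and percentage of the total program run time.
proc profile_summarize {{file prof.out} {overhead 0}} {
    set fd [open $file r]
    set total 0
    while {[gets $fd line] != -1} {
        # Each log line: procedure name, elapsed µs, caller.
        set name   [lindex $line 0]
        set time   [lindex $line 1]
        set parent [lindex $line 2]
        Incr count($name)
        Incr sum($name) $time
        if {$parent == "TopLevel"} {
            Incr total $time
        } else {
            # Credit this time (plus the profiling overhead)
            # back to the parent so it isn't counted twice.
            Incr sum($parent) [expr -($time + $overhead)]
        }
    }
    close $fd
    puts [format "%-20s %6s %9s %6s" procedure calls ms %]
    foreach name [lsort [array names count]] {
        puts [format "%-20s %6d %9.1f %6.1f" $name $count($name) \
            [expr $sum($name) / 1000.0] \
            [expr 100.0 * $sum($name) / $total]]
    }
}
```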

Listing 1 at the bottom of this article shows a subset of the
results of a profiling run for an HTML library package. For this
test case, it's clear that HMrender accounts for
a proportionally large share of the total running time of the
program. Even if it could be recoded to run infinitely fast, the
overall application would be sped up by, at most, 27%. The negative
time attributed to HMtag_a is probably due to
the variability of the profiling overhead calculation.

Even though there can be considerable variability in the
elapsed times of procedures, depending upon the current load on the
processor, these simple profiling routines can quickly point out
those parts of an application that are consuming most of the
running time and would be good candidates for optimization.

Stephen Uhler is a researcher at Sun Microsystems
Laboratories, where he works with John Ousterhout improving Tcl and
Tk. Stephen is the author of the MGR window system and of
PhoneStation, a Tcl-based personal telephony environment. He may be
reached via e-mail at
Stephen.Uhler@Eng.Sun.COM.