Categories

With today’s blazingly fast hardware, our capacity to “make things slow” continues to amaze me. For example, on my system, there is a noticeable delay between the moment a terminal window is opened, and the moment the command prompt actually shows up.

This post explores how we can quickly quantify the problem and and pinpoint the main causes of the delay.

Quantifying the problem

Let’s first see where the problem might be. A likely candidate is of course my ~/.zshrc, so I added 2 log statements: one at the top, one at the bottom:

1

date"+%s.%N"

This indeed showed my ~/.zshrc took about 300ms, enough to cause a noticeable delay.

Quick insight: zprof

The quickest way to get an idea of the culprit is the zprof module that comes with zsh. You simply add zmodload zsh/zprof to the top of your ~/.zshrc, and the zprof built-in command will show a gprof-like summary of the profiling data.

A notable difference between gprof and zprof is that where gprof measures CPU time, where zprof measures wall clock time.

This is fortunate: CPU time is the time a program was actually consuming CPU cycles, and excludes any time the program was for example waiting for I/O. It would be fairly useless to profile zsh in this way, because it probably spends most of its time waiting for invoked commands to return.

zprof provides a fairly rich output, including information about the call hierarchy between functions. Unfortunately, it measures performance per function, so if those are long you’re still left wondering which line took so long.

Digging deeper: xtrace

An approach to profiling zsh scripts that will give per-line metrics is using xtrace. Using xtrace, each command that zsh executes is also printed to file descriptor 3 using a special prompt which can be customized with the PS4 environment variable to include things like the current timestamp.

We can collect these detailed statistics by adding to the top of our ~/.zshrc:

1

2

3

PS4=$'\\\011%D{%s%6.}\011%x\011%I\011%N\011%e\011'

exec3&gt;&amp;22&gt;/tmp/zshstart.$$.log

setopt xtrace prompt_subst

And to the bottom:

1

2

unsetopt xtrace

exec2&gt;&amp;33&gt;&amp;-

There are 2 problems with this detailed trace:

This approach will provide confusing output when there is any parallelism going on: trace messages from different threads of execution will simply get interleaved

It is an overwhelming amount of data that is hard to digest

When you’re dealing with parallelism, perhaps you can first use zprof and then only xtrace the function you know is a bottleneck.

When you’re overwhelmed by the amount of data, read on…

Visualizing: kcachegrind

If we assume there’s no parallelism going on, we can visualize our zsh script profile using kcachegrind. This tool is intended to visualize the call graphs produced by valgrind‘s callgrind tool, but since the file format used is fairly simple we can write a small tool to convert our xtrace output.

Summary

Applying this to shell startup time is of course a rather silly exercise – though I’m quite happy with the result: I went from ~300ms to ~70ms (now mostly spent on autocomplete features, which are worth it).

The main lesson: combining tools that were not originally intended to be used together can produce powerful results.