Oracle Blog

rantings of a code minimalist

Sunday Dec 27, 2009

As software becomes increasingly complex and codebases continue to sprawl,
source code cross-reference tools have become a critical component of a
software engineer's toolbox. Indeed, since most of us are tasked with
enhancing an existing codebase (rather than writing from scratch),
proficiency with a cross-reference tool can mean the difference
between understanding the subtleties of a subsystem in an afternoon and
spending weeks battling "unforeseen" complications.

At Sun, we primarily use a tweaked version of the venerable
cscope utility, which has origins going back to AT&T in the 1980s
(now freely available from cscope.sourceforge.net). As with
many UNIX utilities, it has remained popular despite its age because of
its efficiency and flexibility, which are especially important when
understanding (and optionally modifying) source trees with several million
lines of code.

Despite cscope's importance and popularity, I've been surprised to
discover that few are familiar with anything beyond the basics. So,
in the interest of increasing cscope proficiency, here's my list of five
features every cscope user should know:

Display more than 9 search results per page with -r.

Back in the 1980s, cscope's default of showing at most 9 search results
per page may have made sense, but with modern xterms often configured to
have 50-70 rows, it is simply inefficient and tedious. By passing the -r
option to cscope at startup (or including -r in the
CSCOPEOPTIONS environment variable), you can have cscope
display as many search results as will fit. The only caveat is that
selecting an entry from the results requires explicitly pressing
return (e.g., "3 [return]" instead of "3") so that
entries greater than 9 can be selected. I find this tradeoff more
than acceptable. (Apparently, the current open-source version of
cscope uses letters to represent search results beyond 9
and thus does not require -r.)

Display more pathname components in search results with -pN.

By default, cscope only displays the basename of each
matching file. In large codebases, files in different parts of the
source tree often have the same name (consider main.c),
which makes for confusing search results. Passing the
-pN option to cscope at startup (or
including -pN in the CSCOPEOPTIONS
environment variable), where N is the number of pathname
components to display, eliminates this confusion. I've
generally found -p4 to be a good middle ground. Note that
-p0 will cause pathnames to be omitted entirely from
search results, which can also be useful for certain specialized
queries.
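To make the -pN behavior concrete, here's a small Python sketch that mimics showing the last N components of a pathname. It's only an illustration, and it assumes N counts the basename itself (so -p1 matches the basename-only default); the display_path() helper and the sample paths are hypothetical, not part of cscope.

```python
def display_path(path: str, n: int) -> str:
    """Return the last n components of path, roughly as -pN would show it.

    Hypothetical helper: assumes n counts the basename itself, so n == 1
    yields just the basename and n == 0 hides the path entirely.
    """
    if n <= 0:
        return ""
    components = [c for c in path.split("/") if c]
    return "/".join(components[-n:])


# Two hypothetical files that share a basename.
paths = ["usr/src/cmd/foo/main.c", "usr/src/cmd/bar/main.c"]

for n in (1, 2, 0):
    print(f"-p{n}:", [display_path(p, n) for p in paths])
```

With n of 1 the two files are indistinguishable; n of 2 is already enough to tell them apart, and larger values simply add more leading directories.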

Use regular expressions when searching.

While it is clear that one can enter a regexp when using "Find
this egrep pattern", it's less apparent that almost all search
fields will accept regexps. For instance, to find all definitions
starting with ipmp_ and ending with ill, just
specify ipmp_.*ill to "Find this definition". In
addition to allowing groups of related functions to be quickly found,
I find this feature quite useful when I cannot remember the exact
name of a given symbol but can recall specific parts of its name.
Note that this feature is not limited to symbols: for example, passing
.*ipmp.* to "Find files #including this file" returns
all files in the cscope database that #include a file with
ipmp somewhere in its name.
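As a rough illustration of which names these patterns select, here's a sketch using Python's re module. It assumes, as the ipmp_ example implies, that the pattern has to match the whole name (hence re.fullmatch), which is also why the include-file pattern is wrapped in .* on both sides. cscope uses its own regex engine, so details such as anchoring and supported syntax may differ. All symbol and file names below are hypothetical.

```python
import re

# Hypothetical symbol names, invented for illustration.
symbols = ["ipmp_foo_ill", "ipmp_bar_ill", "ipmp_baz", "other_ipmp_ill"]

# Assumption: the pattern must cover the entire symbol, so it selects
# names that start with "ipmp_" and end with "ill".
definition_pattern = re.compile(r"ipmp_.*ill")
print([s for s in symbols if definition_pattern.fullmatch(s)])

# Hypothetical #include targets; ".*ipmp.*" matches "ipmp" anywhere in the name.
headers = ["inet/ipmp.h", "inet/ip_ipmp_impl.h", "inet/ip.h"]
include_pattern = re.compile(r".*ipmp.*")
print([h for h in headers if include_pattern.fullmatch(h)])
```

Here ipmp_baz is rejected because it doesn't end in ill, and other_ipmp_ill is rejected because it doesn't start with ipmp_.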

Use filtering to refine previous searches.

cscope provides several mechanisms for refining searches.
The most powerful is the ability to filter previous search results through
an arbitrary shell command via ^. For instance, suppose you
want to find all calls to GLDv3
functions (which all start with mac_) from the nge driver
(whose source file names start with nge). You might
first specify a search pattern of mac_.* to "Find
functions calling this function". With ON's cscope
database, this returns a daunting 2400 matches; filtering with
"^grep common/io/nge" quickly pares the results down to
the 12 calls that exist within the nge
driver. Note that filtering can be repeated any number of times; for
example, "^sort -k2" then alphabetizes the remaining search results by
calling function.
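Conceptually, ^ pipes the current result lines through a shell command and keeps whatever comes back. Here's a rough Python model of that pipeline using subprocess. It assumes each result line has the form "file function line text" (consistent with sort -k2 ordering by calling function), and the filter_results() helper and sample lines are hypothetical. It needs grep and sort on the PATH.

```python
import subprocess


def filter_results(lines, shell_cmd):
    """Pipe result lines through shell_cmd and return its output lines.

    Hypothetical helper modeling cscope's ^ filter. check=False because
    grep exits with status 1 when nothing matches, which is not an error here.
    """
    proc = subprocess.run(
        shell_cmd,
        shell=True,
        input="".join(line + "\n" for line in lines),
        capture_output=True,
        text=True,
        check=False,
    )
    return proc.stdout.splitlines()


# Hypothetical "file function line text" results for a mac_.* query.
results = [
    "uts/common/io/nge/nge_rx.c nge_recv 50 mac_rx(...);",
    "uts/common/io/e1000g/e1000g_main.c e1000g_attach 200 mac_register(...);",
    "uts/common/io/nge/nge_main.c nge_attach 100 mac_register(...);",
]

# Filters can be chained, just like repeated ^ commands.
nge_only = filter_results(results, "grep common/io/nge")
by_caller = filter_results(nge_only, "sort -k2")
for line in by_caller:
    print(line)
```

Because each stage only sees plain text, any command that reads stdin and writes stdout (awk, sed, uniq, and so on) could stand in for grep or sort.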

Use the built-in history mechanisms.

You can quickly restore previous search queries by using
^b (control-b); ^f moves forward through the
history. This feature is especially useful when performing depth-first
exploration of a given function hierarchy. You can also use ^a to
replay the most recent search pattern (e.g., in a different search field),
and the > and < commands to save and restore
the results of a given search. Thus, you
could save search results before refining them with ^ (as per the
previous tip) and restore them later, or restore results from a past
cscope session.

Of course, this is just my top-five list; there are many other powerful
features, such as the ability to make changes en masse, build custom
cscope databases using the xref utility,
embed command-line mode in scripts (mentioned in a previous blog entry),
and employ numerous extensions that provide seamless interaction with
popular editors such as XEmacs and vim. Along these lines, I'm eager to
hear from others who have found ways to improve their productivity with
this exceptional utility.